Thursday, May 11th 2017

NVIDIA GV100 Silicon Detailed

May 11th, 2017 00:49

NVIDIA announced its next-generation "Volta" GPU architecture at the GTC 2017 event. As with its current "Pascal" architecture, "Volta" was unveiled in its biggest, most feature-rich implementation, the Tesla V100 HPC board, driven by the GV100 silicon. Given the HPC applications of NVIDIA's Tesla family of products, the GV100 has certain components that won't make it to the consumer GeForce family. Even so, the GV100 is the pinnacle of NVIDIA's silicon engineering. According to the GPU block diagram released by the company, the GV100 has a similar component hierarchy to previous-generation NVIDIA chips, with some major changes to its basic number-crunching machinery, the streaming multiprocessor (SM).

The "Volta" streaming multiprocessor (SM) on the GV100 silicon features both FP32 and FP64 CUDA cores. Consumer graphics implementations of "Volta" that drive future GeForce products could lack those specialized FP64 cores. Each SM features 64 FP32 CUDA cores and 32 FP64 cores. The FP64 cores can handle 32-bit, 16-bit, and even primitive 8-bit operations. The GV100 features 80 SMs, so you're looking at 5,120 FP32 and 2,560 FP64 CUDA cores. In addition, "Volta" introduces a component called Tensor cores, specialized machinery designed to speed up deep-learning training and neural net building. Each SM has 8 of these, so the GV100 has 640. As with FP64 cores, Tensor cores may not make it to consumer-graphics implementations. Given its SM count, the GV100 features 320 TMUs. NVIDIA rates the GV100 at a boost clock of 1455 MHz.
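The chip-wide totals in the paragraph above come from multiplying each per-SM figure by the SM count. Here is a minimal Python sketch of that arithmetic. The names `SM_COUNT`, `PER_SM`, and `chip_totals` are made up for illustration, and the four-TMUs-per-SM value is inferred from 320 TMUs across 80 SMs rather than stated in the article.

```python
# Figures quoted in the article for the GV100 as shipped on the Tesla V100.
SM_COUNT = 80

# Units per streaming multiprocessor. The TMU entry is an inference
# (320 TMUs / 80 SMs = 4), not a number the article states directly.
PER_SM = {
    "fp32_cores": 64,
    "fp64_cores": 32,
    "tensor_cores": 8,
    "tmus": 4,
}


def chip_totals(sm_count, per_sm):
    """Scale every per-SM unit count by the number of SMs on the chip."""
    return {name: count * sm_count for name, count in per_sm.items()}


totals = chip_totals(SM_COUNT, PER_SM)
for name, total in totals.items():
    print(f"{name}: {total:,}")
```

Running it should reproduce the 5,120 / 2,560 / 640 / 320 figures given in the text.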

The Tesla V100 is advertised to offer 50% higher FP32 and FP64 peak performance over the "Pascal" based Tesla P100. Its peak FP32 throughput is rated at 15 TFLOP/s, with 7.5 TFLOP/s FP64 peak throughput. The Tensor cores "effectively" run at 120 TFLOP/s to perform their very specialized task of training deep-learning neural nets. These components are matrix-matrix multiplication units, which perform a key math operation in neural net training. NVIDIA claims they accelerate neural net building/training by 12X compared with the Tesla P100.
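The rated throughput figures can be roughly reproduced from the unit counts and the 1455 MHz boost clock. The sketch below rests on two assumptions that are not in the article: that one fused multiply-add (FMA) counts as two floating-point operations, which is the usual convention for peak ratings, and that each Tensor core performs 64 FMAs per clock, a figure taken from NVIDIA's "Volta" material. The function name `peak_tflops` is hypothetical.

```python
BOOST_CLOCK_HZ = 1455e6   # boost clock quoted in the article
FLOPS_PER_FMA = 2         # assumption: one fused multiply-add = 2 FLOPs


def peak_tflops(units, fma_per_unit_per_clock, clock_hz=BOOST_CLOCK_HZ):
    """Peak throughput in TFLOP/s for `units` execution units."""
    flops_per_second = units * fma_per_unit_per_clock * FLOPS_PER_FMA * clock_hz
    return flops_per_second / 1e12


fp32 = peak_tflops(5120, 1)     # 5,120 FP32 CUDA cores, 1 FMA per clock each
fp64 = peak_tflops(2560, 1)     # 2,560 FP64 CUDA cores, 1 FMA per clock each
tensor = peak_tflops(640, 64)   # assumption: 64 FMAs per Tensor core per clock

print(f"FP32:   {fp32:.2f} TFLOP/s")
print(f"FP64:   {fp64:.2f} TFLOP/s")
print(f"Tensor: {tensor:.2f} TFLOP/s")
```

Under these assumptions the results land at about 14.9, 7.45, and 119.2 TFLOP/s, which is consistent with the rounded 15, 7.5, and 120 TFLOP/s ratings.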

Built on the new 12 nanometer process, the GV100 is a multi-chip module with a large, 815 mm² GPU die packing a gargantuan 21.1 billion transistors, neighbored by four 32 Gbit HBM2 memory stacks, which make up 16 GB of memory. These stacks interface with the GV100 over a 4096-bit wide memory interface, through a silicon interposer. At 1 GHz, this memory setup could provide the GV100 with a memory bandwidth of 1 TB/s. HBM2 could still be exclusive to the Tesla family in NVIDIA's product stack, as it remains expensive to implement in the consumer segment. Besides lacking FP64 and Tensor cores, consumer implementations of "Volta" could feature inexpensive yet suitably fast GDDR6 memory. One of the pioneering manufacturers of HBM, SK Hynix, even demonstrated GDDR6 at GTC, so unless NVIDIA is fighting for its life in performance against AMD, we expect it to stick to GDDR6 in the consumer segment.
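The memory figures in the paragraph above can be checked with a few lines of arithmetic. This Python sketch assumes that HBM2 transfers data on both clock edges (double data rate), which is what turns a 1 GHz clock on a 4096-bit bus into roughly 1 TB/s. The constant and variable names are illustrative only.

```python
STACKS = 4                  # HBM2 stacks on the package
STACK_DENSITY_GBIT = 32     # capacity of each stack, in gigabits
BUS_WIDTH_BITS = 4096       # total memory interface width
MEMORY_CLOCK_HZ = 1e9       # the 1 GHz scenario from the article
TRANSFERS_PER_CLOCK = 2     # assumption: double data rate signalling

TRANSISTORS = 21.1e9
DIE_AREA_MM2 = 815

# Capacity: gigabits across all stacks, converted to gigabytes.
capacity_gb = STACKS * STACK_DENSITY_GBIT / 8

# Bandwidth: bytes per transfer * transfers per second, in GB/s.
bandwidth_gb_s = (BUS_WIDTH_BITS / 8) * MEMORY_CLOCK_HZ * TRANSFERS_PER_CLOCK / 1e9

# Transistor density in millions of transistors per square millimetre.
density_m_per_mm2 = TRANSISTORS / DIE_AREA_MM2 / 1e6

print(f"Capacity:  {capacity_gb:.0f} GB")
print(f"Bandwidth: {bandwidth_gb_s:.0f} GB/s")
print(f"Density:   {density_m_per_mm2:.1f} M transistors/mm²")
```

This gives 16 GB, 1024 GB/s (about 1 TB/s), and a density of roughly 25.9 million transistors per mm².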

The Tesla V100 HPC card will be offered in two packages: an integrated board with an NVLink interface for high-density farm builds, and an add-on card with a PCI-Express interface for workstations. It will be sold through specialized retail channels.

23 Comments on NVIDIA GV100 Silicon Detailed

Caring1

Good that they appear to be using the full speed HBM2 and not the slightly slower version.

DeathtoGnomes

The thing I hate about announcing a new architecture is that they always say "some of these features won't be available to consumers". Why the f*** say anything at all? Idiots.

ratirt

Wonder what the Volta consumer cards will look like. This Volta Tesla seems like quite a monstrosity to me.

DeathtoGnomes said: "The thing I hate about announcing a new architecture is that they always say 'some of these features won't be available to consumers'. Why the f*** say anything at all? Idiots."

Because it's not needed, or it's just way too expensive for NV and consumers could not afford it. Looking at the current top-notch cards from NV, would you pay, let's say, 3 grand for a video card?

medi01

Transistor/mm2 figure didn't change much, hm.

DeathtoGnomes said: "The thing I hate about announcing a new architecture is that they always say 'some of these features won't be available to consumers'. Why the f*** say anything at all? Idiots."

It's a $15k card aimed at a certain use (not gaming) and some of it is not for consumers. What exactly is your problem with stating it?

ZoneDymo

medi01 said: "Transistor/mm2 figure didn't change much, hm.

"It's a $15k card aimed at a certain use (not gaming) and some of it is not for consumers. What exactly is your problem with stating it?"

Ermm, everyone who buys these cards is a consumer; they consume the products.
It's not like only gamers are consumers.

Their point makes sense. It's a bit like those concept cars that get shown at car shows with all kinds of nifty gadgets that never make it into actual production cars. What is the point then?
If this card actually has features that we can't ever get, then again, what is the point of making them or talking about them at all?

Vayra86

Inb4 Nvidia announces consumer GPUs

... with GDDR6 :)

You all know this is what it's gonna be. Volta will be the usual 30-35% perf bump at each price point within the GeForce stack. From what I could read on GV100, all the new bits are for enterprise, not GFX.

With GDDR6 at up to 16 Gb/s they have more than enough headroom to cover that perf bump; they could even stretch it out to the Volta Refresh, seeing as going from 10 Gb/s to 16 Gb/s is +60%.

bug

DeathtoGnomes said: "The thing I hate about announcing a new architecture is that they always say 'some of these features won't be available to consumers'. Why the f*** say anything at all? Idiots."

For the same reason you don't buy an army Hummer for your daily commute. But don't let that stand in the way of trolling.

@Vayra86 I'd love to see a 30-35% performance increase, but my gut feeling tells me Nvidia will try to milk it a little more. I hope I'm wrong.

Vayra86

bug said: "For the same reason you don't buy an army Hummer for your daily commute. But don't let that stand in the way of trolling.

"@Vayra86 I'd love to see a 30-35% performance increase, but my gut feeling tells me Nvidia will try to milk it a little more. I hope I'm wrong."

To be fair, Pascal did a little more than 30% in many cases on high end, and also had an increased price point to go with that. Nvidia's milking every % they give you, so you're not gonna be wrong.

Caring1

bug said: "For the same reason you don't buy an army Hummer for your daily commute."

Because it is illegal to own one?

#10

bug

Caring1 said: "Because it is illegal to own one?"

Because it has hardware that does stuff you don't need ;)

#11

DeathtoGnomes

ratirt said: "Wonder what the Volta consumer cards will look like. This Volta Tesla seems like quite a monstrosity to me.

"Because it's not needed, or it's just way too expensive for NV and consumers could not afford it. Looking at the current top-notch cards from NV, would you pay, let's say, 3 grand for a video card?"

If it dances a jig and sings Hallelujah, and my wallet finds some spare fold full of cash, hell ya. But as @bug says, $NVDA is milking gamers for every penny of performance by teasing features that appear to be built into every card sold, but disabled cuz gamers can't afford that extra feature that prolly won't be developed into anything useful anyways. So ya, why bother effin talking about something that's meant to just tease those with small, limited wallets.

:lovetpu:

#12

bug

DeathtoGnomes said: "If it dances a jig and sings Hallelujah, and my wallet finds some spare fold full of cash, hell ya. But as @bug says, $NVDA is milking gamers for every penny of performance by teasing features that appear to be built into every card sold, but disabled cuz gamers can't afford that extra feature that prolly won't be developed into anything useful anyways. So ya, why bother effin talking about something that's meant to just tease those with small, limited wallets.

":lovetpu:"

Please don't put words in my mouth, I never said that. I never even implied it.

#13

DeathtoGnomes

bug said: "For the same reason you don't buy an army Hummer for your daily commute. But don't let that stand in the way of trolling.

"@Vayra86 I'd love to see a 30-35% performance increase, but my gut feeling tells me Nvidia will try to milk it a little more. I hope I'm wrong."

Sorry if I misread your intent here.

#14

bug

DeathtoGnomes said: "Sorry if I misread your intent here."

I meant, they may try to deliver the smallest possible performance increment with the smallest die they can use. Thus saving costs and keeping something in store for future iterations. Then again, I expected the same for Pascal and I was wrong.

Plus, consumer Pascal doesn't have disabled FP64 units. It's a different silicon, built without them. See: forums.anandtech.com/threads/gp100-and-gp104-are-different-architectures.2473319/ (resources were even added in the consumer chip, where it made sense)

#15

idx

Basically the new GTX cards are going to be:

GTX**80 - 3584 SP
GTX**70 - 2688 SP
GTX**60 - 1792 SP

Expect more restrictions on clock speed.

EDIT:
GTX**80Ti will be whatever is left over from what NVIDIA can't sell as GV100.
GTX**50 will be something much smaller.. I guess.

#16

btarunr

Editor & Senior Moderator

idx said: "Basically the new GTX cards are going to be:

GTX**80 - 3584 SP
GTX**70 - 2688 SP
GTX**60 - 1792 SP

Expect more restrictions on clock speed.

EDIT:
GTX**80Ti will be whatever is left over from what NVIDIA can't sell as GV100.
GTX**50 will be something much smaller.. I guess."

I'm predicting these CUDA core counts:

GTX 2080: 3,072
GTX 2070: 2,688
GTX 2060: 1,536

#17

medi01

ZoneDymo said: "Ermm, everyone who buys these cards is a consumer; they consume the products."

consumer - a person who purchases goods and services for personal use.

says google.

#18

idx

btarunr said: "I'm predicting these CUDA core counts:
GTX 2080: 3,072
GTX 2070: 2,688
GTX 2060: 1,536"

I know what you're thinking. If they do disable 4 SMs just like they did with the GV100, then indeed that's exactly what we are going to see.
Nvidia may do that for 2 reasons:

1. Sell as much of the yield as possible.
2. Lock the performance (in case these GPUs can clock really high?).

EDIT: I don't think they are going to leave the GTX 2070 and GTX 2080 that close in configuration (Nvidia was so bothered by those who overclocked their GTX 970s... remember).
If they are going to disable 4 SMs on the GTX 2080, then they are probably going to disable more than 25% of the GTX 2070.

#19

bug

btarunr said: "I'm predicting these CUDA core counts:
GTX 2080: 3,072
GTX 2070: 2,688
GTX 2060: 1,536"

They can release the cards with a single shader. I only care about overall performance :D

#20

jabbadap

Hmm, just trying to remember that Summit information. Was it 40 TFLOPS of FP64 per node? So there are probably 6 Tesla V100s per node (GV100 has six NVLinks vs. four on GP100). That would be 45 TFLOPS at maximum steam though, but I presume there will be some sort of throttled base clock to keep temps and power sane. And yeah, of course the POWER9 offers some TFLOPS grunt too.

#21

TheoneandonlyMrK

Looks like Q3-ish is the earliest we will see them...

And now, Jensen announces NVIDIA DGX-1 with eight Tesla V100. It’s labeled on the slide as the “essential instrument of AI research.” What used to take a week now takes a shift. It replaces 400 servers. It offers 960 tensor TFLOPS. It will ship in Q3. It will cost $149,000. He notes that if you get one now powered by Pascal, you’ll get a free upgrade to Volta.

Turns out, there’s also a small version of DGX-1, DGX Station. Think of it as a personal sized one. It’s liquid cooled and whisper quiet. Every one of our deep learning engineers has one.

It has four Tesla V100s. It’s $69K. Order it now and we’ll deliver it in Q3. “So place your order now,” he avers. via NVIDIA

#22

DeathtoGnomes

I want AI software to play with, maybe I'll come up with $69k after that. :cool::kookoo:

#23

Kanan

Tech Enthusiast & Gamer

Thanks for this thorough, non-childish, actually adult news article. This is why I'm here.

PS. On speculation for the GTX 1200 series (probable name, not "2080", skipping 10 gens): I think the new Titan Xv will have 4096-4584 or even up to 5120 cores (fully activated chip). GTX 1280 could have 3072 to 3584 shaders, and 2560-3072 should be the new GTX 1270 (partly deactivated chip), making these pretty powerful in the 400-600 dollar range. 4K gaming will be an easy thing by then for a "normal" high-end gamer. Enthusiasts will have over 100 fps at 4K without using SLI.