NVIDIA GA100 Scalar Processor Specs Sheet Released

T4C Fantasy · May 14, 2020

theoneandonlymrk said:
According to others, this is the A100, not the GA100.

The GA100 is the full-fat 8192-shader GPU.

There is no difference; A100 is just the Tesla name, and it uses a GA100.

IceShroom · May 14, 2020

400W??? Isn't Nvidia supposed to be efficient??

TheoneandonlyMrK · May 14, 2020

T4C Fantasy said:
There is no difference; A100 is just the Tesla name, and it uses a GA100.

I don't know about "no difference": one's cut down and the price will vary, yet they still carry the same name, weird.

Fluffmeister · May 14, 2020

IceShroom said:
400W??? Isn't Nvidia supposed to be efficient??

Compared to what exactly?

T4C Fantasy · May 14, 2020

theoneandonlymrk said:
I don't know about "no difference": one's cut down and the price will vary, yet they still carry the same name, weird.

The different one will be GA102: no HBM, but more or less as many CUDA cores.

But there won't be 2 different 100s, technically that is what the 102 is.

IceShroom · May 14, 2020

Fluffmeister said:
Compared to what exactly?

Compared to AMD. Nvidia's 12nm GPUs have the same efficiency as AMD's 7nm GPUs, so Nvidia's 7nm GPUs should be more efficient.

Fluffmeister · May 14, 2020

IceShroom said:
Compared to AMD. Nvidia's 12nm GPUs have the same efficiency as AMD's 7nm GPUs, so Nvidia's 7nm GPUs should be more efficient.

So you're comparing a 10.3 billion transistor Navi to a 54 billion transistor, 40GB HBM2 HPC AI compute monster.

Got ya.

dyonoctis · May 14, 2020

IceShroom said:
400W??? Isn't Nvidia supposed to be efficient??

For what it's supposed to be, the perf/watt ratio is actually great. A single rack of DGX A100 systems can replace several old racks.
From this: [image]

To this: [image]

Fluffmeister · May 14, 2020

A picture paints a thousand words, thank you.

Vya Domus · May 14, 2020

T4C Fantasy said:
But there won't be 2 different 100s, technically that is what the 102 is.

But this one has an entire GPC disabled due to horrendous yields, I presume, and probably because enabling it would throw even that eye-watering 400W TDP out the window. There has to be one fully enabled chip, right? One would assume there would be different 100s.

To be honest, this is borderline Thermi 2.0: a great compute architecture that can barely be implemented in actual silicon due to power and yields. These aren't exactly Nvidia's brightest hours in terms of chip design; it seems like they bit off more than they could chew, and the chip was probably cut down in a last-minute decision.

Suffice to say, I doubt we'll see the full 8192 shaders in any GPU this generation. I doubt they could realistically fit that in a 250W power envelope, and it seems like GA100 runs at 1.4 GHz, probably no change from Volta or from Turing. Let's see: 35% more shaders than Volta, but 60% more power and the same clocks. It's not shaping up to be the "50% more efficient and 50% faster per SM" some hoped for.
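The efficiency argument above boils down to simple ratio arithmetic. A minimal sketch, taking the post's own estimates at face value (about 35% more shaders than V100, about 60% more power, same clocks) and assuming peak FP32 throughput scales as shaders × clock; the function name is illustrative only:

```python
# Back-of-envelope FP32 efficiency estimate using the ratios quoted above.
# These inputs are the poster's estimates, not measured figures.

def relative_perf_per_watt(shader_ratio: float, clock_ratio: float, power_ratio: float) -> float:
    """Peak FP32 throughput is taken as shaders * clock, then divided by power."""
    return (shader_ratio * clock_ratio) / power_ratio


# GA100 relative to V100: ~35% more shaders, same clocks, ~60% more power
ratio = relative_perf_per_watt(shader_ratio=1.35, clock_ratio=1.0, power_ratio=1.60)
print(f"GA100 vs V100 peak FP32 per watt: {ratio:.2f}x")
```

With those inputs the result is roughly 0.84x of V100's peak FP32 per watt. It only covers FP32 on the CUDA cores and says nothing about the tensor-core formats.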

Fluffmeister · May 14, 2020

Well, for starters they can scrap the FP64 hardware, which alone delivers what the 5700 XT offers in FP32, so that is a bonus. With TU102 at 18.6 billion transistors, I'd suggest they have wiggle room. Just a thought.

RH92 · May 14, 2020

theoneandonlymrk said:
Some of what you're saying is wrong; it takes up quite a lot of die space relatively, hence Nvidia's large die sizes, which are added to by the requirements of extra cache resources and the hardware needed to keep the special units busy.

I'm afraid you are wrong. The myth that larger die sizes are correlated with fixed-function hardware has already been debunked. I'm trying to find the source (might be TPU, AnandTech, or YouTube), but it might take time until I find it, so I will link it here ASAP.

There is no real correlation between die size increase and fixed-function hardware, as the latter takes up relatively little die space; more likely than not, the larger die size in Turing is explained by the fact that it has more SMs.

This is further backed up by GA100, which has an increased die size compared to GV100 (826 mm² vs 815 mm²) but a significantly lower Tensor Core count (432 vs 640). So it is pretty obvious that fixed-function hardware is not responsible for the die size expansion!

theoneandonlymrk said:
The other reason being because they can, and to make more money, it's not rocket science just business, people should have chosen with their wallet's.

This was exactly my point: the only tangible argument that justifies higher prices for Turing (other than the increased silicon size) is that the lack of competition allows them to do so.
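The die-size comparison in the post above is easy to put into numbers. A quick sketch using only the figures quoted there (826 vs 815 mm², 432 vs 640 Tensor Cores); note that it compares counts, not per-core area, and the process node also changed between GV100 and GA100, so it cannot by itself settle how much area the Tensor Cores take:

```python
# Relative change in die area vs relative change in Tensor Core count,
# GA100 compared with GV100, using the figures quoted in the post.
ga100_area_mm2, gv100_area_mm2 = 826, 815
ga100_tensor, gv100_tensor = 432, 640

area_change = ga100_area_mm2 / gv100_area_mm2 - 1
tensor_change = ga100_tensor / gv100_tensor - 1

print(f"Die area change:          {area_change:+.1%}")
print(f"Tensor Core count change: {tensor_change:+.1%}")
```

Area grows by about 1.3% while the Tensor Core count drops by 32.5%, which is the mismatch the post is pointing at.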

M2B · May 14, 2020

Vya Domus said:
These aren't exactly Nvidia's brightest hours in terms of chip design

These are exactly Nvidia's brightest hours in terms of chip design.
The A100 packs 54 billion transistors, 2.5 times as much as a V100, and those transistors aren't there for nothing.
You can't just compare SM counts and base stupid assumptions upon that. The A100 is clearly a much more efficient solution for what it's been designed for.

TheoneandonlyMrK · May 14, 2020

RH92 said:
I'm afraid you are wrong. The myth that larger die sizes are correlated with fixed-function hardware has already been debunked. I'm trying to find the source (might be TPU, AnandTech, or YouTube), but it might take time until I find it, so I will link it here ASAP.

There is no real correlation between die size increase and fixed-function hardware, as the latter takes up relatively little die space; more likely than not, the larger die size in Turing is explained by the fact that it has more SMs.

This is further backed up by GA100, which has an increased die size compared to GV100 (826 mm² vs 815 mm²) but a significantly lower Tensor Core count (432 vs 640). So it is pretty obvious that fixed-function hardware is not responsible for the die size expansion!

This was exactly my point: the only tangible argument that justifies higher prices for Turing (other than the increased silicon size) is that the lack of competition allows them to do so.

We disagree , so be it.

Vya Domus · May 14, 2020

M2B said:
These are exactly Nvidia's brightest hours in terms of chip design.

Why is almost 20% of the chip disabled then? That's great design, right?

M2B said:
You can't just compare SM counts and base stupid assumptions upon that.

Comparing SM counts and power is a totally legit way of inferring efficiency; how else would you do it? The SMs aren't the same, but that's the point: efficiency wouldn't come just from the node.

M2B said:
those transistors aren't there for nothing.

Guess what buddy, some of them are for nothing, I'd say about 8-9 billion give or take.

Let's face reality, they couldn't enable the entire chip because of power constraints. Making a chip like that isn't desirable, it's painfully obvious they've missed their target by miles.
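The "8-9 billion" estimate can be reproduced with a crude proportional model. A sketch, assuming (hypothetically) that transistors are spread evenly over the shaders, which ignores caches, memory controllers, NVLink and other uncore logic; the 54 billion and 8192 figures are from the thread, and 6912 is the number of FP32 CUDA cores NVIDIA lists as enabled on A100:

```python
# Crude estimate of transistors sitting in disabled shader clusters.
# Hypothetical assumption: transistors scale evenly with shader count.
TOTAL_TRANSISTORS = 54e9   # full GA100 die, as quoted in the thread
FULL_SHADERS = 8192        # fully enabled GA100, as quoted in the thread
ENABLED_SHADERS = 6912     # FP32 CUDA cores enabled on A100

disabled_fraction = 1 - ENABLED_SHADERS / FULL_SHADERS
disabled_transistors = TOTAL_TRANSISTORS * disabled_fraction

print(f"Disabled fraction of shaders: {disabled_fraction:.1%}")
print(f"Transistors in disabled shaders: ~{disabled_transistors / 1e9:.1f} billion")
```

This gives about 15.6% of the shaders disabled and roughly 8.4 billion transistors, in line with the post's estimate, though the proportional assumption makes it a ballpark at best.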

M2B · May 14, 2020

Vya Domus said:
Why is almost 20% of the chip disabled then? That's great design, right?

Comparing SM counts and power is a totally legit way of inferring efficiency, how else would you do it, smart ass? The SMs aren't the same, but that's the point: efficiency wouldn't come just from the node.

Guess what buddy, some of them are for nothing, I'd say about 9 billion give or take.

Look at this clueless person acting like he really knows how to design GPUs better than a $200 billion company that has been designing GPUs for ages.
So, based on your logic, the Vega 56 is a more efficient GPU than AMD's latest and greatest 5700 XT, because it has more TFLOPS and many more compute units, and consumes similar amounts of power, right?
Based on the density figures, I think Nvidia is using TSMC's high-density version of their 7nm node, not the high-performance one, and that was not the case with previous generations.
They could just use the normal high performance version and scale up the GV100 chip, but they clearly needed more density for their design goals.
What I'm saying is that you have to see how the chip performs in applications that actually matter and base efficiency figures on that, not just on some raw numbers.

Vya Domus · May 14, 2020

M2B said:
Look at this clueless person acting like he really knows how to design GPUs better than a $200 billion company that has been designing GPUs for ages.
So, based on your logic, the Vega 56 is a more efficient GPU than AMD's latest and greatest 5700 XT, because it has more TFLOPS and many more compute units, and consumes similar amounts of power,

Because you do know how to design a GPU, right? Sorry, your GPU architect badge must have fallen off.

M2B said:
So, based on your logic, the Vega 56 is a more efficient GPU than AMD's latest and greatest 5700 XT, because it has more TFLOPS and many more compute units, and consumes similar amounts of power,

Nope, that's based on your logic. Your understanding of what I said was obviously severely limited.

First of all, Vega 56 uses more power and runs at lower clocks. A legendary GPU architect like yourself would know that a larger processor at lower clocks runs more efficiently, because power scales relatively linearly with shader count, whereas a change in clocks incurs a change in voltage, and that relationship isn't linear. In other words, if we have a GPU with N/2 shaders at 2 GHz, it will generally consume more power than a GPU with N shaders at 1 GHz.

Let's compare that with how Navi works: the RX 5700 XT runs at considerably higher voltages and clocks and has way fewer shaders, and yet it generates a similar amount of FP32 compute with less power. It's obviously way more efficient architecturally, but as I already mentioned, I am sure a world-renowned GPU architect such as yourself knew all that.

On the other hand, Volta and Ampere run at pretty much the same frequency and likely similar voltages, since TSMC's 7nm doesn't seem to change that in any significant manner (in fact, all 7nm CPUs/GPUs up until now seem to run at the same or even higher voltages). GA100 has 20% more shaders compared to V100 but also consumes 60% more power. It doesn't take much to see that efficiency isn't that great. It's not that hard to infer these things; don't overestimate their complexity.

Yes, I am sure when you factor in Nvidia's novel floating-point formats it looks great, but if you look just at FP32, it doesn't look great. It's rather mediocre. Do you not find it strange that our boy Jensen never once mentioned FP32 performance?

I never said I knew how to design it better; stop projecting made-up stuff onto me. I said it was obvious they failed to do what they originally set out to do, hence why a considerable portion of the chip is fused off. They've done it in the past too.
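The clock/voltage argument above leans on the usual first-order model of dynamic power, P ∝ N × C × V² × f (shader count, switched capacitance, voltage, frequency). A minimal sketch under that model, ignoring leakage/static power; the 0.80 V and 1.10 V values are made-up numbers chosen only to illustrate that higher clocks need higher voltage:

```python
# "Wide and slow" vs "narrow and fast" under the first-order dynamic-power
# model P ~ N * C * V^2 * f. Voltages are hypothetical illustration values.

def dynamic_power(shaders: int, volts: float, ghz: float, cap_per_shader: float = 1.0) -> float:
    """Relative dynamic power in arbitrary units: shaders * C * V^2 * f."""
    return shaders * cap_per_shader * volts**2 * ghz


def throughput(shaders: int, ghz: float) -> float:
    """Relative peak throughput: shaders * clock."""
    return shaders * ghz


N = 4096
wide = dict(shaders=N, volts=0.80, ghz=1.0)         # N shaders at 1 GHz
narrow = dict(shaders=N // 2, volts=1.10, ghz=2.0)  # N/2 shaders at 2 GHz, needs more voltage

for name, cfg in (("wide/slow", wide), ("narrow/fast", narrow)):
    p = dynamic_power(**cfg)
    t = throughput(cfg["shaders"], cfg["ghz"])
    print(f"{name:12s} throughput={t:7.0f}  power={p:7.1f}  perf/W={t / p:.2f}")
```

Both configurations deliver the same shaders × clock throughput, but the narrow/fast one burns nearly twice the power because voltage enters the model squared.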

Caring1 · May 14, 2020

Vya Domus said:
By the way, I've just noticed the power: 400W, that's 150W over V100. Ouch, 7nm hasn't been kind; I was right that this is a power-hungry monster.

Plot twist.
Jensen wasn't baking it in his oven, he used them to heat his oven.

chodaboy19 · May 14, 2020

I think this is exactly what data center customers want and have been asking for, these will sell like hot cakes to the big cloud operators.

Dante Uchiha · May 15, 2020

RH92 said:
I'm afraid you are wrong. The myth that larger die sizes are correlated with fixed-function hardware has already been debunked. I'm trying to find the source (might be TPU, AnandTech, or YouTube), but it might take time until I find it, so I will link it here ASAP.

There is no real correlation between die size increase and fixed-function hardware, as the latter takes up relatively little die space; more likely than not, the larger die size in Turing is explained by the fact that it has more SMs.

This is further backed up by GA100, which has an increased die size compared to GV100 (826 mm² vs 815 mm²) but a significantly lower Tensor Core count (432 vs 640). So it is pretty obvious that fixed-function hardware is not responsible for the die size expansion!

This was exactly my point: the only tangible argument that justifies higher prices for Turing (other than the increased silicon size) is that the lack of competition allows them to do so.

How do these huge tensor cores not take up space and increase the die size? Maybe this will help to understand the relationship between die size, yields and GPU cost.

https://www.reddit.com/r/nvidia/comments/99r2x3

CALY Technologies (caly-technologies.com)

MuhammedAbdo · May 15, 2020

Vya Domus said:
Very unimpressive FP32 and FP64 performance; I was way off in my estimations. Again, it's a case of optimizing for way too many things. So much silicon is dedicated to non-traditional performance metrics that I wonder if it makes sense trying to shove everything into one package.

GA100 is 20X faster than V100 in AI workloads and 2.5X in FP64 workloads; that's a generational leap like no other. This is an AI-optimized chip: it has no RT cores, no encoders and no display connectors. Its focus is mainly on AI training and inference, for which it provides stellar performance that crushes any hope of competition in the near future. And you are comparing regular crap like FP32 and FP64?

Alright, A100 provides 156 TF FP32 compared to only 15 TF in V100. That alone is a 10X increase in FP32 compute power without the need to change any code. They can extend that lead to 20X through sparse network optimizations, reaching 312 TF of FP32 without code changes.

In FP16 the increase is also 2.5X in non-optimized code and 6X in optimized code, and the same goes for the INT8 and INT4 numbers, so A100 is really several orders of magnitude faster than V100 in any AI workload.

Also, the 400W of power consumption is nothing relative to the size of this monster: you have 40GB of HBM2, loads of NVLink connections, and loads of tensor cores that take up die area, heat and power. The chip is also cut down (which means lost power consumption), and the trend in data centers and AI is to open power consumption up to allow for more comfortable performance: V100 reached 350W in its second iteration and 450W in its third.

You seem to lack any ounce of data center experience, so I just suggest you stick to the analysis of consumer GPUs. This isn't your area.
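For reference, the speedups claimed in the post above can be checked directly from the quoted figures. Note that the 156 and 312 TF A100 numbers are NVIDIA's Tensor Core (TF32) figures, dense and with sparsity, rather than plain FP32 CUDA-core throughput, so this only reproduces the post's arithmetic, not an apples-to-apples FP32 comparison:

```python
# Ratios implied by the throughput figures quoted in the post (TFLOPS).
V100_FP32_TF = 15.0             # V100 figure as quoted
A100_TENSOR_TF = 156.0          # A100 Tensor Core figure as quoted (dense)
A100_TENSOR_SPARSE_TF = 312.0   # same, with structured sparsity

dense_ratio = A100_TENSOR_TF / V100_FP32_TF
sparse_ratio = A100_TENSOR_SPARSE_TF / V100_FP32_TF

print(f"dense:  {dense_ratio:.1f}x")
print(f"sparse: {sparse_ratio:.1f}x")
```

That works out to about 10.4x dense and 20.8x with sparsity, matching the "10X" and "20X" in the post.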

Vya Domus · May 15, 2020

MuhammedAbdo said:
You seem to lack any ounce of data center experience, so I just suggest you stick to the analysis of consumer GPUs.

I'll stick with whatever the hell I want, thanks. What you're doing, copy-pasting boilerplate from Nvidia's website, can be considered anything but an "analysis". What are you, a salesman? You're barking up the wrong tree, buddy.

MuhammedAbdo said:
GA100 is 20X faster than V100 in AI workloads and 2.5X in FP64 workloads

It turns out I overestimated your ability to copy paste information, you can't even do that :

9.7 / 7.8 = 1.24X (FP64)

Or maybe Jensen did a good job deceiving the less tech literate with their fine print by mixing together FP64 with FP64 TF.

Nice paint skills by the way.

Bytales · May 15, 2020

Mark Little said:
Figuring out how they get 40 GB from 6 HBM stacks is a little confusing.

They cheaped out: instead of offering six 8 GB modules for a total of 48 GB, they went for higher margins. They will offer an improved version with the full 48 GB of memory and 25 MHz more on core and memory, for 5000 dollars more. Don't worry about it.

It's so typical of Nvidia.
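Taking the post's figure of 8 GB per HBM stack at face value, the capacity question reduces to simple division: 40 GB corresponds to five stacks' worth of memory, so one of the six stacks isn't contributing capacity. A quick check:

```python
# HBM capacity arithmetic, assuming 8 GB per stack as stated in the post.
GB_PER_STACK = 8
STACKS_ON_PACKAGE = 6
ADVERTISED_GB = 40

full_capacity = GB_PER_STACK * STACKS_ON_PACKAGE
stacks_needed = ADVERTISED_GB // GB_PER_STACK

print(f"All six stacks: {full_capacity} GB")
print(f"Stacks implied by {ADVERTISED_GB} GB: {stacks_needed}")
```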


MuhammedAbdo · May 15, 2020

Vya Domus said:
9.7 / 7.8 = 1.24X (FP64)

Or maybe Jensen did a good job deceiving the less tech literate with their fine print.

Hey genius, I already provided you with a chart explaining all the metrics, good to know you can't read.

FP64 from Tensor cores is 19.5TF. Which is a 2.5X increase over V100. FP64 from CUDA cores is 9.7TF. If you can use both at the same time you will get about 30TF of FP64 for AI actually.

Vya Domus said:
What you're doing, copy-pasting boilerplate from Nvidia's website, can be considered anything but an "analysis"

It's much more meaningful than the ignorant job you did, analysing regular FP32/FP64 in an AI GPU. Talk about an extreme case of stuff that is way over your head.
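The two sides of the FP64 argument are using different baselines. A short sketch computing both ratios from the figures quoted in the thread (V100 at 7.8 TF, A100 at 9.7 TF on CUDA cores and 19.5 TF on Tensor Cores); the summed figure assumes, as the post does, that both paths could be fully used at once, which is not a verified number:

```python
# FP64 ratios from the figures quoted in the thread (TFLOPS).
V100_FP64_TF = 7.8            # V100 FP64 as quoted
A100_FP64_CUDA_TF = 9.7       # A100 FP64 on CUDA cores, as quoted
A100_FP64_TENSOR_TF = 19.5    # A100 FP64 on Tensor Cores, as quoted

cuda_ratio = A100_FP64_CUDA_TF / V100_FP64_TF
tensor_ratio = A100_FP64_TENSOR_TF / V100_FP64_TF
combined = A100_FP64_CUDA_TF + A100_FP64_TENSOR_TF  # poster's "use both at once" premise

print(f"FP64 (CUDA cores) vs V100:   {cuda_ratio:.2f}x")
print(f"FP64 (Tensor Cores) vs V100: {tensor_ratio:.2f}x")
print(f"Naive sum of both paths:     {combined:.1f} TF")
```

That gives about 1.24x for plain FP64, 2.5x for FP64 on Tensor Cores, and a naive 29.2 TF if both were summed.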

Vya Domus · May 15, 2020

MuhammedAbdo said:
Hey genius, I already provided you with a chart explaining all the metrics, good to know you can't read.

FP64 from Tensor cores is 19.5TF. Which is a 2.5X increase over V100. FP64 from CUDA cores is 9.7TF. If you can use both at the same time you will get about 30TF of FP64 for AI actually.

You're so cute when you try to explain your utter lack of understanding about these metrics.

You wrote "FP64 workloads", you ｇｅｎｉｕｓ. That's pure FP64, not tensor ops; you're clueless and stubborn.
