Unwrapping the NVIDIA B200 and GB200 AI GPU Announcements

btarunr · Mar 19, 2024

NVIDIA on Monday, at the 2024 GTC conference, unveiled the "Blackwell" B200 and GB200 AI GPUs. These are designed to offer an incredible 5X gain in AI inferencing performance over the current-gen "Hopper" H100, and come with four times the on-package memory. The B200 "Blackwell" is the largest chip physically possible using existing foundry tech, according to its makers. The chip packs an astonishing 208 billion transistors, and is made up of two chiplets, each of which is by itself the largest possible chip.

Each chiplet is built on the TSMC N4P foundry node, which is the most advanced 4 nm-class node by the Taiwanese foundry. Each chiplet has 104 billion transistors. The two chiplets have a high degree of connectivity with each other, thanks to a 10 TB/s custom interconnect. This is enough bandwidth, at low enough latency, for the two to maintain cache coherency (i.e. address each other's memory as if it were their own). Each of the two "Blackwell" chiplets has a 4096-bit memory bus, and is wired to 96 GB of HBM3E spread across four 24 GB stacks, which totals 192 GB for the B200 package. The GPU has a staggering 8 TB/s of memory bandwidth on tap. The B200 package features a 1.8 TB/s NVLink interface for host connectivity, and connectivity to another B200 chip.
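As a quick sanity check, the package-level figures above follow from the per-chiplet numbers. The sketch below is only back-of-envelope arithmetic on the values quoted in this post; the function name is made up for illustration, and the per-pin data rate is an implied figure (assuming decimal terabytes), not something NVIDIA stated.

```python
# Back-of-envelope check of the B200 package figures quoted in the post.
CHIPLETS = 2
TRANSISTORS_PER_CHIPLET_BN = 104   # billions, as quoted
STACKS_PER_CHIPLET = 4
GB_PER_STACK = 24
BUS_BITS_PER_CHIPLET = 4096
PACKAGE_BANDWIDTH_TBPS = 8         # TB/s, as quoted


def package_totals():
    """Return (transistors in billions, memory in GB, bus width in bits,
    implied Gbit/s per memory pin) for the whole package."""
    transistors_bn = CHIPLETS * TRANSISTORS_PER_CHIPLET_BN
    memory_gb = CHIPLETS * STACKS_PER_CHIPLET * GB_PER_STACK
    bus_bits = CHIPLETS * BUS_BITS_PER_CHIPLET
    # Assumption: 1 TB = 1e12 bytes. Bytes/s -> bits/s, spread over all pins.
    gbit_per_pin = PACKAGE_BANDWIDTH_TBPS * 1e12 * 8 / bus_bits / 1e9
    return transistors_bn, memory_gb, bus_bits, gbit_per_pin


if __name__ == "__main__":
    print(package_totals())
```

The implied per-pin rate comes out a little under 8 Gbit/s, which is at least in the range one would expect for HBM3E.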

NVIDIA also announced the Grace-Blackwell GB200 Superchip. This is a module that has two B200 GPUs wired to an NVIDIA Grace CPU that offers superior serial processing performance to x86-64 based CPUs from Intel or AMD, and an ISA that's highly optimized for NVIDIA's AI GPUs. The biggest advantage of the Grace CPU over an Intel Xeon Scalable or AMD EPYC has to be its higher bandwidth NVLink interconnect to the GPUs, compared to PCIe connections for x86-64 hosts. NVIDIA appears to be carrying over the Grace CPU from the GH200 Grace-Hopper Superchip.

NVIDIA did not disclose the counts of the various SIMD components such as streaming multiprocessors per chiplet, CUDA cores, Tensor cores, or on-die cache sizes, but made performance claims. Each B200 chip provides 20 PFLOPs (that's 20,000 TFLOPs) of AI inferencing performance. "Blackwell" introduces NVIDIA's 2nd generation Transformer engine, and 6th generation Tensor core, which supports FP4 and FP6. The 5th Gen NVLink interface not only scales up within the node, but also scales out to as many as 576 GPUs. Among NVIDIA's performance claims for the GB200 are 20 PFLOPs FP4 Tensor (dense) and 40 PFLOPs FP4 Tensor (sparse); 10 PFLOPs FP8 Tensor (dense) and 20 PFLOPs FP8 Tensor (sparse); 5 PFLOPs Bfloat16 and FP16 (10 PFLOPs with sparsity); and 2.5 PFLOPs TF32 Tensor (dense) with 5 PFLOPs (sparse). As a high-precision compute accelerator (FP64), the B200 provides 90 TFLOPs, which is a 3x increase over that of the GH200 "Hopper."
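The Tensor figures quoted above follow a simple pattern, which the short sketch below makes explicit: every "sparse" number is twice the "dense" one, and each step to a narrower format doubles throughput. The dictionaries only restate the numbers from this post; the function names are illustrative and nothing here is a measurement.

```python
# Tensor throughput claims as quoted in the post, in PFLOPs.
DENSE_PFLOPS = {"FP4": 20.0, "FP8": 10.0, "FP16/BF16": 5.0, "TF32": 2.5}
SPARSE_PFLOPS = {"FP4": 40.0, "FP8": 20.0, "FP16/BF16": 10.0, "TF32": 5.0}

# Narrowest format first, so each ratio below is "narrower / wider".
FORMATS = ["FP4", "FP8", "FP16/BF16", "TF32"]


def sparse_to_dense_ratios():
    """Ratio of the quoted sparse figure to the quoted dense figure, per format."""
    return {fmt: SPARSE_PFLOPS[fmt] / DENSE_PFLOPS[fmt] for fmt in FORMATS}


def precision_step_ratios():
    """Dense throughput of each format divided by that of the next wider one."""
    return [
        DENSE_PFLOPS[narrow] / DENSE_PFLOPS[wide]
        for narrow, wide in zip(FORMATS, FORMATS[1:])
    ]


if __name__ == "__main__":
    print(sparse_to_dense_ratios())
    print(precision_step_ratios())
```

In other words, the headline FP4 sparse number is 8x the FP16 dense number for the same silicon, which is worth keeping in mind when comparing against older parts quoted at a different precision.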

NVIDIA is expected to ship the B100, B200, and GB200, and their first-party derivatives, such as the SuperPODs, later this year.


Redwoodz · Mar 19, 2024

10 yrs ago who could have known Jensen would make a chip the size of his leather jacket. What have we come to...

One thing is 10 different hardware announcements about the same product in one day. One way to make me turn green, from sickness. Someone needs to peruse Nvidia's payroll!

Minus Infinity · Mar 19, 2024

Tom's Hardware did a much better job. They showed us all the caveats in Nvidia's "massive" performance increases. Actual performance increase is about 25% per chip. You can also quarter the stated claims immediately as it uses two chips and they quote fp4 (yes that's a thing apparently) rather than fp8 performance of H200, GH200. Huang is no better than Musk
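The "quarter the claims" arithmetic can be sketched roughly as follows. This only reproduces the reasoning in the comment, under two assumptions: the headline 5X figure compares a two-die package against a single die, and FP4 runs at twice the rate of FP8. The function and parameter names are made up for illustration, and the result is not a benchmark.

```python
def per_chip_same_precision_gain(headline_gain, dies_per_package=2, precision_factor=2.0):
    """Scale a headline speed-up down to a per-die, same-precision figure.

    headline_gain    -- marketing multiplier (e.g. 5.0 for "5X")
    dies_per_package -- assumed number of dies behind the headline figure
    precision_factor -- assumed speed-up from quoting a narrower format
                        (FP4 vs FP8 taken as 2x here)
    """
    return headline_gain / dies_per_package / precision_factor


if __name__ == "__main__":
    # 5X headline -> 1.25x, i.e. the "about 25% per chip" mentioned above.
    print(per_chip_same_precision_gain(5.0))
```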

The Quim Reaper · Mar 19, 2024

And PC Gamers will get the butchered, cut down, runts of the litter, foundry reject chips to go into the 5090s and be overcharged $2000 for the privilege.

..and the Pc Gamer FOMO sucker crowd, will lap them up.

mb194dc · Mar 19, 2024

Man, must people who loaded their data centers up with the H100 feel like chumps?

Legacy-ZA · Mar 19, 2024

The Quim Reaper said:
And PC Gamers will get the butchered, cut down, runts of the litter, foundry reject chips to go into the 5090s and be overcharged $2000 for the privilege.

..and the Pc Gamer FOMO sucker crowd, will lap them up.

It's a story that wants to make you cry, especially when you know nVidia became the monolith it is today, due to gamers. The GPU prices are still atrocious.

Kohl Baas · Mar 19, 2024

The Quim Reaper said:
..and the Pc Gamer FOMO sucker crowd, will lap them up.

Let's just call them the 12-inch-D pocket gamers. Or simply: Whales...

...well, apart from quality streamers who oh so naively use these overpowered PCs to create the ultimate viewing experience instead of the ever so popular formulae of Tits&Asses.

N/A · Mar 19, 2024

for gaming GPUs we get a more advanced N3 node, but it's in the same fab18 as N4, so AI is the enemy in the end.
reticle limit is halved again for N1.x node so this interconnect really paves the way for big dies like the 90-class in the future.
it may even explain why 5080 is exactly the half of 5090, the latter is two 5080s with 12288 CUDA/256 bit memory glued together.

lemonadesoda · Mar 19, 2024

Minus Infinity said:
You can also quarter the stated claims immediately as it uses two chips and they quote fp4 (yes that's a thing apparently) rather than fp8 performance of H200, GH200. Huang is no better than Musk

Fp4! I dread to think about the type 1 and type 2 errors that can occur with ultra-low precision nibble Artificial Inference. It is such a blunt tool. If it’s a nail, it will work. If it’s a screw it won’t. And will the “users” of the Ai output have any clue
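To put a number on how blunt the tool is, here is a minimal sketch of what a 4-bit float can represent. It assumes the E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1, no infinities or NaNs); the article does not say which FP4 encoding "Blackwell" uses, so treat the layout as an assumption. The function names are illustrative, and real deployments typically add per-block scale factors that this sketch ignores.

```python
def fp4_e2m1_magnitudes():
    """All non-negative values of an assumed E2M1 4-bit float (bias 1)."""
    values = []
    for exp in range(4):          # 2 exponent bits
        for man in range(2):      # 1 mantissa bit
            if exp == 0:
                # Subnormal range: no implicit leading 1, so 0 or 0.5.
                value = man * 0.5
            else:
                value = (1 + man * 0.5) * 2 ** (exp - 1)
            values.append(value)
    return values


def quantize_fp4(x):
    """Round x to the nearest representable magnitude, keeping the sign.

    Anything beyond the largest magnitude saturates to it.
    """
    grid = fp4_e2m1_magnitudes()
    magnitude = min(grid, key=lambda v: abs(v - abs(x)))
    return -magnitude if x < 0 else magnitude


if __name__ == "__main__":
    print(fp4_e2m1_magnitudes())
    for sample in (2.4, 5.1, -0.7, 100.0):
        print(sample, "->", quantize_fp4(sample))
```

With the sign bit that is only eight distinct magnitudes, and the gaps between them grow toward the top of the range, which is why unscaled FP4 rounding error gets large quickly.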

Legacy-ZA · Mar 19, 2024

N/A said:
for gaming GPUs we get a more advanced N3 node, but it's in the same fab18 as N4, so AI is the enemy in the end.
reticle limit is halved again for N1.x node so this interconnect really paves the way for big dies like the 90-class in the future.
it may even explain why 5080 is exactly the half of 5090, the latter is two 5080s with 12288 CUDA/256 bit memory glued together.

It explains the power requirement rumors I heard too, which was 1000W for the GB200 AI GPUs.

So, I expect the RTX5090 to pull around 500W, thus the new power connector makes sense, which supplies 600W, which leaves just a little headroom for overclocking while still being within the specified limits. (I still don't like the new connector, it doesn't inspire confidence, and the old 8-pins work)

I am rambling; that is a lot of money power-wise, never mind having to sit in an oven, sure it's comfy in the winter, but not so much in the summer, AC's are expensive to run in some countries, so I always look for the best Priced/Power/Performance GPU near the 200-250W mark. I have to take into account the wattage from my CPU and other components, the whole PC itself should not draw more than 500W under full load, it's why I like to under-volt and get the best clocks for said under-volt, it's more fun too than overclocking in my opinion.

lemonadesoda said:
Fp4! I dread to think about the type 1 and type 2 errors that can occur with ultra-low precision nibble Artificial Inference. It is such a blunt tool. If it’s a nail, it will work. If it’s a screw it won’t. And will the “users” of the Ai output have any clue

I am not so much up to date with the new A.I business side of things, but I wonder if one can use the said A.I, to take an old game that is loved and re-render the game, say, in the U5 engine?

Would love for this to happen to some golden oldies that haven't received re-makes yet and are dear to my heart, all raytraced. The light that got baked in has become much better over time in games, however, I always notice the inconsistencies, so I just love that this is finally a thing. Hopefully, leatherjacket man will give us more Tensor cores so the raytracing tasks can be managed easier, reminds me of the tessellation days.

Games like:

Deus-Ex 1999
Freelancer
KotoR 1 & 2
Vampire: The Masquerade Bloodlines
The Prince of Persia Trilogy
Max Payne 1 & 2
Commandos 1, 2 & 3 (Who remembers these? So much fun)
Unreal
American McGee's Alice
Neverwinter Nights 1 & 2

Damn, I miss the EAX audio days too, it's miles ahead of the crap we have today. Who remembers F.E.A.R? Or the first Bioshock with EAX? Damn, I still have my soundcard today, just to listen to all those games SING!

Rambling, why am I rambling today? :S

Chomiq · Mar 19, 2024

Where's that guy that said chiplets are shit?

InVasMani · Mar 19, 2024

lemonadesoda said:
Fp4! I dread to think about the type 1 and type 2 errors that can occur with ultra-low precision nibble Artificial Inference. It is such a blunt tool. If it’s a nail, it will work. If it’s a screw it won’t. And will the “users” of the Ai output have any clue

I think you're looking at it from the wrong perspective. Low precision means lower overhead, and you can mix it with higher precision and basically end up with mixed sample coverage at lower overhead and with less over-processing, since sometimes you need lower precision to get finer details at better quality. If you reduce overhead you can pile on more without it choking as quickly. For AI and upscaling it seems like a win.

N/A · Mar 19, 2024

Legacy-ZA said:
It explains the power requirement rumors I heard too, which was 1000W for the GB200 AI GPUs.

So, I expect the RTX5090 to pull around 500W, thus the new power connector makes sense, which supplies 600W, which leaves just a little headroom for overclocking while still being within the specified limits. (I still don't like the new connector, it doesn't inspire confidence, and the old 8-pins work)

the test board shows 4 by 2x6 pin. this is 675W per connector. 2700W total.

Vayra86 · Mar 19, 2024

Chomiq said:
Where's that guy that said chiplets are shit?

Magically vanished from the face of TPU

Daven · Mar 19, 2024

Vayra86 said:
Magically vanished from the face of TPU

I believe that was fancucker who kept asking Wizzard to benchmark the latency penalty of RDNA3 moving the memory controllers off die. I think he/she is still around.

Dimitriman · Mar 19, 2024

This all sounds like:

"We are making this impractically expensive because we bet you will pay anything we ask."

This applies to gamers too. Dark times are coming, especially without competition at the high end. I place a prediction right now that 5090 shall be $2599.00 MSRP and the same perf/$ as 40 series.

Chrispy_ · Mar 19, 2024

Guys, these chips bear little resemblance to the RTX 5000-series we'll see as consumers.

Think of them like the H100 datacenter accelerators that Nvidia currently makes which are also impossibly expensive and not a reflection on the RTX 40-series.

Y'all need to chill...

Daven · Mar 19, 2024

Chrispy_ said:
Guys, these chips bear little resemblance to the RTX 5000-series we'll see as consumers.

Think of them like the H100 datacenter accelerators that Nvidia currently makes which are also impossibly expensive and not a reflection on the RTX 40-series.

Y'all need to chill...

My biggest concern is that 3 nm is not ready yet for large GPU chips as Blackwell is all 4nm. Nvidia is already at 4 nm for RTX 4000 so I’m not sure how much better they can make the RTX 5000 series if it’s on the same node unless they allow power consumption to get out of control like with Intel CPUs.

Edit: and right on cue

NVIDIA RTX 50 "GB202" Gaming GPU reportedly features the same TSMC 4NP process as B100 - VideoCardz.com

Gaming GB202 “Blackwell” GPU using the same node as data-center B100 Kopite confirms B100 and GB202 share the same process. NVIDIA unveiled its latest Blackwell GPU architecture designed for data centers and AI acceleration yesterday, marking the first step in their new architecture push...

videocardz.com

Denver · Mar 19, 2024

"This is a module that has two B200 GPUs wired to an NVIDIA Grace CPU that offers superior serial processing performance than x86-64 based CPUs from Intel or AMD"

I'll pretend it's not a lie. The whole presentation is full of make-up to make it look bigger and better. lol

lemonadesoda · Mar 19, 2024

pavle said:
Will they be supplying nv mini nuclear reactor power plant to power these monstrosities?

Yes.

Metroid · Mar 19, 2024

Is not Blackwell supposed to be 3nm? Hopper is 4nm too, nvidia ceo said they only made it bigger, so I guess that will be the same for gpus too? still 4nm? last rumors said nvidia would make Blackwell gaming gpus 3nm, I wonder if that's still on the table.

Daven said:
My biggest concern is that 3 nm is not ready yet for large GPU chips as Blackwell is all 4nm. Nvidia is already at 4 nm for RTX 4000 so I’m not sure how much better they can make the RTX 5000 series if it’s on the same node unless they allow power consumption to get out of control like with Intel CPUs.

Edit: and right on cue

NVIDIA RTX 50 "GB202" Gaming GPU reportedly features the same TSMC 4NP process as B100 - VideoCardz.com

Gaming GB202 “Blackwell” GPU using the same node as data-center B100 Kopite confirms B100 and GB202 share the same process. NVIDIA unveiled its latest Blackwell GPU architecture designed for data centers and AI acceleration yesterday, marking the first step in their new architecture push...

videocardz.com

That is my thought as well, strange, I really thought it was 3nm.

Tomorrow · Mar 19, 2024

The biggest surprise is the use of N4P node. I thought for sure Nvidia was going to use 3nm by now, at least for these 20k+ costing chips.
This does not bode well for RTX 5000 series. I very much doubt those will use 3nm either.

Lycanwolfen · Mar 19, 2024

Hmmm 2x GPU's On one board. Where did I hear that before.

Now I remember
Voodoo 5

pavle said:
Will they be supplying nv mini nuclear reactor power plant to power these monstrosities?

Be like the voodoo 5 6000 with a power supply outside the box.

I remember when Nvidia roadmaps said in 2017 they were going to build lower power GPU's with higher graphics power. To help climate change and environment. Then with 3000 Series the idea went out the window down the stream into oceans and lost forever. Jensen said screw that, more power, let's get 1000 watt video cards instead.

Also look at slide NV link now which is basically SLI on a single board. I said that's what they were going to do 2 years ago. Guess what I was right. Following the 3dfx path to ruin all over again.

Canned Noodles · Mar 19, 2024

Lycanwolfen said:
Be like the voodoo 5 6000 with a power supply outside the box.

Don’t give Huang any ideas, now…

Noyand · Mar 19, 2024

The Quim Reaper said:
And PC Gamers will get the butchered, cut down, runts of the litter, foundry reject chips to go into the 5090s and be overcharged $2000 for the privilege.

..and the Pc Gamer FOMO sucker crowd, will lap them up.

The consumer version of that chip will probably be very different rather than a simple cut down. Those datacenter GPUs are generally pretty bad for gaming.
[Attached images: A100, RTX 3090]
