Tuesday, March 19th 2024

Unwrapping the NVIDIA B200 and GB200 AI GPU Announcements

NVIDIA on Monday, at the 2024 GTC conference, unveiled the "Blackwell" B200 and GB200 AI GPUs. These are designed to offer 5X the AI inferencing performance of the current-gen "Hopper" H100, and come with four times the on-package memory. The B200 "Blackwell" is the largest chip physically possible using existing foundry tech, according to its makers. The chip has an astonishing 208 billion transistors, and is made up of two chiplets, each of which is by itself the largest possible chip.

Each chiplet is built on the TSMC N4P foundry node, which is the most advanced 4 nm-class node by the Taiwanese foundry. Each chiplet has 104 billion transistors. The two chiplets have a high degree of connectivity with each other, thanks to a 10 TB/s custom interconnect. This provides enough bandwidth, at low enough latency, for the two to maintain cache coherency (i.e. address each other's memory as if it were their own). Each of the two "Blackwell" chiplets has a 4096-bit memory bus, and is wired to 96 GB of HBM3E spread across four 24 GB stacks, which totals 192 GB for the B200 package. The GPU has a staggering 8 TB/s of memory bandwidth on tap. The B200 package features a 1.8 TB/s NVLink interface for host connectivity, and connectivity to another B200 chip.
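As a quick sanity check on the figures quoted above, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the constant and function names are made up for this example, "TB" is assumed to mean 10^12 bytes, and the per-stack and per-pin numbers are simple divisions of the quoted totals rather than anything NVIDIA has published.

```python
# Figures quoted in the article. Names are hypothetical, for illustration only.
CHIPLETS = 2
STACKS_PER_CHIPLET = 4
GB_PER_STACK = 24
BUS_BITS_PER_CHIPLET = 4096
TOTAL_BANDWIDTH_TBPS = 8          # assumed decimal: 1 TB = 10**12 bytes
TRANSISTORS_PER_CHIPLET_BN = 104  # billions


def b200_memory_summary():
    """Derive package-level totals from the per-chiplet figures."""
    stacks = CHIPLETS * STACKS_PER_CHIPLET
    total_bus_bits = CHIPLETS * BUS_BITS_PER_CHIPLET
    bytes_per_second = TOTAL_BANDWIDTH_TBPS * 10**12
    return {
        "capacity_gb": stacks * GB_PER_STACK,
        "transistors_bn": CHIPLETS * TRANSISTORS_PER_CHIPLET_BN,
        "bus_bits_per_stack": BUS_BITS_PER_CHIPLET // STACKS_PER_CHIPLET,
        "bandwidth_per_stack_tbps": TOTAL_BANDWIDTH_TBPS / stacks,
        # Implied data rate per bus pin, in Gbit/s (bytes -> bits, then per pin).
        "gbit_per_s_per_pin": bytes_per_second * 8 / total_bus_bits / 1e9,
    }


if __name__ == "__main__":
    for key, value in b200_memory_summary().items():
        print(f"{key}: {value}")
```

Under those assumptions the quoted numbers are self-consistent: eight stacks of 24 GB give the stated 192 GB, and two 104-billion-transistor chiplets give the stated 208 billion.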
NVIDIA also announced the Grace-Blackwell GB200 Superchip. This is a module that has two B200 GPUs wired to an NVIDIA Grace CPU that offers superior serial processing performance than x86-64 based CPUs from Intel or AMD; and an ISA that's highly optimized for NVIDIA's AI GPUs. The biggest advantage of the Grace CPU over an Intel Xeon Scalable or AMD EPYC has to be its higher bandwidth NVLink interconnect to the GPUs, compared to PCIe connections for x86-64 hosts. NVIDIA appears to be carrying over the Grace CPU from the GH200 Grace-Hopper Superchip.
NVIDIA did not disclose the counts of the various SIMD components, such as streaming multiprocessors per chiplet, CUDA cores, or Tensor cores, nor the on-die cache sizes, but it did make performance claims. Each B200 chip provides 20 PFLOPs (that's 20,000 TFLOPs) of AI inferencing performance. "Blackwell" introduces NVIDIA's 2nd generation Transformer engine, and 6th generation Tensor core, which supports FP4 and FP6. The 5th Gen NVLink interface not only scales up within the node, but also scales out to as many as 576 GPUs. Among NVIDIA's performance claims for the GB200 are 20 PFLOPs FP4 Tensor (dense) and 40 PFLOPs (sparse); 10 PFLOPs FP8 Tensor (dense) and 20 PFLOPs (sparse); 5 PFLOPs Bfloat16 and FP16 (dense) and 10 PFLOPs (sparse); and 2.5 PFLOPs TF32 Tensor (dense) and 5 PFLOPs (sparse). As a high-precision compute accelerator (FP64), the B200 provides 90 TFLOPs, which is a 3x increase over that of the GH200 "Hopper."
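To make the pattern in those GB200 figures easier to see, here is a small Python sketch that tabulates them. It assumes, per the description above, that a GB200 module carries two B200 GPUs and that the quoted throughput splits evenly between them; the dictionary and function names are invented for this example.

```python
# GB200 Tensor throughput claims from the article, in PFLOPs: (dense, sparse).
GB200_PFLOPS = {
    "FP4": (20, 40),
    "FP8": (10, 20),
    "FP16/BF16": (5, 10),
    "TF32": (2.5, 5),
}
GPUS_PER_GB200 = 2  # two B200 GPUs per GB200 module, per the article


def per_gpu(claims, gpus=GPUS_PER_GB200):
    """Split module-level claims evenly across the GPUs (an assumption)."""
    return {fmt: (dense / gpus, sparse / gpus)
            for fmt, (dense, sparse) in claims.items()}


def sparsity_ratios(claims):
    """Ratio of the sparse figure to the dense figure for each format."""
    return {fmt: sparse / dense for fmt, (dense, sparse) in claims.items()}


if __name__ == "__main__":
    ratios = sparsity_ratios(GB200_PFLOPS)
    for fmt, (dense, sparse) in per_gpu(GB200_PFLOPS).items():
        print(f"{fmt:10s} per GPU: {dense:6.2f} dense, {sparse:6.2f} sparse "
              f"(sparse/dense = {ratios[fmt]:.1f}x)")
```

Read this way, every sparse figure is exactly twice its dense counterpart, each step up in precision halves the throughput, and the 20 PFLOPs quoted per B200 lines up with the sparse FP4 number for the GB200 divided by two.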
NVIDIA is expected to ship the B100, B200, and GB200, and their first-party derivatives, such as the SuperPODs, later this year.
Source: Tom's Hardware

27 Comments on Unwrapping the NVIDIA B200 and GB200 AI GPU Announcements

#1
Redwoodz
10 yrs ago who could have known Jensen would make a chip the size of his leather jacket. What have we come to...


One thing is 10 different hardware announcements about the same product in one day. One way to make me turn green, from sickness. Someone needs to peruse Nvidia's payroll!
Posted on Reply
#2
Minus Infinity
Tom's Hardware did a much better job. They showed us all the caveats in Nvidia's "massive" performance increases. Actual performance increase is about 25% per chip. You can also quarter the stated claims immediately as it uses two chips and they quote fp4 (yes that's a thing apparently) rather than fp8 performance of H200, GH200. Huang is no better than Musk
Posted on Reply
#3
The Quim Reaper
And PC Gamers will get the butchered, cut down, runts of the litter, foundry reject chips to go into the 5090s and be overcharged $2000 for the privilege.

..and the Pc Gamer FOMO sucker crowd, will lap them up.
Posted on Reply
#4
mb194dc
Man, must people who loaded their data centers up with the H100 feel like chumps?
Posted on Reply
#5
Legacy-ZA
The Quim Reaper: And PC Gamers will get the butchered, cut down, runts of the litter, foundry reject chips to go into the 5090s and be overcharged $2000 for the privilege.

..and the Pc Gamer FOMO sucker crowd, will lap them up.
It's a story that wants to make you cry, especially when you know nVidia became the monolith it is today, due to gamers. The GPU prices are still atrocious.
Posted on Reply
#6
Kohl Baas
The Quim Reaper: ..and the Pc Gamer FOMO sucker crowd, will lap them up.
Let's just call them the 12-inch-D pocket gamers. Or simply: Whales...


...well, apart from quality streamers who oh so naively use these overpowered PCs to create the ultimate viewing experience instead of the ever so popular formula of Tits&Asses.
Posted on Reply
#7
N/A
for gaming GPUs we get a more advanced N3 node, but it's in the same fab18 as N4, so AI is the enemy in the end.
reticle limit is halved again for N1.x node so this interconnect really paves the way for big dies like the 90-class in the future.
it may even explain why 5080 is exactly half of 5090, the latter is two 5080s with 12288 CUDA/256 bit memory glued together.
Posted on Reply
#8
lemonadesoda
Minus Infinity: You can also quarter the stated claims immediately as it uses two chips and they quote fp4 (yes that's a thing apparently) rather than fp8 performance of H200, GH200. Huang is no better than Musk
Fp4! I dread to think about the type 1 and type 2 errors that can occur with ultra-low precision nibble Artificial Inference. It is such a blunt tool. If it’s a nail, it will work. If it’s a screw it won’t. And will the “users” of the AI output have any clue?
Posted on Reply
#9
Legacy-ZA
N/A: for gaming GPUs we get a more advanced N3 node, but it's in the same fab18 as N4, so AI is the enemy in the end.
reticle limit is halved again for N1.x node so this interconnect really paves the way for big dies like the 90-class in the future.
it may even explain why 5080 is exactly half of 5090, the latter is two 5080s with 12288 CUDA/256 bit memory glued together.
It explains the power requirement rumors I heard too, which was 1000W for the GB200 AI GPUs.

So, I expect the RTX5090 to pull around 500W, thus the new power connector makes sense, which supplies 600W, which leaves just a little headroom for overclocking while still being within the specified limits. (I still don't like the new connector, it doesn't inspire confidence, and the old 8-pins work)

I am rambling; that is a lot of money power-wise, never mind having to sit in an oven, sure it's comfy in the winter, but not so much in the summer, AC's are expensive to run in some countries, so I always look for the best Price/Power/Performance GPU near the 200-250W mark. I have to take into account the wattage from my CPU and other components, the whole PC itself should not draw more than 500W under full load, it's why I like to under-volt and get the best clocks for said under-volt, it's more fun too than overclocking in my opinion. :D
lemonadesoda: Fp4! I dread to think about the type 1 and type 2 errors that can occur with ultra-low precision nibble Artificial Inference. It is such a blunt tool. If it’s a nail, it will work. If it’s a screw it won’t. And will the “users” of the AI output have any clue?
I am not so much up to date with the new A.I business side of things, but I wonder if one can use the said A.I, to take an old game that is loved and re-render the game, say, in the UE5 engine?

Would love for this to happen to some golden oldies that haven't received re-makes yet and are dear to my heart, all raytraced. The light that got baked in has become much better over time in games, however, I always notice the inconsistencies, so I just love that this is finally a thing. Hopefully, leatherjacket man will give us more Tensor cores so the raytracing tasks can be managed easier, reminds me of the tessellation days.

Games like:

Deus-Ex 1999
Freelancer
KotoR 1 & 2
Vampire: The Masquerade Bloodlines
The Prince of Persia Trilogy
Max Payne 1 & 2
Commandos 1, 2 & 3 (Who remembers these? So much fun)
Unreal
American McGee's Alice
Neverwinter Nights 1 & 2


Damn, I miss the EAX audio days too, it's miles ahead of the crap we have today. Who remembers F.E.A.R? Or the first Bioshock with EAX? Damn, I still have my soundcard today, just to listen to all those games SING!

Rambling, why am I rambling today? :S
Posted on Reply
#10
Chomiq
Where's that guy that said chiplets are shit?
Posted on Reply
#11
InVasMani
lemonadesoda: Fp4! I dread to think about the type 1 and type 2 errors that can occur with ultra-low precision nibble Artificial Inference. It is such a blunt tool. If it’s a nail, it will work. If it’s a screw it won’t. And will the “users” of the AI output have any clue?
I think you're looking at it from the wrong perspective. Low precision means lower overhead, and you can mix it with higher precision and basically end up with mixed sample coverage at lower overhead and with less over-processing, since sometimes you need lower precision to get finer details at better quality. If you reduce overhead you can pile on more without it choking as quickly. For AI and upscaling it seems like a win.
Posted on Reply
#12
N/A
Legacy-ZA: It explains the power requirement rumors I heard too, which was 1000W for the GB200 AI GPUs.

So, I expect the RTX5090 to pull around 500W, thus the new power connector makes sense, which supplies 600W, which leaves just a little headroom for overclocking while still being within the specified limits. (I still don't like the new connector, it doesn't inspire confidence, and the old 8-pins work)
the test board shows 4 by 2x6 pin. this is 675W per connector. 2700W total.
Posted on Reply
#13
Vayra86
Chomiq: Where's that guy that said chiplets are shit?
Magically vanished from the face of TPU :)
Posted on Reply
#14
Daven
Vayra86: Magically vanished from the face of TPU :)
I believe that was fancucker who kept asking Wizzard to benchmark the latency penalty of RDNA3 moving the memory controllers off die. I think he/she is still around.
Posted on Reply
#15
Dimitriman
This all sounds like:

"We are making this impractically expensive because we bet you will pay anything we ask."

This applies to gamers too. Dark times are coming, especially without competition at the high end. I place a prediction right now that 5090 shall be $2599.00 MSRP and the same perf/$ as 40 series.
Posted on Reply
#16
Chrispy_
Guys, these chips bear little resemblance to the RTX 5000-series we'll see as consumers.

Think of them like the H100 datacenter accelerators that Nvidia currently makes which are also impossibly expensive and not a reflection on the RTX 40-series.

Y'all need to chill... ;)
Posted on Reply
#17
Daven
Chrispy_: Guys, these chips bear little resemblance to the RTX 5000-series we'll see as consumers.

Think of them like the H100 datacenter accelerators that Nvidia currently makes which are also impossibly expensive and not a reflection on the RTX 40-series.

Y'all need to chill... ;)
My biggest concern is that 3 nm is not ready yet for large GPU chips as Blackwell is all 4nm. Nvidia is already at 4 nm for RTX 4000 so I’m not sure how much better they can make the RTX 5000 series if it's on the same node unless they allow power consumption to get out of control like with Intel CPUs.

Edit: and right on cue
videocardz.com/newz/nvidia-rtx-50-gb202-gaming-gpu-reportedly-features-the-same-tsmc-4np-process-as-b100
Posted on Reply
#18
Denver
"This is a module that has two B200 GPUs wired to an NVIDIA Grace CPU that offers superior serial processing performance than x86-64 based CPUs from Intel or AMD"

I'll pretend it's not a lie. The whole presentation is full of make-up to make it look bigger and better. lol
Posted on Reply
#19
lemonadesoda
pavle: Will they be supplying nv mini nuclear reactor power plant to power these monstrosities?
Yes.
Posted on Reply
#20
Metroid
Is not Blackwell supposed to be 3nm? Hopper is 4nm too, Nvidia's CEO said they only made it bigger, so I guess that will be the same for gaming GPUs too? Still 4nm? Last rumors said Nvidia would make Blackwell gaming GPUs 3nm, I wonder if that's still on the table.
Daven: My biggest concern is that 3 nm is not ready yet for large GPU chips as Blackwell is all 4nm. Nvidia is already at 4 nm for RTX 4000 so I’m not sure how much better they can make the RTX 5000 series if it's on the same node unless they allow power consumption to get out of control like with Intel CPUs.

Edit: and right on cue
videocardz.com/newz/nvidia-rtx-50-gb202-gaming-gpu-reportedly-features-the-same-tsmc-4np-process-as-b100
That is my thought as well, strange, I really thought it was 3nm.
Posted on Reply
#21
Tomorrow
The biggest surprise is the use of N4P node. I thought for sure Nvidia was going to use 3nm by now, at least for these 20k+ costing chips.
This does not bode well for RTX 5000 series. I very much doubt those will use 3nm either.
Posted on Reply
#22
Lycanwolfen
Hmmm 2x GPU's On one board. Where did I hear that before.

Now I remember
Voodoo 5
pavle: Will they be supplying nv mini nuclear reactor power plant to power these monstrosities?
Be like the voodoo 5 6000 with a power supply outside the box.

I remember when Nvidia roadmaps said in 2017 they were going to build lower power GPU's with higher graphics power. To help climate change and environment. Then with 3000 Series the idea went out the window down the stream into oceans and lost forever. Jensen said screw that, more power, let's get 1000 watt video cards instead.

Also look at slide NV link now which is basically SLI on a single board. I said that's what they were going to do 2 years ago. Guess what I was right. Following the 3dfx path to ruin all over again.
Posted on Reply
#23
Canned Noodles
Lycanwolfen: Be like the voodoo 5 6000 with a power supply outside the box.
Don’t give Huang any ideas, now…
Posted on Reply
#24
Noyand
The Quim Reaper: And PC Gamers will get the butchered, cut down, runts of the litter, foundry reject chips to go into the 5090s and be overcharged $2000 for the privilege.

..and the Pc Gamer FOMO sucker crowd, will lap them up.
The consumer version of that chip will probably be very different rather than a simple cut down. Those datacenter GPUs are generally pretty bad for gaming.
A100

RTX 3090
Posted on Reply
#25
Tomorrow
Noyand: The consumer version of that chip will probably be very different rather than a simple cut down. Those datacenter GPUs are generally pretty bad for gaming.
Also the x100 variants lack display outputs as they are meant to be used as accelerators - even the PCIe variants.
Posted on Reply