
Unwrapping the NVIDIA B200 and GB200 AI GPU Announcements

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
47,206 (7.55/day)
Location
Hyderabad, India
System Name RBMK-1000
Processor AMD Ryzen 7 5700G
Motherboard ASUS ROG Strix B450-E Gaming
Cooling DeepCool Gammax L240 V2
Memory 2x 8GB G.Skill Sniper X
Video Card(s) Palit GeForce RTX 2080 SUPER GameRock
Storage Western Digital Black NVMe 512GB
Display(s) BenQ 1440p 60 Hz 27-inch
Case Corsair Carbide 100R
Audio Device(s) ASUS SupremeFX S1220A
Power Supply Cooler Master MWE Gold 650W
Mouse ASUS ROG Strix Impact
Keyboard Gamdias Hermes E2
Software Windows 11 Pro
NVIDIA on Monday, at the 2024 GTC conference, unveiled the "Blackwell" B200 and GB200 AI GPUs. These are designed to offer a 5x gain in AI inferencing performance over the current-gen "Hopper" H100, and come with four times the on-package memory. According to NVIDIA, the B200 "Blackwell" is the largest chip physically possible with existing foundry technology. The package has 208 billion transistors and is made up of two chiplets, each of which is by itself the largest die that can be manufactured.

Each chiplet is built on the TSMC N4P foundry node, the most advanced 4 nm-class node from the Taiwanese foundry, and has 104 billion transistors. The two chiplets are tightly coupled through a 10 TB/s custom interconnect, which provides enough bandwidth at low enough latency for the two to maintain cache coherency (i.e. to address each other's memory as if it were their own). Each of the two "Blackwell" chiplets has a 4096-bit memory bus wired to 96 GB of HBM3E spread across four 24 GB stacks, for a total of 192 GB on the B200 package. The GPU has 8 TB/s of memory bandwidth on tap. The B200 package also features a 1.8 TB/s NVLink interface for host connectivity and for connecting to another B200 chip.
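
The memory figures above can be cross-checked with a little arithmetic. The sketch below is only an illustration built from the numbers quoted in this article. The function names are made up for the example, and it assumes 1 TB = 10^12 bytes and that the 8 TB/s is spread evenly across both memory buses, neither of which NVIDIA has stated here.

```python
# Figures quoted in the article; nothing here comes from an NVIDIA datasheet.
CHIPLETS = 2
STACKS_PER_CHIPLET = 4
GB_PER_STACK = 24
BUS_BITS_PER_CHIPLET = 4096
TOTAL_BANDWIDTH_TB_S = 8  # whole package, as quoted


def package_memory_gb():
    """Total HBM3E capacity on the package."""
    return CHIPLETS * STACKS_PER_CHIPLET * GB_PER_STACK


def total_bus_bits():
    """Combined memory bus width of both chiplets."""
    return CHIPLETS * BUS_BITS_PER_CHIPLET


def implied_pin_rate_gbps():
    """Per-pin data rate implied by the quoted bandwidth.

    Assumption: 1 TB = 10**12 bytes, and the bandwidth is spread evenly
    over every bit of the combined bus.
    """
    bits_per_second = TOTAL_BANDWIDTH_TB_S * 10**12 * 8
    return bits_per_second / total_bus_bits() / 10**9


if __name__ == "__main__":
    print(package_memory_gb(), "GB on package")
    print(total_bus_bits(), "bit combined bus")
    print(implied_pin_rate_gbps(), "Gbps per pin (implied)")
```

Under those assumptions the quoted capacity, bus width, and bandwidth are at least consistent with one another.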



NVIDIA also announced the Grace-Blackwell GB200 Superchip. This is a module that has two B200 GPUs wired to an NVIDIA Grace CPU that offers better serial processing performance than x86-64 based CPUs from Intel or AMD, and an ISA that's highly optimized for NVIDIA's AI GPUs. The biggest advantage of the Grace CPU over an Intel Xeon Scalable or AMD EPYC has to be its higher-bandwidth NVLink interconnect to the GPUs, compared to PCIe connections for x86-64 hosts. NVIDIA appears to be carrying over the Grace CPU from the GH200 Grace-Hopper Superchip.



NVIDIA did not disclose the counts of the various SIMD components such as streaming multiprocessors per chiplet, CUDA cores, Tensor cores, or on-die cache sizes, but it did make performance claims. Each B200 chip provides 20 PFLOPs (that's 20,000 TFLOPs) of AI inferencing performance. "Blackwell" introduces NVIDIA's 2nd generation Transformer engine and 6th generation Tensor core, which supports FP4 and FP6. The 5th Gen NVLink interface not only scales up within the node, but also scales out to as many as 576 GPUs. Among NVIDIA's performance claims for the GB200 are 20 PFLOPs FP4 Tensor (dense) and 40 PFLOPs FP4 Tensor (sparse); 10 PFLOPs FP8 Tensor (dense) and 20 PFLOPs FP8 Tensor (sparse); 5 PFLOPs Bfloat16 and FP16 (10 PFLOPs with sparsity); and 2.5 PFLOPs TF32 Tensor (dense) with 5 PFLOPs (sparse). As a high-precision compute accelerator (FP64), the B200 provides 90 TFLOPs, which is a 3x increase over that of the GH200 "Hopper."
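
The Tensor figures listed above follow a simple pattern: each step down in precision doubles the quoted throughput, and sparsity doubles it again. The sketch below reproduces the quoted numbers from the TF32 dense figure. The doubling rule is an observation drawn from the figures in this article, not a formula published by NVIDIA, and the names `LADDER` and `claimed_pflops` are invented for the example.

```python
# Dense TF32 Tensor figure quoted in the article, in PFLOPs.
TF32_DENSE_PFLOPS = 2.5

# Formats ordered from highest to lowest precision, as listed in the article.
LADDER = ["TF32", "FP16/BF16", "FP8", "FP4"]


def claimed_pflops(fmt, sparse=False):
    """Reproduce a quoted figure from the TF32 dense baseline.

    Assumption: throughput doubles per step down the ladder and doubles
    once more with sparsity. This only mirrors the quoted numbers.
    """
    steps = LADDER.index(fmt)
    dense = TF32_DENSE_PFLOPS * 2**steps
    return dense * 2 if sparse else dense


if __name__ == "__main__":
    for fmt in LADDER:
        print(fmt, claimed_pflops(fmt), claimed_pflops(fmt, sparse=True))
```

Read this way, the headline 20 PFLOPs figure is the FP4 dense entry at the bottom of the ladder, three doublings away from the TF32 number.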



NVIDIA is expected to ship the B100, B200, and GB200, and their first-party derivatives, such as the SuperPODs, later this year.

 
Joined
Mar 29, 2014
Messages
463 (0.12/day)
10 yrs ago who could have known Jensen would make a chip the size of his leather jacket. What have we come to...


One thing is 10 different hardware announcements about the same product in one day. One way to make me turn green, from sickness. Someone needs to peruse Nvidia's payroll!
 
Joined
May 3, 2018
Messages
2,881 (1.20/day)
Tom's Hardware did a much better job. They showed us all the caveats in Nvidia's "massive" performance increases. Actual performance increase is about 25% per chip. You can also quarter the stated claims immediately as it uses two chips and they quote fp4 (yes that's a thing apparently) rather than fp8 performance of H200, GH200. Huang is no better than Musk
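
For anyone who wants to redo the "quarter it" step, here is a rough sketch. It takes the 20 PFLOPs FP4 figure from the article and assumes, as the article's own numbers suggest, that FP8 throughput is half of FP4 and that the package is two equal dies. The function name and its defaults are made up for illustration, and this is not a benchmark.

```python
def per_die_fp8_equivalent(claimed_fp4_pflops, dies=2, fp4_to_fp8_factor=2):
    """Scale an FP4 package-level claim down to a per-die FP8 figure.

    Assumptions (not confirmed by NVIDIA): the dies contribute equally,
    and FP8 runs at 1/fp4_to_fp8_factor of the FP4 rate.
    """
    return claimed_fp4_pflops / dies / fp4_to_fp8_factor


if __name__ == "__main__":
    print(per_die_fp8_equivalent(20))
```

Dividing by two for the die count and by two again for the precision change is where the factor of four comes from.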
 
Joined
Jan 18, 2020
Messages
813 (0.46/day)
Man, people who loaded their data centers up with the H100 must feel like chumps.
 
Joined
Dec 14, 2011
Messages
1,023 (0.22/day)
Location
South-Africa
Processor AMD Ryzen 9 5900X
Motherboard ASUS ROG STRIX B550-F GAMING (WI-FI)
Cooling Corsair iCUE H115i Elite Capellix 280mm
Memory 32GB G.Skill DDR4 3600Mhz CL18
Video Card(s) ASUS GTX 1650 TUF
Storage Sabrent Rocket 1TB M.2
Display(s) Dell S3220DGF
Case Corsair iCUE 4000X
Audio Device(s) ASUS Xonar D2X
Power Supply Corsair AX760 Platinum
Mouse Razer DeathAdder V2 - Wireless
Keyboard Redragon K618 RGB PRO
Software Microsoft Windows 11 Pro (64-bit)
And PC Gamers will get the butchered, cut down, runts of the litter, foundry reject chips to go into the 5090s and be overcharged $2000 for the privilege.

..and the Pc Gamer FOMO sucker crowd, will lap them up.

It's a story that wants to make you cry, especially when you know nVidia became the monolith it is today, due to gamers. The GPU prices are still atrocious.
 
Joined
Sep 10, 2015
Messages
529 (0.16/day)
System Name My Addiction
Processor AMD Ryzen 7950X3D
Motherboard ASRock B650E PG-ITX WiFi
Cooling Alphacool Core Ocean T38 AIO 240mm
Memory G.Skill 32GB 6000MHz
Video Card(s) Sapphire Pulse 7900XTX
Storage Some SSDs
Display(s) 42" Samsung TV + 22" Dell monitor vertically
Case Lian Li A4-H2O
Audio Device(s) Denon + Bose
Power Supply Corsair SF750
Mouse Logitech
Keyboard Glorious
VR HMD None
Software Win 10
Benchmark Scores None taken
..and the Pc Gamer FOMO sucker crowd, will lap them up.
Let's just call them the 12-inch-D pocket gamers. Or simply: Whales...


...well, apart from quality streamers who oh so naively use these overpowered PCs to create the ultimate viewing experience instead of the ever so popular formula of Tits&Asses.
 
Joined
Dec 31, 2020
Messages
976 (0.69/day)
Processor E5-4627 v4
Motherboard VEINEDA X99
Memory 32 GB
Video Card(s) 2080 Ti
Storage NE-512
Display(s) G27Q
Case DAOTECH X9
Power Supply SF450
for gaming GPUs we get a more advanced N3 node, but it's in the same fab18 as N4, so AI is the enemy in the end.
reticle limit is halved again for N1.x node so this interconnect really paves the way for big dies like the 90-class in the future.
it may even explain why 5080 is exactly the half of 5090, the latter is two 5080s with 12288 CUDA/256 bit memory glued together.
 
Joined
Aug 30, 2006
Messages
7,221 (1.09/day)
System Name ICE-QUAD // ICE-CRUNCH
Processor Q6600 // 2x Xeon 5472
Memory 2GB DDR // 8GB FB-DIMM
Video Card(s) HD3850-AGP // FireGL 3400
Display(s) 2 x Samsung 204Ts = 3200x1200
Audio Device(s) Audigy 2
Software Windows Server 2003 R2 as a Workstation now migrated to W10 with regrets.
You can also quarter the stated claims immediately as it uses two chips and they quote fp4 (yes that's a thing apparently) rather than fp8 performance of H200, GH200. Huang is no better than Musk
Fp4! I dread to think about the type 1 and type 2 errors that can occur with ultra-low precision nibble Artificial Inference. It is such a blunt tool. If it’s a nail, it will work. If it’s a screw it won’t. And will the “users” of the Ai output have any clue
 
Joined
Dec 14, 2011
Messages
1,023 (0.22/day)
Location
South-Africa
Processor AMD Ryzen 9 5900X
Motherboard ASUS ROG STRIX B550-F GAMING (WI-FI)
Cooling Corsair iCUE H115i Elite Capellix 280mm
Memory 32GB G.Skill DDR4 3600Mhz CL18
Video Card(s) ASUS GTX 1650 TUF
Storage Sabrent Rocket 1TB M.2
Display(s) Dell S3220DGF
Case Corsair iCUE 4000X
Audio Device(s) ASUS Xonar D2X
Power Supply Corsair AX760 Platinum
Mouse Razer DeathAdder V2 - Wireless
Keyboard Redragon K618 RGB PRO
Software Microsoft Windows 11 Pro (64-bit)
for gaming GPUs we get a more advanced N3 node, but it's in the same fab18 as N4, so AI is the enemy in the end.
reticle limit is halved again for N1.x node so this interconnect really paves the way for big dies like the 90-class in the future.
it may even explain why 5080 is exactly the half of 5090, the latter is two 5080s with 12288 CUDA/256 bit memory glued together.

It explains the power requirement rumors I heard too, which was 1000W for the GB200 AI GPUs.

So, I expect the RTX5090 to pull around 500W, thus the new power connector makes sense, which supplies 600W, which leaves just a little headroom for overclocking while still being within the specified limits. (I still don't like the new connector, it doesn't inspire confidence, and the old 8-pins work)

I am rambling; that is a lot of money power-wise, never mind having to sit in an oven. Sure, it's comfy in the winter, but not so much in the summer. ACs are expensive to run in some countries, so I always look for the best Price/Power/Performance GPU near the 200-250W mark. I have to take into account the wattage from my CPU and other components; the whole PC itself should not draw more than 500W under full load. It's why I like to under-volt and get the best clocks for said under-volt, and it's more fun than overclocking too, in my opinion. :D

Fp4! I dread to think about the type 1 and type 2 errors that can occur with ultra-low precision nibble Artificial Inference. It is such a blunt tool. If it’s a nail, it will work. If it’s a screw it won’t. And will the “users” of the Ai output have any clue

I am not so much up to date with the new A.I business side of things, but I wonder if one can use the said A.I, to take an old game that is loved and re-render the game, say, in the U5 engine?

Would love for this to happen to some golden oldies that haven't received remakes yet and are dear to my heart, all raytraced. The baked-in lighting in games has become much better over time; however, I always notice the inconsistencies, so I just love that this is finally a thing. Hopefully, leatherjacket man will give us more Tensor cores so the raytracing tasks can be managed more easily. Reminds me of the tessellation days.

Games like:

Deus-Ex 1999
Freelancer
KotoR 1 & 2
Vampire: The Masquerade Bloodlines
The Prince of Persia Trilogy
Max Payne 1 & 2
Commandos 1, 2 & 3 (Who remembers these? So much fun)
Unreal
American McGee's Alice
Neverwinter Nights 1 & 2


Damn, I miss the EAX audio days too, it's miles ahead of the crap we have today. Who remembers F.E.A.R? Or the first Bioshock with EAX? Damn, I still have my soundcard today, just to listen to all those games SING!

Rambling, why am I rambling today? :S
 
Joined
Feb 23, 2019
Messages
6,050 (2.89/day)
Location
Poland
Processor Ryzen 7 5800X3D
Motherboard Gigabyte X570 Aorus Elite
Cooling Thermalright Phantom Spirit 120 SE
Memory 2x16 GB Crucial Ballistix 3600 CL16 Rev E @ 3800 CL16
Video Card(s) RTX3080 Ti FE
Storage SX8200 Pro 1 TB, Plextor M6Pro 256 GB, WD Blue 2TB
Display(s) LG 34GN850P-B
Case SilverStone Primera PM01 RGB
Audio Device(s) SoundBlaster G6 | Fidelio X2 | Sennheiser 6XX
Power Supply SeaSonic Focus Plus Gold 750W
Mouse Endgame Gear XM1R
Keyboard Wooting Two HE
Where's that guy that said chiplets are shit?
 
Joined
Mar 21, 2016
Messages
2,508 (0.79/day)
Fp4! I dread to think about the type 1 and type 2 errors that can occur with ultra-low precision nibble Artificial Inference. It is such a blunt tool. If it’s a nail, it will work. If it’s a screw it won’t. And will the “users” of the Ai output have any clue

I think you're looking at it from the wrong perspective. Low precision means lower overhead, and you can mix it with higher precision and basically end up with mixed sample coverage at lower overhead and with less over-processing, since sometimes you need lower precision to get finer details at better quality. If you reduce overhead you can pile on more without it choking as quickly. For AI and upscaling it seems like a win.
 
Joined
Dec 31, 2020
Messages
976 (0.69/day)
Processor E5-4627 v4
Motherboard VEINEDA X99
Memory 32 GB
Video Card(s) 2080 Ti
Storage NE-512
Display(s) G27Q
Case DAOTECH X9
Power Supply SF450
It explains the power requirement rumors I heard too, which was 1000W for the GB200 AI GPUs.

So, I expect the RTX5090 to pull around 500W, thus the new power connector makes sense, which supplies 600W, which leaves just a little headroom for overclocking while still being within the specified limits. (I still don't like the new connector, it doesn't inspire confidence, and the old 8-pins work)

the test board shows 4 by 2x6 pin. this is 675W per connector. 2700W total.
 
Joined
Sep 17, 2014
Messages
22,411 (6.03/day)
Location
The Washing Machine
Processor 7800X3D
Motherboard MSI MAG Mortar b650m wifi
Cooling Thermalright Peerless Assassin
Memory 32GB Corsair Vengeance 30CL6000
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Lexar NM790 4TB + Samsung 850 EVO 1TB + Samsung 980 1TB + Crucial BX100 250GB
Display(s) Gigabyte G34QWC (3440x1440)
Case Lian Li A3 mATX White
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse Steelseries Aerox 5
Keyboard Lenovo Thinkpad Trackpoint II
Software W11 IoT Enterprise LTSC
Benchmark Scores Over 9000
Joined
Dec 12, 2016
Messages
1,802 (0.62/day)
Magically vanished from the face of TPU :)

I believe that was fancucker who kept asking Wizzard to benchmark the latency penalty of RDNA3 moving the memory controllers off die. I think he/she is still around.
 
Joined
Jun 5, 2018
Messages
237 (0.10/day)
This all sounds like:

"We are making this impractically expensive because we bet you will pay anything we ask."

This applies to gamers too. Dark times are coming, especially without competition at the high end. I place a prediction right now that 5090 shall be $2599.00 MSRP and the same perf/$ as 40 series.
 
Joined
Feb 20, 2019
Messages
8,259 (3.94/day)
System Name Bragging Rights
Processor Atom Z3735F 1.33GHz
Motherboard It has no markings but it's green
Cooling No, it's a 2.2W processor
Memory 2GB DDR3L-1333
Video Card(s) Gen7 Intel HD (4EU @ 311MHz)
Storage 32GB eMMC and 128GB Sandisk Extreme U3
Display(s) 10" IPS 1280x800 60Hz
Case Veddha T2
Audio Device(s) Apparently, yes
Power Supply Samsung 18W 5V fast-charger
Mouse MX Anywhere 2
Keyboard Logitech MX Keys (not Cherry MX at all)
VR HMD Samsung Oddyssey, not that I'd plug it into this though....
Software W10 21H1, barely
Benchmark Scores I once clocked a Celeron-300A to 564MHz on an Abit BE6 and it scored over 9000.
Guys, these chips bear little resemblance to the RTX 5000-series we'll see as consumers.

Think of them like the H100 datacenter accelerators that Nvidia currently makes which are also impossibly expensive and not a reflection on the RTX 40-series.

Y'all need to chill... ;)
 
Joined
Dec 12, 2016
Messages
1,802 (0.62/day)
Guys, these chips bear little resemblance to the RTX 5000-series we'll see as consumers.

Think of them like the H100 datacenter accelerators that Nvidia currently makes which are also impossibly expensive and not a reflection on the RTX 40-series.

Y'all need to chill... ;)

My biggest concern is that 3 nm is not ready yet for large GPU chips as Blackwell is all 4nm. Nvidia is already at 4 nm for RTX 4000 so I’m not sure how much better they can make the RTX 5000 series if it's on the same node unless they allow power consumption to get out of control like with Intel CPUs.

Edit: and right on cue
 
Joined
Oct 6, 2021
Messages
1,605 (1.41/day)
"This is a module that has two B200 GPUs wired to an NVIDIA Grace CPU that offers better serial processing performance than x86-64 based CPUs from Intel or AMD"

I'll pretend it's not a lie. The whole presentation is full of make-up to make it look bigger and better. lol
 
Joined
Aug 30, 2006
Messages
7,221 (1.09/day)
System Name ICE-QUAD // ICE-CRUNCH
Processor Q6600 // 2x Xeon 5472
Memory 2GB DDR // 8GB FB-DIMM
Video Card(s) HD3850-AGP // FireGL 3400
Display(s) 2 x Samsung 204Ts = 3200x1200
Audio Device(s) Audigy 2
Software Windows Server 2003 R2 as a Workstation now migrated to W10 with regrets.
Joined
May 8, 2018
Messages
1,568 (0.66/day)
Location
London, UK
Isn't Blackwell supposed to be 3nm? Hopper is 4nm too. The Nvidia CEO said they only made it bigger, so I guess that will be the same for GPUs too? Still 4nm? The last rumors said Nvidia would make the Blackwell gaming GPUs on 3nm; I wonder if that's still on the table.

My biggest concern is that 3 nm is not ready yet for large GPU chips as Blackwell is all 4nm. Nvidia is already at 4 nm for RTX 4000 so I’m not sure how much better they can make the RTX 5000 series if it's on the same node unless they allow power consumption to get out of control like with Intel CPUs.

Edit: and right on cue
That is my thought as well, strange, I really thought it was 3nm.
 
Joined
Aug 21, 2013
Messages
1,897 (0.46/day)
The biggest surprise is the use of N4P node. I thought for sure Nvidia was going to use 3nm by now, at least for these 20k+ costing chips.
This does not bode well for RTX 5000 series. I very much doubt those will use 3nm either.
 
Joined
Jun 11, 2017
Messages
273 (0.10/day)
Location
Montreal Canada
Hmmm, 2x GPUs on one board. Where did I hear that before?

Now I remember
Voodoo 5

Will they be supplying an NV mini nuclear reactor power plant to power these monstrosities?
It'll be like the Voodoo 5 6000 with a power supply outside the box.

I remember when Nvidia roadmaps said in 2017 they were going to build lower power GPUs with higher graphics power, to help with climate change and the environment. Then with the 3000 series the idea went out the window, down the stream into the oceans, and was lost forever. Jensen said screw that, more power, let's get 1000 watt video cards instead.

Also look at the slide: NVLink is now basically SLI on a single board. I said that's what they were going to do 2 years ago. Guess what, I was right. Following the 3dfx path to ruin all over again.
 
Joined
Jul 8, 2022
Messages
253 (0.29/day)
Location
USA
Processor i9-11900K
Motherboard Asus ROG Maximus XIII Hero
Cooling Arctic Liquid Freezer II 360
Memory 4x8GB DDR4
Video Card(s) Alienware RTX 3090 OEM
Storage OEM Kioxia 2tb NVMe (OS), 4TB WD Blue HDD (games)
Display(s) LG 27GN950-B
Case Lian Li Lancool II Mesh Performance (black)
Audio Device(s) Logitech Pro X Wireless
Power Supply Corsair RM1000x
Keyboard HyperX Alloy Elite 2
Joined
Nov 8, 2017
Messages
229 (0.09/day)
And PC Gamers will get the butchered, cut down, runts of the litter, foundry reject chips to go into the 5090s and be overcharged $2000 for the privilege.

..and the Pc Gamer FOMO sucker crowd, will lap them up.
The consumer version of that chip will probably be very different rather than a simple cut down. Those datacenter GPUs are generally pretty bad for gaming.
[Two attached images, labeled "A100" and "RTX 3090"]
 