
NVIDIA B200 "Blackwell" Records 2.2x Performance Improvement Over its "Hopper" Predecessor

AleksandarK

News Editor
Staff member
Joined
Aug 19, 2017
Messages
2,651 (0.99/day)
We know that NVIDIA's latest "Blackwell" GPUs are fast, but how much faster are they over the previous generation "Hopper"? Thanks to the latest MLPerf Training v4.1 results, NVIDIA's HGX B200 Blackwell platform has demonstrated massive performance gains, measuring up to 2.2x improvement per GPU compared to its HGX H200 Hopper. The latest results, verified by MLCommons, reveal impressive achievements in large language model (LLM) training. The Blackwell architecture, featuring HBM3e high-bandwidth memory and fifth-generation NVLink interconnect technology, achieved double the performance per GPU for GPT-3 pre-training and a 2.2x boost for Llama 2 70B fine-tuning compared to the previous Hopper generation. Each benchmark system incorporated eight Blackwell GPUs operating at a 1,000 W TDP, connected via NVLink Switch for scale-up.

The network infrastructure utilized NVIDIA ConnectX-7 SuperNICs and Quantum-2 InfiniBand switches, enabling high-speed node-to-node communication for distributed training workloads. While previous Hopper-based systems required 256 GPUs to optimize performance for the GPT-3 175B benchmark, Blackwell accomplished the same task with just 64 GPUs, leveraging its larger HBM3e memory capacity and bandwidth. One thing to look out for is the upcoming GB200 NVL72 system, which promises even more significant gains past the 2.2x. It features expanded NVLink domains, higher memory bandwidth, and tight integration with NVIDIA Grace CPUs, complemented by ConnectX-8 SuperNIC and Quantum-X800 switch technologies. With faster switching and better data movement with Grace-Blackwell integration, we could see even more software optimization from NVIDIA to push the performance envelope.
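For reference, a "per-GPU" speedup of the kind MLPerf reports can be derived from time-to-train and GPU count. Below is a minimal sketch with hypothetical timings (placeholder values only, not actual MLPerf v4.1 scores, which are published by MLCommons):

```python
# Per-GPU speedup from MLPerf-style time-to-train results.
# The timings below are hypothetical placeholders, NOT real MLPerf v4.1 numbers.
def per_gpu_speedup(time_ref_min, gpus_ref, time_new_min, gpus_new):
    """Ratio of GPU-minutes needed for the same workload (higher = faster per GPU)."""
    return (time_ref_min * gpus_ref) / (time_new_min * gpus_new)

# e.g. reference system: 256 GPUs for 100 min; new system: 64 GPUs for 200 min
# -> (256 * 100) / (64 * 200) = 2.0x per GPU
print(per_gpu_speedup(100.0, 256, 200.0, 64))  # 2.0
```

This is why a smaller Blackwell system can post a higher per-GPU score even while taking longer in wall-clock time: the metric normalizes by the number of GPUs consumed.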



View at TechPowerUp Main Site | Source
 
Joined
Sep 15, 2011
Messages
6,760 (1.39/day)
Processor Intel® Core™ i7-13700K
Motherboard Gigabyte Z790 Aorus Elite AX
Cooling Noctua NH-D15
Memory 32GB(2x16) DDR5@6600MHz G-Skill Trident Z5
Video Card(s) ZOTAC GAMING GeForce RTX 3080 AMP Holo
Storage 2TB SK Platinum P41 SSD + 4TB SanDisk Ultra SSD + 500GB Samsung 840 EVO SSD
Display(s) Acer Predator X34 3440x1440@100Hz G-Sync
Case NZXT PHANTOM410-BK
Audio Device(s) Creative X-Fi Titanium PCIe
Power Supply Corsair 850W
Mouse Logitech Hero G502 SE
Software Windows 11 Pro - 64bit
Benchmark Scores 30FPS in NFS:Rivals
Are those banned from the Chinese market?
 
Joined
Nov 2, 2016
Messages
122 (0.04/day)
What about performance per watt? It's nice that the architecture brings a good performance uplift, but does it come at the cost of efficiency?
 
Joined
Oct 20, 2017
Messages
135 (0.05/day)
 
Joined
Nov 6, 2016
Messages
1,773 (0.60/day)
Location
NH, USA
System Name Lightbringer
Processor Ryzen 7 2700X
Motherboard Asus ROG Strix X470-F Gaming
Cooling Enermax Liqmax Iii 360mm AIO
Memory G.Skill Trident Z RGB 32GB (8GBx4) 3200Mhz CL 14
Video Card(s) Sapphire RX 5700XT Nitro+
Storage Hp EX950 2TB NVMe M.2, HP EX950 1TB NVMe M.2, Samsung 860 EVO 2TB
Display(s) LG 34BK95U-W 34" 5120 x 2160
Case Lian Li PC-O11 Dynamic (White)
Power Supply BeQuiet Straight Power 11 850w Gold Rated PSU
Mouse Glorious Model O (Matte White)
Keyboard Royal Kludge RK71
Software Windows 10
What about performance per watt? It's nice that the architecture brings a good performance uplift, but does it come at the cost of efficiency?
Exactly. "Per GPU" isn't exactly a unit of measurement, and although I cannot recall the exact specifications, if I remember correctly the B100, for example, is made from two GPU dies, so I'm sure it has way more "cores" (although Nvidia seems to keep changing what a CUDA "core" is, so I don't even know if a Blackwell "core" can be directly compared to a Hopper "core"). Basically, what I'm saying is that to compare power efficiency between the two architectures, I feel like we have to use a metric like watts per mm², where mm² is the physical area of the GPU die/chiplet.
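A rough perf-per-watt comparison can be done from the article's figures alone. The 1,000 W TDP for B200 is from the article; the 700 W figure used for the Hopper baseline is an assumption based on Nvidia's published H100/H200 SXM specs, so treat this as a back-of-the-envelope sketch:

```python
# Efficiency gain = speedup divided by the ratio of power draw.
# B200 at 1,000 W is stated in the article; 700 W for the Hopper SXM part
# is an assumed baseline, not a figure from this benchmark report.
def perf_per_watt_gain(speedup, watts_new, watts_old):
    return speedup / (watts_new / watts_old)

gain = perf_per_watt_gain(2.2, 1000.0, 700.0)
print(f"{gain:.2f}x perf/W gain")  # 1.54x
```

So even with the higher TDP, a 2.2x per-GPU speedup would still imply a net efficiency improvement, under these assumed power figures.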
 

mikesg

New Member
Joined
Jun 1, 2024
Messages
26 (0.13/day)
Who would have thought "more data" and "more power" would get this far before any attention was paid to properly engineering the solution.
 
Joined
Apr 2, 2011
Messages
2,849 (0.57/day)
Another person who actually read the article! PC gamers don't read.


This isn't about a GPU to play games.

You know...it's funny. You almost have a point, then it's smothered in the crib.

Everything is going to be about the GPUs when it comes to Nvidia. It's the nature of the thing, when you're a GPU company. In this instance there's about 2 degrees to cover. Let's make it like that other game involving Kevin Bacon.
1) Nvidia produces a new Blackwell based A.I. accelerator.
2) The A.I. accelerator is run on the same lines as their other products.
3) The production of the A.I. accelerator is higher margin, and will thus decrease the number of GPUs on the market.

Two leaps to get from an announced (presumably commercial or educational use) product to its direct impact on the cost of consumer GPUs. Oh, and scalping is a thing: right now the countries around China are scalping these, and yes, if scalpers get caught there is a penalty, but the upside of scalping is huge profits and artificially inflated prices for knock-on or related products. In this case, scalping the Nvidia A.I. accelerators will drive people who cannot afford them to buy GPUs instead, which will price out consumers. Cool. That might be one jump.



In short, the price of tea in China does influence the price of tea in India. It's impossible to cross fingers and wish away that the things are linked, despite theoretically being in separate realms.
 
Joined
Aug 26, 2021
Messages
384 (0.32/day)
I'll be interested to see how cut down the 5090 is and what it can do. If that performance gain translates directly, they will price accordingly; we already know the compute cards are double. Used gaming card prices are bad here, with the 3090 Ti selling for 1000, the 4090 down to 600, and used entry-level 5090s at 1300+! At that money none of it makes sense, especially the 3090s for compute. I know the extra RAM is helpful, but even then it's not enough compared to the 40xx.
 
Joined
Aug 30, 2020
Messages
327 (0.21/day)
Location
Texass
System Name EXTREME-FLIGHT SIM
Processor AMD RYZEN 7 9800X3D 4.7GHZ 8-core 120W
Motherboard ASUS ROG X670E Crosshair EXTREME BIOS V.2506
Cooling be quiet! Silent Loop 2 360MM, Light Wings 120 & 140MM
Memory G. SKILL Trident Z5 RGB 32MBx2 DDR5-6000 CL32/EXPOⅡ
Video Card(s) ASUS ROG Strix RTX4090 O24
Storage 2TB CRUCIAL T705 M.2, 4TB Seagate FireCuda 3.5"x7200rpm
Display(s) Samsung Odyssey Neo G9 57" 5120x1440 120Hz DP2.1 #2.Ulrhzar 8" Touchscreen(HUD)
Case be quiet! Dark Base Pro 900 Rev.2 Silver
Audio Device(s) ROG SupremeFX ALC4082, Creative SoundBlaster Katana V2
Power Supply be quiet! Dark Power Pro 12 1500W via APC Back-UPS 1500
Mouse LOGITECH Pro Superlight2 and POWERPLAY Mouse Pad
Keyboard CORSAIR K100 AIR
Software WINDOWS 11 x64 PRO 23H2, MSFS2020-2024 Aviator Edition, DCS
Benchmark Scores fast and stable AIDA64
This article is about the actual product for AI and servers. It won't be scalped. That only happens to the junk for gaming. Which will be scalped.
Absolutely, but I'm still praying for a 5090 someday. After what I went through during Cadet Covid's reign of terror to land a 3090, I managed to finally get one, but at a ridiculous eBay price.
 
Joined
May 10, 2023
Messages
352 (0.59/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
I'll be interested to see how cut down the 5090 is and what it can do. If that performance gain translates directly, they will price accordingly; we already know the compute cards are double. Used gaming card prices are bad here, with the 3090 Ti selling for 1000, the 4090 down to 600, and used entry-level 5090s at 1300+! At that money none of it makes sense, especially the 3090s for compute. I know the extra RAM is helpful, but even then it's not enough compared to the 40xx.
The GB100 chip is going to be totally different from the GB102 one, as has been usual in Nvidia's past releases (the x100 chips usually have FP64 units, no RT cores, HBM support, etc.).
Aside from that, I'm curious to see how much of a die cut the 5090 is going to be from the full GB102. I expect something similar to what we saw with the 4090 w.r.t. the full AD102.

As for pricing/performance, a 3090 is almost as fast as a 4090 for LLM tasks (albeit way less efficient), given that its memory speed is pretty much the same, which is why it's priced similarly to the 4090 in many places.
The 3090 (Ti) also still supports NVLink, offsetting the bottleneck in PCIe speeds when training models with layer-parallel approaches.
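The "memory speed matters most" point can be made concrete: single-stream LLM decode is usually memory-bandwidth-bound, since each generated token reads all the model weights once. A rough upper-bound sketch (the 936 GB/s figure is the RTX 3090's published GDDR6X bandwidth; the 35 GB model size assumes a 70B-parameter model quantized to 4 bits):

```python
# Naive upper bound for single-stream LLM decode throughput:
# each token requires one full pass over the weights, so
# tokens/s <= memory bandwidth / model size in memory.
def decode_tokens_per_sec(bandwidth_gb_s, model_size_gb):
    return bandwidth_gb_s / model_size_gb

# RTX 3090: ~936 GB/s (public spec); 70B params at 4-bit ≈ 35 GB
print(decode_tokens_per_sec(936.0, 35.0))  # ≈ 26.7 tokens/s ceiling
```

Since the 4090's bandwidth (~1,008 GB/s) is only marginally higher, this simple bound predicts similar decode throughput for both cards, consistent with the point above.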
 
Joined
Jun 11, 2017
Messages
283 (0.10/day)
Location
Montreal Canada
So the price of the 5090, are you ready: $2,500 US, or $3,250 CAD.

Ya, I'll pass. Sorry Nvidia, you care nothing about gamers and everything about ripping people off.

I would pay maybe up to 300 dollars for a video card, but not 2,500.
 