
NVIDIA GB202 "Blackwell" Die Exposed, Shows the Massive 24,576 CUDA Core Configuration

AleksandarK

News Editor
Staff member
Joined
Aug 19, 2017
Messages
2,752 (1.01/day)
A die shot of NVIDIA's GB202, the silicon powering the RTX 5090, has surfaced online, providing detailed insight into the physical layout of the "Blackwell" architecture. The annotated images, shared by hardware analyst Kurnal and provided by ASUS China general manager Tony Yu, compare GB202 with its AD102 predecessor and outline key architectural components. The die's central region houses 128 MB of L2 cache (96 MB enabled on the RTX 5090), surrounded by the memory interfaces. Eight 64-bit memory controllers make up the 512-bit GDDR7 interface, with the physical interfaces positioned along the top, left, and right edges of the die. Twelve graphics processing clusters (GPCs) surround the central cache. Each GPC contains eight texture processing clusters (TPCs) and 16 streaming multiprocessors (SMs), two per TPC. The full die therefore contains 24,576 CUDA cores: 128 cores per SM across 192 SMs. With the RTX 5090 offering "only" 21,760 CUDA cores (170 SMs), the fully enabled GB202 appears to be reserved for workstation GPUs.
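
The unit hierarchy above multiplies out as follows. This is a minimal Python sketch using only the counts given in the article; the 170-SM figure for the RTX 5090 is derived here by dividing 21,760 by 128, assuming the cut-down part keeps 128 CUDA cores per SM.

```python
# Minimal sketch: reproduce the GB202 core count from the unit hierarchy
# described in the article (12 GPCs x 8 TPCs x 2 SMs x 128 CUDA cores).
GPCS = 12
TPCS_PER_GPC = 8
SMS_PER_TPC = 2          # 16 SMs per GPC / 8 TPCs per GPC
CORES_PER_SM = 128

full_sms = GPCS * TPCS_PER_GPC * SMS_PER_TPC
full_cores = full_sms * CORES_PER_SM

# RTX 5090 core count from the article; SM count derived assuming 128 cores/SM.
rtx5090_cores = 21_760
rtx5090_sms = rtx5090_cores // CORES_PER_SM

disabled_sms = full_sms - rtx5090_sms
enabled_fraction = rtx5090_cores / full_cores

print(f"Full GB202: {full_sms} SMs, {full_cores:,} CUDA cores")
print(f"RTX 5090:   {rtx5090_sms} SMs ({disabled_sms} disabled), "
      f"{enabled_fraction:.1%} of the full die")
```

Put the other way round, a fully enabled GB202 would have about 12.9% more CUDA cores than the RTX 5090 (24,576 / 21,760 ≈ 1.129).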

Each SM is divided into four slices that share 128 KB of L1 cache and four texture mapping units (TMUs). Each slice has its own register file, L0 instruction cache, warp scheduler, load/store units, and special function units. Running down the center of the die from top to bottom is a vertical strip containing the media engines (NVENC and NVDEC units). The RTX 5090 enables three of the four available NVENC encoders and two of the four NVDEC decoders. The die also includes twelve raster engine/3D fixed-function (FF) blocks for geometry processing. The PCIe 5.0 x16 interface and display controller components sit along the bottom edge. Despite its substantial size, GB202 remains smaller than NVIDIA's earlier GH100 and GV100 dies, which measure roughly 814 mm² and 815 mm², respectively. Each SM also integrates specialized hardware, including new 5th-generation Tensor cores and 4th-generation RT cores, giving the die a total of 192 RT cores, 768 Tensor cores, and 768 texture units.
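
Because these units are replicated per SM, the die totals are simple multiples of the SM count. The sketch below assumes one RT core, four Tensor cores, four TMUs, and 128 KB of L1 per SM, which is what the article's totals work out to across 192 SMs; the 170-SM RTX 5090 figure is derived from its 21,760 CUDA cores rather than quoted in the article.

```python
# Minimal sketch: scale per-SM resources up to whole-chip totals.
# Per-SM counts are inferred from the article's die totals divided by 192 SMs.
PER_SM = {
    "RT cores": 1,
    "Tensor cores": 4,
    "texture units": 4,
    "L1 cache (KB)": 128,
}


def totals(sm_count: int) -> dict[str, int]:
    """Return whole-chip totals for a given number of enabled SMs."""
    return {name: per_sm * sm_count for name, per_sm in PER_SM.items()}


full_die = totals(192)   # fully enabled GB202
rtx_5090 = totals(170)   # 21,760 CUDA cores / 128 per SM (derived)

for name in PER_SM:
    print(f"{name:>14}: full GB202 {full_die[name]:>6,}  RTX 5090 {rtx_5090[name]:>6,}")
```

The same multiplication explains why every disabled SM removes a fixed slice of RT, Tensor, and texture throughput at once.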



 
Joined
Jun 18, 2015
Messages
386 (0.11/day)
Location
Perth , West Australia
System Name schweinestalle1 and schweinestalle 2
Processor AMD Ryzen 7 5700X3D / AMD Ryzen 3200G
Motherboard Asus Prime - Pro X570 + Asus PCI -E AC68 Adapter / Asus Prime B450 M-K
Cooling TT Tough air 510 / AMD Wraith
Memory Kingston HyperX 2 x 16 gb DDR 4 3200mhz / Kingston HyperX 2x 8Gb DDR 3200mhz
Video Card(s) AMD Radeon RX 7800 XT 16GB Pulse / AMD Reference Vega 64 8GB
Storage Crucial 1TB M.2 SSD and WD Blue 500gb Nand SSD / WD Blue 240gb M.2 SSD
Display(s) Asus XG 32 V ROG and LG ultra gear 32gs75q / TCL TV
Case Corsair AIR ATX / Corsair Air Mini ATX
Audio Device(s) Realtech standard / Realtech standard
Power Supply Corsair 850 Modular / Corsair 750 Modular
Mouse CM Havoc / Microsoft Wireless
Keyboard Corsair Cherry Mechanical / Razor piece of shit
Software Win 10 / win 10
Benchmark Scores Soon ! whateva
GEEZUZ looks like a wicked construction :nutkick:
 
Joined
Aug 22, 2016
Messages
171 (0.06/day)
It's just a difference of 3.5%; for a GPU of this size, it would never make sense to have a 5090 Ti with the same die.
 
Joined
Jun 18, 2015
Messages
386 (0.11/day)
It's just a difference of 3.5%; for a GPU of this size, it would never make sense to have a 5090 Ti with the same die.
Seems to me like the real start of the AI GPU presentation.
 
Joined
Jan 8, 2017
Messages
9,651 (3.28/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
It's just a difference of 3.5%; for a GPU of this size, it would never make sense to have a 5090 Ti with the same die.
The chip is simply too large to realistically segment products like that because of yields.
 
Joined
Dec 12, 2016
Messages
2,153 (0.72/day)
The chip is simply too large to realistically segment products like that because of yields.
Here are the last few generations' chip sizes. Looks like normal gen-to-gen variation to me. I don’t see anything out of the ordinary.

[Image: chip sizes of the last few GPU generations]
 
Joined
Jan 27, 2024
Messages
532 (1.44/day)
Processor Ryzen AI
Motherboard MSI
Cooling Cool
Memory Fast
Video Card(s) Matrox Ultra high quality | Radeon
Storage Chinese
Display(s) 4K
Case Transparent left side window
Audio Device(s) Yes
Power Supply Chinese
Mouse Chinese
Keyboard Chinese
VR HMD No
Software Android | Yandex
Benchmark Scores Yes
The chip is simply too large to realistically segment products like that because of yields.

How so? Precisely because of its sheer size, it is quite suitable for segmenting: a die with 50% of its shaders enabled would be a good RTX 5060 candidate.
 
Joined
Jun 6, 2007
Messages
443 (0.07/day)
Location
Manchester, UK
System Name Colin #2 - the revenge!
Processor Ryzen 7 5800X3D
Motherboard Gigabyte B550 Aorus Elite V2
Cooling Arctic P14 PWM PST (3 pull and 3 push) 2 pushing through an Arctic Freezer II 280 AIO
Memory TeamGroup Dark Pro 8 Pack 2 x16Gb dual rank B-die 3733MHz CL16
Video Card(s) MSI RTX 4080 Suprim X
Storage WD SN850X 1Tb + WD SN770 2Tb
Display(s) MSI MPG321URX
Case Phanteks P500A
Audio Device(s) Realtek ALC1200/1220
Power Supply 750W Corsair RM750
VR HMD PSVR2
The news heading has the wrong number: 756 instead of 576 :)
 
Joined
Jan 8, 2017
Messages
9,651 (3.28/day)
How so? Precisely because of its sheer size, it is quite suitable for segmenting: a die with 50% of its shaders enabled would be a good RTX 5060 candidate.
That's not how this works. Throwing away 50% of the wafer to turn a 5090 into a 5060 is ridiculous; yields are much better if you simply make a chip that is 50% smaller.

By the way, 50% of the shaders wouldn't mean a 5060, it would mean a 5080, which is exactly what that GPU is: half of a GB202. Except that they didn't choose to simply disable half of a GB202; instead, they made a different chip, because that's way more cost-effective.
 
Joined
Jan 27, 2024
Messages
532 (1.44/day)
That's not how this works. Throwing away 50% of the wafer to turn a 5090 into a 5060 is ridiculous; yields are much better if you simply make a chip that is 50% smaller.

What do you do with the salvage parts, then? Directly in the bin, instead of segmenting?

By the way, 50% of the shaders wouldn't mean a 5060, it would mean a 5080

According to the greedy black-leather-jacketed shitshow products? I guess he's testing his clients' intelligence.
 
Joined
Jan 8, 2017
Messages
9,651 (3.28/day)
What do you do with the salvage parts, then? Directly in the bin, instead of segmenting?
Yes. If it made sense to use GB202 for lower-tier products, the greedy black-leather-jacketed CEO would have done that instead; it's obvious.
 
Joined
Sep 15, 2011
Messages
6,851 (1.40/day)
Processor Intel® Core™ i7-13700K
Motherboard Gigabyte Z790 Aorus Elite AX
Cooling Noctua NH-D15
Memory 32GB(2x16) DDR5@6600MHz G-Skill Trident Z5
Video Card(s) ZOTAC GAMING GeForce RTX 3080 AMP Holo
Storage 2TB SK Platinum P41 SSD + 4TB SanDisk Ultra SSD + 500GB Samsung 840 EVO SSD
Display(s) Acer Predator X34 3440x1440@100Hz G-Sync
Case NZXT PHANTOM410-BK
Audio Device(s) Creative X-Fi Titanium PCIe
Power Supply Corsair 850W
Mouse Logitech Hero G502 SE
Software Windows 11 Pro - 64bit
Benchmark Scores 30FPS in NFS:Rivals
Future 5090 Ti GPU? For only $3000 MSRP!
 
Joined
Jan 27, 2024
Messages
532 (1.44/day)
Yes. If it made sense to use GB202 for lower-tier products, the greedy black-leather-jacketed CEO would have done that instead; it's obvious.

That doesn't make sense. It was estimated that it costs between $450 and $500 to make one RTX 5090. Selling the defective dies for anything above that is profit, which is still better than throwing the materials (expensive wafers) in the bin.
 
Joined
Feb 3, 2017
Messages
3,889 (1.33/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
What do you do with the salvage parts, then? Directly in the bin, instead of segmenting?
It depends on what exactly the yields and defect patterns are. Generally, if a die is mass-produced and sold as a product, the yield of dies suitable for some SKU is not as bad as you'd expect. GPUs are huge but contain a lot of identical parallel units: disable a few and there you go. If you really need to resort to disabling half a chip, then producing that thing in the first place is pretty suspect. Not that companies have never manufactured dies with horrible yields, but those are exceptions rather than the rule.
 
Joined
May 24, 2023
Messages
1,037 (1.68/day)
5090 dies can be used this way:

Fully functional: 5090 Ti or not sold as a consumer product at all
almost functional: 5090
partly functional: 5080 Ti - something must fill the gap anyway
unusable for above: scrapped.
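
The tiering in this post can be expressed as a threshold function. This Python sketch is hypothetical: the 192- and 170-SM cut-offs are derived from the article's CUDA core counts, while the 5080 Ti tier and its MIN_5080TI_SMS threshold are illustrative placeholders, not real product specifications.

```python
from dataclasses import dataclass

# Enabled-SM thresholds. FULL_SMS (192) and RTX5090_SMS (170) follow from the
# article's core counts; MIN_5080TI_SMS is a made-up placeholder value.
FULL_SMS = 192
RTX5090_SMS = 170
MIN_5080TI_SMS = 120  # hypothetical cut-off for a hypothetical tier


@dataclass
class Die:
    working_sms: int  # SMs that passed testing
    uncore_ok: bool   # simplification: everything outside the SMs is usable


def bin_die(die: Die) -> str:
    """Assign a tested die to a product bin (hypothetical scheme)."""
    # Simplification: treat a defect outside the SM array as fatal. Real parts
    # can also fuse off some L2/memory partitions, which this ignores.
    if not die.uncore_ok:
        return "scrap"
    if die.working_sms >= FULL_SMS:
        return "5090 Ti / workstation"
    # Dies with 170-191 good SMs are fused down to the 5090's 170.
    if die.working_sms >= RTX5090_SMS:
        return "5090"
    if die.working_sms >= MIN_5080TI_SMS:
        return "5080 Ti"
    return "scrap"


for d in (Die(192, True), Die(181, True), Die(150, True), Die(96, True), Die(192, False)):
    print(d, "->", bin_die(d))
```

Note that a die with, say, 181 good SMs still ships as a 5090, so some working silicon is deliberately fused off to keep every SKU identical.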
 
Joined
Dec 25, 2020
Messages
7,425 (4.96/day)
Location
São Paulo, Brazil
System Name "Icy Resurrection"
Processor 13th Gen Intel Core i9-13900KS
Motherboard ASUS ROG Maximus Z790 Apex Encore
Cooling Noctua NH-D15S upgraded with 2x NF-F12 iPPC-3000 fans and Honeywell PTM7950 TIM
Memory 32 GB G.SKILL Trident Z5 RGB F5-6800J3445G16GX2-TZ5RK @ 7600 MT/s 36-44-44-52-96 1.4V
Video Card(s) RTX A2000 (soon: Palit GeForce RTX 5090 GameRock)
Storage 500 GB WD Black SN750 SE NVMe SSD + 4 TB WD Red Plus WD40EFPX HDD
Display(s) 55-inch LG G3 OLED
Case Pichau Mancer CV500 White Edition
Power Supply EVGA 1300 G2 1.3kW 80+ Gold
Mouse Microsoft Classic IntelliMouse (2017)
Keyboard IBM Model M type 1391405
Software Windows 10 Pro 22H2
Benchmark Scores I pulled a Qiqi~
@AleksandarK Brother, that math ain't mathin'. 128 CUDA cores × 192 SMs = 24,576, not 24,756 ;)

5090 dies can be used this way:

Fully functional: 5090 Ti or not sold as a consumer product at all
almost functional: 5090
partly functional: 5080 Ti - something must fill the gap anyway
unusable for above: scrapped.

AD102 never shipped in a full configuration, even in the enterprise segment. I wouldn't be surprised if this happened again, tbh.
 
Joined
Jan 8, 2017
Messages
9,651 (3.28/day)
Selling the defective dies for anything above those values is profits, still higher than throwing the materials (expensive wafers) in the bin.
No, you still don't understand. To sell those defective dies, it must make sense to spend that much of a wafer on them versus a wafer of much smaller chips. Yields don't scale linearly: you waste far more area with bigger chips, because some defects make the entire die unusable, and you can't simply disable SMs and use it in a lower-end SKU. So instead of losing 350 mm², you lose 750 mm², or whatever.
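
The non-linear scaling described here is usually illustrated with a Poisson yield model, Y = exp(-A × D0), where A is the die area and D0 the defect density. The sketch below compares a 350 mm² and a 750 mm² die at an assumed D0 of 0.1 defects/cm² (the figure another post in this thread cites for TSMC N5), using the common gross-dies-per-wafer approximation for a 300 mm wafer. Both formulas are textbook simplifications, not NVIDIA's or TSMC's actual numbers.

```python
import math

WAFER_DIAMETER_MM = 300.0
DEFECT_DENSITY_PER_CM2 = 0.1  # assumed D0, see lead-in


def gross_dies_per_wafer(die_area_mm2: float, wafer_d_mm: float = WAFER_DIAMETER_MM) -> int:
    """Common approximation: wafer area / die area minus an edge-loss term."""
    r = wafer_d_mm / 2
    dies = math.pi * r**2 / die_area_mm2 - math.pi * wafer_d_mm / math.sqrt(2 * die_area_mm2)
    return int(dies)


def poisson_yield(die_area_mm2: float, d0_per_cm2: float = DEFECT_DENSITY_PER_CM2) -> float:
    """Fraction of dies with zero defects, Y = exp(-A * D0)."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-area_cm2 * d0_per_cm2)


for area in (350.0, 750.0):
    gross = gross_dies_per_wafer(area)
    y = poisson_yield(area)
    print(f"{area:.0f} mm^2: {gross} gross dies, {y:.1%} defect-free, "
          f"~{gross * y:.0f} perfect dies per wafer")
```

Per wafer, the smaller die yields noticeably more defect-free silicon; harvesting partially defective large dies claws some of that back, which is the trade-off being argued here.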
 
Joined
Dec 31, 2020
Messages
1,089 (0.73/day)
Processor E5-4627 v4
Motherboard VEINEDA X99
Memory 32 GB
Video Card(s) 2080 Ti
Storage NE-512
Display(s) G27Q
Case DAOTECH X9
Power Supply SF450
Here are the last few generations' chip sizes. Looks like normal gen-to-gen variation to me. I don’t see anything out of the ordinary.
Next in line: the 6090, with a 600 mm² die, 24,576 of 30,720 cores enabled, and a 384-bit bus because it's impossible to fit 512 bits. That's the evolution, which probably means 12,288 for the 6080.
 
Joined
May 10, 2023
Messages
553 (0.88/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
How so... exactly because of the sheer size it is quite suitable for segmenting - 50% enabled shaders would be a good RTX 5060 candidate.
The main idea of a production line is that you won't be getting defect rates that high: at worst, your GB202 chip will be something like 70-80% functional. If you often get chips worse than that, there's no reason to fab that chip in the first place.
5090 dies can be used this way:

Fully functional: 5090 Ti or not sold as a consumer product at all
almost functional: 5090
partly functional: 5080 Ti - something must fill the gap anyway
unusable for above: scrapped.
That's only considering the consumer RTX parts; those same chips are also used in their (née) Tesla/Quadro cards.
I thought it did... Was it really almost impossible to make a fully functional chip?
The high-end AD102 only had 2 SMs disabled, IIRC. I guess that's a good safety margin on such a big chip, or maybe they really couldn't get it 100% functional often enough to make it a proper product.
 
Joined
Dec 25, 2020
Messages
7,425 (4.96/day)
I thought it did... Was it really almost impossible to make a fully functional chip?

I believe so. The RTX 6000 Ada Generation has 142 out of the 144 SMs enabled, with the RTX 4090 coming in at 128 out of 144. A full L2 cache slice is also disabled on the 4090, reducing L2 from 96 to 72 MB.
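
Expressed as fractions, these cut-downs line up closely with GB202's. A small Python sketch using the AD102 SM and L2 figures quoted in this post and the article's GB202 figures (the RTX 5090's 170 SMs are derived from 21,760 CUDA cores at 128 per SM):

```python
def enabled_fraction(enabled: int, total: int) -> float:
    """Share of a unit (SMs or L2 capacity) left enabled on a product."""
    return enabled / total


# (enabled SMs, total SMs): AD102 figures from this post; GB202 figures from
# the article, with the 5090's 170 SMs derived as 21,760 / 128.
sm_configs = {
    "RTX 6000 Ada (AD102)": (142, 144),
    "RTX 4090 (AD102)": (128, 144),
    "RTX 5090 (GB202)": (170, 192),
}
# (enabled L2 MB, full L2 MB): 4090 from this post, 5090 from the article.
l2_configs = {
    "RTX 4090 (AD102)": (72, 96),
    "RTX 5090 (GB202)": (96, 128),
}

for name, (enabled, total) in sm_configs.items():
    print(f"{name}: {enabled}/{total} SMs = {enabled_fraction(enabled, total):.1%} enabled")
for name, (enabled, total) in l2_configs.items():
    print(f"{name}: {enabled}/{total} MB L2 = {enabled_fraction(enabled, total):.0%} enabled")
```

On these figures, the RTX 4090 and RTX 5090 ship with a similar share of SMs enabled (about 89%) and the same 75% share of L2.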
 
Joined
Nov 4, 2005
Messages
12,083 (1.72/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs, 24TB Enterprise drives
Display(s) 55" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
AD102 never shipped in a full configuration, even in the enterprise segment. I wouldn't be surprised if this happened again, tbh.
Truth.

The number of defects introduced into the silicon during lithography and production of a chip this complex makes a fully enabled product infeasible. I’m sure they get some with all their parts working; I would guess they keep those for internal use.

All it takes is a few atoms of carbon or aluminum at these node sizes.
 
Joined
Jun 19, 2024
Messages
361 (1.60/day)
System Name XPS, Lenovo and HP Laptops, HP Xeon Mobile Workstation, HP Servers, Dell Desktops
Processor Everything from Turion to 13900kf
Motherboard MSI - they own the OEM market
Cooling Air on laptops, lots of air on servers, AIO on desktops
Memory I think one of the laptops is 2GB, to 64GB on gamer, to 128GB on ZFS Filer
Video Card(s) A pile up to my knee, with a RTX 4090 teetering on top
Storage Rust in the closet, solid state everywhere else
Display(s) Laptop crap, LG UltraGear of various vintages
Case OEM and a 42U rack
Audio Device(s) Headphones
Power Supply Whole home UPS w/Generac Standby Generator
Software ZFS, UniFi Network Application, Entra, AWS IoT Core, Splunk
Benchmark Scores 1.21 GigaBungholioMarks
What do you do with the salvage parts, then? Directly in the bin, instead of segmenting?

Technically you could call the 5090 a salvage part, as it is not fully enabled for yield reasons.
 
Joined
Nov 26, 2021
Messages
1,768 (1.52/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
The main idea of a production line is that you won't be getting defect rates that high: at worst, your GB202 chip will be something like 70-80% functional. If you often get chips worse than that, there's no reason to fab that chip in the first place.

That's only considering the consumer RTX parts; those same chips are also used in their (née) Tesla/Quadro cards.

The high-end AD102 only had 2 SMs disabled, IIRC. I guess that's a good safety margin on such a big chip, or maybe they really couldn't get it 100% functional often enough to make it a proper product.
We know that TSMC's N5 had a defect rate of 0.1 per square centimeter in the summer of 2020. Plugging in the numbers for the 5090 suggests a yield of 49% for fully functional dies. After harvesting defective dies and fusing off damaged portions, the yields must be fairly high.
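
The post doesn't say which yield model or die area it used. As a hedged check, the sketch below assumes a GB202 area of about 750 mm² (a widely reported figure, not stated in the article) and the quoted D0 of 0.1 defects/cm², and evaluates two textbook models: Poisson, Y = exp(-AD), and Murphy's, Y = ((1 - exp(-AD)) / (AD))². Murphy's model lands at roughly 49%, consistent with the number quoted, while the more pessimistic Poisson model gives about 47%. The "at most k defects" figures illustrate harvesting under the optimistic assumption that every defect lands in a unit that can be fused off.

```python
import math

DIE_AREA_CM2 = 7.5   # assumed ~750 mm^2 GB202 area (not from the article)
D0_PER_CM2 = 0.1     # defect density quoted in the post (TSMC N5, 2020)


def poisson_yield(area_cm2: float, d0: float) -> float:
    """Probability a die has zero defects under a Poisson model."""
    return math.exp(-area_cm2 * d0)


def murphy_yield(area_cm2: float, d0: float) -> float:
    """Murphy's model: ((1 - e^(-AD)) / (AD))^2."""
    ad = area_cm2 * d0
    return ((1 - math.exp(-ad)) / ad) ** 2


def poisson_at_most(k: int, area_cm2: float, d0: float) -> float:
    """Probability of at most k defects per die (Poisson). Optimistically
    treats every such die as sellable after fusing off the damaged units."""
    lam = area_cm2 * d0
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))


print(f"Poisson, fully functional: {poisson_yield(DIE_AREA_CM2, D0_PER_CM2):.1%}")
print(f"Murphy,  fully functional: {murphy_yield(DIE_AREA_CM2, D0_PER_CM2):.1%}")
for k in (1, 2, 3):
    print(f"Poisson, <= {k} defects:     {poisson_at_most(k, DIE_AREA_CM2, D0_PER_CM2):.1%}")
```

In reality some defects hit non-redundant logic and kill the die outright, so the harvested figures are upper bounds rather than estimates.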
 