Monday, January 27th 2025

NVIDIA GB202 "Blackwell" Die Exposed, Shows the Massive 24,576 CUDA Core Configuration

A die shot of NVIDIA's GB202, the silicon powering the RTX 5090, has surfaced online, offering a detailed look at the physical layout of the "Blackwell" architecture. The annotated images, shared by hardware analyst Kurnal and provided by ASUS China general manager Tony Yu, compare GB202 with its AD102 predecessor and outline key architectural components. The center of the die houses 128 MB of L2 cache (96 MB enabled on the RTX 5090), surrounded by the memory interfaces. Eight 64-bit memory controllers together provide the 512-bit GDDR7 interface, with the physical interfaces positioned along the top, left, and right edges of the die. Twelve graphics processing clusters (GPCs) surround the central cache. Each GPC contains eight texture processing clusters (TPCs), and each TPC holds two streaming multiprocessors (SMs), for 16 SMs per GPC. The full die therefore has 192 SMs with 128 CUDA cores each, or 24,576 CUDA cores in total. With the RTX 5090 offering "only" 21,760 CUDA cores, the fully enabled GB202 appears to be reserved for workstation GPUs.
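The core count follows directly from the hierarchy described above. The short Python sketch below multiplies it out. The two SMs per TPC and the RTX 5090's 170 enabled SMs are inferences (16 SMs / 8 TPCs, and 21,760 cores / 128 cores per SM), not figures stated in the source.

```python
# Hierarchy of the full GB202 die, as described in the article.
GPCS = 12            # graphics processing clusters
TPCS_PER_GPC = 8     # texture processing clusters per GPC
SMS_PER_TPC = 2      # inferred: 16 SMs per GPC / 8 TPCs per GPC
CORES_PER_SM = 128   # CUDA cores per streaming multiprocessor

MEM_CONTROLLERS = 8
BITS_PER_CONTROLLER = 64

full_sms = GPCS * TPCS_PER_GPC * SMS_PER_TPC
full_cores = full_sms * CORES_PER_SM
bus_width = MEM_CONTROLLERS * BITS_PER_CONTROLLER

# RTX 5090: 21,760 CUDA cores; dividing by 128 gives the enabled SM count.
rtx5090_cores = 21_760
rtx5090_sms = rtx5090_cores // CORES_PER_SM

print(f"Full GB202: {full_sms} SMs, {full_cores:,} CUDA cores, {bus_width}-bit bus")
print(f"RTX 5090: {rtx5090_sms} SMs enabled "
      f"({rtx5090_cores / full_cores:.1%} of the full configuration)")
```

Running it reports 192 SMs, 24,576 CUDA cores, and a 512-bit bus for the full die, and 170 SMs (about 88.5% of the full configuration) for the RTX 5090.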

Each SM is divided into four slices that share 128 KB of L1 cache and four texture mapping units (TMUs). Each slice contains its own register file, L0 instruction cache, warp scheduler, load-store units, and special function units. Running vertically through the center of the die, from top to bottom, is a strip containing the media processing components: the NVENC and NVDEC units. The RTX 5090 enables three of the four available NVENC encoders and two of the four NVDEC decoders. The die also includes twelve raster engine/3D FF blocks for geometry processing. Along the bottom edge sit the PCIe 5.0 x16 interface and the display controller components. Despite its substantial size, GB202 is still smaller than NVIDIA's previous GH100 and GV100 dies, both of which measured over 800 mm². Each SM integrates specialized hardware, including new 5th-generation Tensor cores and 4th-generation RT cores, for a die-wide total of 192 RT cores, 768 Tensor cores, and 768 texture units.
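The die-wide totals come from multiplying per-SM resources by the 192 SMs. In this sketch, the four TMUs and 128 KB of L1 per SM are taken from the article, while one RT core and four Tensor cores per SM are inferred by dividing the stated totals by 192.

```python
SMS = 192  # full GB202

# Per-SM resources. Four TMUs and 128 KB of L1 are stated in the article;
# one RT core and four Tensor cores per SM are inferred from the die totals.
PER_SM = {
    "RT cores": 1,
    "Tensor cores": 4,
    "texture units": 4,
    "L1 cache (KB)": 128,
}

# Scale every per-SM count up to the whole die.
totals = {unit: count * SMS for unit, count in PER_SM.items()}

for unit, total in totals.items():
    print(f"{unit}: {total:,}")
print(f"Total L1 across all SMs: {totals['L1 cache (KB)'] // 1024} MB")
```

The aggregate L1 figure (24 MB) is simple arithmetic on the per-SM value and is separate from the 128 MB of L2 in the center of the die.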
Sources: ASUS China Tony Yu, Kurnal on X, via VideoCardz

27 Comments on NVIDIA GB202 "Blackwell" Die Exposed, Shows the Massive 24,576 CUDA Core Configuration

#1
r.h.p
GEEZUZ, looks like a wicked construction!
#2
Bruno Vieira
It's just a difference of 3.5%; for a GPU of this size, it would never make sense to have a 5090 Ti with the same die.
#3
r.h.p
Bruno Vieira wrote: "It's just a difference of 3.5%; for a GPU of this size, it would never make sense to have a 5090 Ti with the same die."
Seems to me like the real start of the AI GPU presentation.
#4
Vya Domus
Bruno Vieira wrote: "It's just a difference of 3.5%; for a GPU of this size, it would never make sense to have a 5090 Ti with the same die."
The chip is simply too large to realistically segment products like that because of yields.
#5
Daven
Vya Domus wrote: "The chip is simply too large to realistically segment products like that because of yields."
Here are the chip sizes from the last few generations. Looks like normal gen-to-gen variation to me; I don't see anything out of the ordinary.

#6
3valatzy
Vya Domus wrote: "The chip is simply too large to realistically segment products like that because of yields."
How so? Precisely because of its sheer size, it is quite suitable for segmenting: 50% enabled shaders would make a good RTX 5060 candidate.
#7
rodneyhchef
The news heading mentions the wrong number: 756 instead of 576 :)
#8
Vya Domus
3valatzy wrote: "How so? Precisely because of its sheer size, it is quite suitable for segmenting: 50% enabled shaders would make a good RTX 5060 candidate."
That's not how this works. Throwing away 50% of the wafer to turn a 5090 into a 5060 is ridiculous; yields are much better if you simply make a chip 50% smaller.

By the way, 50% of the shaders wouldn't mean a 5060, it would mean a 5080, which is exactly what that GPU is: half of a GB202. Except that they didn't choose to simply disable half of a GB202; instead, they made a different chip, because that's far more cost-effective.
#9
3valatzy
Vya Domus wrote: "That's not how this works. Throwing away 50% of the wafer to turn a 5090 into a 5060 is ridiculous; yields are much better if you simply make a chip 50% smaller."
What do you do with the salvage parts, then? Directly in the bin, instead of segmenting?
Vya Domus wrote: "By the way, 50% of the shaders wouldn't mean a 5060, it would mean a 5080"
According to the greedy black-leather-jacketed shitshow products? I guess he's testing his clients' intelligence.
#10
Vya Domus
3valatzy wrote: "What do you do with the salvage parts, then? Directly in the bin, instead of segmenting"
Yes. If it made sense to use a GB202 for lower-tier products, the greedy black-leather-jacketed CEO would have done that instead; it's obvious.
#11
Prima.Vera
Future 5090 Ti GPU? For only $3,000 MSRP!
#12
3valatzy
Vya Domus wrote: "Yes. If it made sense to use a GB202 for lower-tier products, the greedy black-leather-jacketed CEO would have done that instead; it's obvious."
It doesn't make sense. It was estimated that the cost to make one RTX 5090 is between $450 and $500. Selling the defective dies for anything above those values is profit, which is still better than throwing the materials (expensive wafers) in the bin.
#13
Daven
Prima.Vera wrote: "Future 5090 Ti GPU? For only $3,000 MSRP!"
Don't you mean a future 5090 price hike to $3,000 to make up for the AI business lost to China?
#14
londiste
3valatzy wrote: "What do you do with the salvage parts, then? Directly in the bin, instead of segmenting?"
That depends on what exactly the yields and defect patterns are. Generally, if a chip is mass-produced and sold as a product, the yield of dies suitable for some SKU is not as bad as you'd expect. GPUs are huge but contain a lot of identical parallel units: disable a few and there you go. If you really need to resort to disabling half a chip, then producing that thing in the first place is pretty suspect. Not that companies have never manufactured dies with horrible yields, but those are the exception rather than the rule.
#15
BoggledBeagle
5090 dies can be used this way:

Fully functional: 5090 Ti or not sold as a consumer product at all
almost functional: 5090
partly functional: 5080 Ti - something must fill the gap anyway
unusable for above: scrapped.
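The binning scheme above can be sketched as a function that maps a die's count of working SMs to a tier. This is a simplified illustration under stated assumptions: it looks only at SMs (real binning also weighs memory controllers, L2 slices, clocks, and power), the 170-SM threshold for the RTX 5090 is inferred from its 21,760 CUDA cores, and the 140-SM cut-off for a "5080 Ti" is purely hypothetical.

```python
from typing import Optional

FULL_SMS = 192      # fully enabled GB202
RTX_5090_SMS = 170  # inferred: 21,760 CUDA cores / 128 cores per SM
# Hypothetical cut-off for the "5080 Ti" tier proposed in the comment;
# no such SKU or threshold has been announced.
HYPOTHETICAL_5080TI_SMS = 140


def bin_die(working_sms: int) -> Optional[str]:
    """Map a die's number of working SMs to a product tier (sketch only)."""
    if working_sms >= FULL_SMS:
        return "5090 Ti / workstation"
    if working_sms >= RTX_5090_SMS:
        # Dies with spare working SMs are still sold with 170 enabled.
        return "RTX 5090"
    if working_sms >= HYPOTHETICAL_5080TI_SMS:
        return "5080 Ti (hypothetical)"
    return None  # not usable for any tier above: scrapped
```

For example, `bin_die(175)` lands in the RTX 5090 tier and `bin_die(100)` returns `None`.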
#16
Dr. Dro
@AleksandarK Brother, that math ain't mathin'. 128 CUDA cores × 192 SMs = 24,576, not 24,756 ;)
BoggledBeagle wrote: "5090 dies can be used this way:

Fully functional: 5090 Ti or not sold as a consumer product at all
almost functional: 5090
partly functional: 5080 Ti - something must fill the gap anyway
unusable for above: scrapped."
AD102 never shipped in a full configuration, even in the enterprise segment. Wouldn't be surprised if this happened again, tbh.
#17
BoggledBeagle
Dr. Dro wrote: "AD102 never shipped in a full configuration, even in the enterprise segment"
I thought it did... Was it really almost impossible to make a fully functional chip?
#18
Vya Domus
3valatzy wrote: "Selling the defective dies for anything above those values is profits bin."
No, you still don't understand. In order to sell those defective dies, it must make sense to waste that much of a wafer versus a wafer with much smaller chips. Yields don't scale linearly: you waste far more space with bigger chips, because you can get defects that make the entire die unusable, where you can't simply disable SMs and use it in a lower-end SKU. So instead of losing 350 mm², you lose 750 mm², or whatever.
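The non-linear effect described above can be illustrated with the simple Poisson yield model, Y = exp(-D0 * A), where D0 is the defect density and A the die area. A minimal sketch, assuming an illustrative D0 of 0.1 defects/cm² (not a published figure for this chip) and the 350 mm² and 750 mm² areas from the comment:

```python
import math

DEFECT_DENSITY_PER_CM2 = 0.1  # illustrative assumption, not a measured GB202 figure


def poisson_yield(area_mm2: float, d0_per_cm2: float = DEFECT_DENSITY_PER_CM2) -> float:
    """Fraction of dies with zero defects under the Poisson model Y = exp(-D0 * A)."""
    area_cm2 = area_mm2 / 100.0  # 100 mm^2 = 1 cm^2
    return math.exp(-d0_per_cm2 * area_cm2)


for area in (350, 750):
    y = poisson_yield(area)
    print(f"{area} mm^2 die: {y:.1%} defect-free, "
          f"{1 - y:.1%} of die area in dies with at least one defect")
```

With these inputs, about 70% of 350 mm² dies come out defect-free versus about 47% of 750 mm² dies, so roughly twice the die area lands in flawed dies. The model treats every defect as fatal, so it ignores harvesting, which salvages many dies by disabling the affected SMs.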
#19
N/A
Daven wrote: "Here are the chip sizes from the last few generations. Looks like normal gen-to-gen variation to me; I don't see anything out of the ordinary."
Next in line: the 6090, with a 600 mm² die, 24,576 cores enabled out of 30,720, and a 384-bit bus because it's impossible to fit 512 bits. That's the evolution, which probably means 12,288 for the 6080.
#20
igormp
3valatzy wrote: "How so? Precisely because of its sheer size, it is quite suitable for segmenting: 50% enabled shaders would make a good RTX 5060 candidate."
The main idea of a production line is that you won't even get defect rates that high, meaning that your GB202 chip will be at worst something like 70~80% defective. If you often get chips worse than that, then there's no reason to even fab that chip to begin with.
BoggledBeagle wrote: "5090 dies can be used this way:

Fully functional: 5090 Ti or not sold as a consumer product at all
almost functional: 5090
partly functional: 5080 Ti - something must fill the gap anyway
unusable for above: scrapped."
That's considering only the consumer RTX parts; those same chips are also used for their (née) Tesla/Quadro cards.
BoggledBeagle wrote: "I thought it did... Was it really almost impossible to make a fully functional chip?"
The high-end AD102 part only had 2 SMs disabled, IIRC. I guess that's a good safety margin on such a big chip, or maybe they really couldn't get it 100% functional often enough to make it a proper product.
#21
Dr. Dro
BoggledBeagle wrote: "I thought it did... Was it really almost impossible to make a fully functional chip?"
I believe so. The RTX 6000 Ada Generation has 142 of the 144 SMs enabled, with the RTX 4090 coming in at 128 of 144. A full L2 cache slice is also disabled on the 4090, reducing L2 from 96 to 72 MB.
#22
Steevo
Dr. Dro wrote: "AD102 never shipped in a full configuration, even in the enterprise segment. Wouldn't be surprised if this happened again, tbh."
Truth.

The number of defects introduced into the silicon during lithography and production of a chip this complex rules out a fully enabled chip being feasible. I'm sure they get some with all their parts working; I would guess they keep those for internal use.

All it takes is a few atoms of carbon or aluminum at these node sizes.
#23
Visible Noise
3valatzy wrote: "What do you do with the salvage parts, then? Directly in the bin, instead of segmenting?"
Technically, you could call the 5090 a salvage part, as it is not fully enabled, for yield reasons.
#24
AnotherReader
igormp wrote: "The main idea of a production line is that you won't even get defect rates that high, meaning that your GB202 chip will be at worst something like 70~80% defective. If you often get chips worse than that, then there's no reason to even fab that chip to begin with.

That's considering only the consumer RTX parts; those same chips are also used for their (née) Tesla/Quadro cards.

The high-end AD102 part only had 2 SMs disabled, IIRC. I guess that's a good safety margin on such a big chip, or maybe they really couldn't get it 100% functional often enough to make it a proper product."
We know that TSMC's N5 had a defect rate of 0.1 per square centimeter in the summer of 2020. Plugging in the numbers for the 5090 suggests a yield of 49% for fully functional dies. After harvesting defective dies and fusing off their damaged portions, the usable yield must be fairly high.
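The comment does not say which yield model or die area it used. As a sketch, assuming a GB202 die area of about 750 mm² (a figure not given in the article) and the 0.1 defects/cm² cited above, here are two common textbook models: the Poisson model, Y = exp(-A * D0), and Murphy's model, Y = ((1 - exp(-A * D0)) / (A * D0))².

```python
import math


def poisson_yield(area_cm2: float, d0: float) -> float:
    """Poisson model: any die with one or more defects is counted as lost."""
    return math.exp(-area_cm2 * d0)


def murphy_yield(area_cm2: float, d0: float) -> float:
    """Murphy's model: Y = ((1 - exp(-A*D0)) / (A*D0))^2."""
    ad = area_cm2 * d0
    return ((1 - math.exp(-ad)) / ad) ** 2


DIE_AREA_CM2 = 7.5  # assumption: roughly 750 mm^2 for GB202, not stated in the article
D0 = 0.1            # defects per cm^2, the N5 figure cited in the comment

print(f"Poisson: {poisson_yield(DIE_AREA_CM2, D0):.1%} fully functional")
print(f"Murphy:  {murphy_yield(DIE_AREA_CM2, D0):.1%} fully functional")
```

Under these assumptions, Poisson gives about 47% and Murphy about 49.5%, so the quoted 49% is consistent with Murphy's model at roughly that area. Both figures count only fully functional dies; harvesting raises the usable fraction, as the comment notes.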
#25
mxthunder
It always bums me out when we never get a fully enabled big-die part. Even if it doesn't make sense as a product at the time, it's cool to pick one up years down the line, and cool to compare fully enabled big-die parts on a gen-to-gen basis.
I forgot that AD102 was actually smaller than GA102.