Monday, January 27th 2025
NVIDIA GB202 "Blackwell" Die Exposed, Shows the Massive 24,576 CUDA Core Configuration
A die-shot of NVIDIA's GB202, the silicon powering the RTX 5090, has surfaced online, providing detailed insights into the "Blackwell" architecture's physical layout. The annotated images, shared by hardware analyst Kurnal and provided by ASUS China general manager Tony Yu, compare the GB202 to its AD102 predecessor and outline key architectural components. The die's central region houses 128 MB of L2 cache (96 MB enabled on RTX 5090), surrounded by memory interfaces. Eight 64-bit memory controllers support the 512-bit GDDR7 interface, with physical interfaces positioned along the top, left, and right edges of the die. Twelve graphics processing clusters (GPCs) surround the central cache. Each GPC contains eight texture processing clusters (TPCs), with each GPC housing 16 streaming multiprocessors (SMs). The complete die configuration enables 24,576 CUDA cores, arranged as 128 cores per SM across 192 SMs. With RTX 5090 offering "only" 21,760 CUDA cores, this means that the full GB202 die is reserved for workstation GPUs.
The SM design includes four slices sharing 128 KB of L1 cache and four texture mapping units (TMUs). Individual SM slices contain dedicated register files, L0 instruction caches, warp schedulers, load-store units, and special function units. Central to the die's layout is a vertical strip containing the media processing components—NVENC and NVDEC units—running from top to bottom. The RTX 5090 implementation enables three of four available NVENC encoders and two of four NVDEC decoders. The die includes twelve raster engine/3D FF blocks for geometry processing. At the bottom edge sits the PCIe 5.0 x16 interface and display controller components. Despite its substantial size, the GB202 remains smaller than NVIDIA's previous GH100 and GV100 dies, which exceeded 814 mm². Each SM integrates specialized hardware, including new 5th-generation Tensor cores and 4th-generation RT cores, contributing to the die's total of 192 RT cores, 768 Tensor cores, and 768 texture units.
Sources:
ASUS China Tony Yu, Kurnal on X, via VideoCardz
The SM design includes four slices sharing 128 KB of L1 cache and four texture mapping units (TMUs). Individual SM slices contain dedicated register files, L0 instruction caches, warp schedulers, load-store units, and special function units. Central to the die's layout is a vertical strip containing the media processing components—NVENC and NVDEC units—running from top to bottom. The RTX 5090 implementation enables three of four available NVENC encoders and two of four NVDEC decoders. The die includes twelve raster engine/3D FF blocks for geometry processing. At the bottom edge sits the PCIe 5.0 x16 interface and display controller components. Despite its substantial size, the GB202 remains smaller than NVIDIA's previous GH100 and GV100 dies, which exceeded 814 mm². Each SM integrates specialized hardware, including new 5th-generation Tensor cores and 4th-generation RT cores, contributing to the die's total of 192 RT cores, 768 Tensor cores, and 768 texture units.
27 Comments on NVIDIA GB202 "Blackwell" Die Exposed, Shows the Massive 24,576 CUDA Core Configuration
By the way -50% shaders wouldn't mean a 5060, it would mean a 5080, which is exactly what that GPU is, half of a GB202. Except that they didn't chose to simply disable half of a GB202 but instead they made a different chip because that's way more cost effective.
Fully functional: 5090 TI or not sold as a consumer product at all
almost functional: 5090
partly functional: 5080 TI - something must fill the gap anyway
unusable for above: scrapped.
The number of defects in the silicon, during lithography and production on a chip this complex rules out a full chip being feasible. I’m sure they get some that have all their parts working, I would guess the my keep that for internal use.
All it take is a few atoms of carbon or aluminum at these node sizes.
I forgot that AD102 was actually smaller than GA102.