Wednesday, September 21st 2022
NVIDIA RTX 4090 Doesn't Max-Out AD102, Ample Room Left for Future RTX 4090 Ti
The AD102 silicon on which NVIDIA's new flagship graphics card, the GeForce RTX 4090, is based, is a marvel of semiconductor engineering. Built on the 4 nm EUV (TSMC 4N) silicon fabrication process, the chip has a gargantuan transistor-count of 76.3 billion, a nearly 170% increase over the previous GA102, and a die-size of 608 mm², which is in fact smaller than the 628 mm² die-area of the GA102. This is thanks to TSMC 4N offering nearly thrice the transistor-density of the Samsung 8LPP node on which the GA102 is based.
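The transistor-count and density comparison above can be checked with some quick arithmetic. This is a minimal sketch: the AD102 numbers and both die sizes come from the article, while the GA102 transistor count of 28.3 billion is an assumption taken from publicly quoted specifications and is not stated here. The constant and function names are illustrative only.

```python
# AD102 figures and both die areas are from the article.
# The GA102 transistor count is an ASSUMPTION (publicly quoted spec),
# not a number given in the article itself.
AD102_TRANSISTORS = 76.3e9
AD102_DIE_MM2 = 608.0
GA102_TRANSISTORS = 28.3e9  # assumption, see note above
GA102_DIE_MM2 = 628.0


def density_mtr_per_mm2(transistors: float, die_mm2: float) -> float:
    """Average density in millions of transistors per square millimetre."""
    return transistors / die_mm2 / 1e6


ad102_density = density_mtr_per_mm2(AD102_TRANSISTORS, AD102_DIE_MM2)
ga102_density = density_mtr_per_mm2(GA102_TRANSISTORS, GA102_DIE_MM2)

# Percentage increase in transistor count from GA102 to AD102.
transistor_increase_pct = (AD102_TRANSISTORS / GA102_TRANSISTORS - 1) * 100

# How many times denser AD102 is than GA102, averaged over the whole die.
density_ratio = ad102_density / ga102_density

print(f"AD102 density: {ad102_density:.1f} MTr/mm^2")
print(f"GA102 density: {ga102_density:.1f} MTr/mm^2")
print(f"Transistor-count increase: {transistor_increase_pct:.1f}%")
print(f"Density ratio: {density_ratio:.2f}x")
```

Under that assumption, the increase lands just under 170% and the whole-die density ratio a little under 3x, in line with the "nearly 170%" and "nearly thrice" wording. Note that these are die-wide averages, not the peak density of the node's logic cells.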
The AD102 physically features 18,432 CUDA cores, 576 fourth-generation Tensor cores, and 144 third-generation RT cores. The streaming multiprocessors (SM) come with special components that enable the Shader Execution Reordering optimization, which has a significant impact on both raster and ray-traced rendering performance. The silicon supports up to 24 GB of GDDR6X or up to 48 GB of GDDR6+ECC memory (the latter will be seen in the RTX Ada professional-visualization card), across a 384-bit wide memory bus. There are 576 TMUs and a mammoth 192 ROPs on the silicon.
The RTX 4090 is carved out of this silicon by enabling 16,384 out of 18,432 CUDA cores, 512 out of 576 Tensor cores, 512 out of 576 TMUs, and 128 out of 144 RT cores; unless NVIDIA has touched the ROP count, it could remain at 192. The memory bus is maxed out, with 24 GB of 21 Gbps GDDR6X memory across the 384-bit bus. In creating the RTX 4090, NVIDIA has left roughly 11% of the chip's number-crunching machinery disabled, headroom from which to carve out future SKUs such as the possible RTX 4090 Ti. Until that SKU is needed in the product stack, NVIDIA will use this margin toward harvesting the AD102 silicon.
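The headroom described above can be worked out from the two CUDA core counts alone. The sketch below uses only those figures from the article; the function name and dictionary keys are hypothetical, chosen for illustration.

```python
# Both core counts are taken from the article.
FULL_AD102_CUDA_CORES = 18_432
RTX_4090_CUDA_CORES = 16_384


def shader_headroom(full: int, enabled: int) -> dict:
    """Summarise how much of the full die a cut-down SKU leaves unused."""
    disabled = full - enabled
    return {
        # Share of the full die's CUDA cores that the SKU enables.
        "enabled_pct": enabled / full * 100,
        # Share of the full die's CUDA cores left disabled.
        "disabled_pct": disabled / full * 100,
        # Extra cores a fully enabled die would have relative to the SKU.
        "uplift_pct": (full / enabled - 1) * 100,
    }


headroom = shader_headroom(FULL_AD102_CUDA_CORES, RTX_4090_CUDA_CORES)
for name, value in headroom.items():
    print(f"{name}: {value:.1f}%")
```

The same margin reads differently depending on the baseline: about 11% of the full die is disabled, which means a fully enabled AD102 would carry 12.5% more CUDA cores than the RTX 4090. Core count alone says nothing about clocks or power, so neither figure should be read as a performance estimate.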
27 Comments on NVIDIA RTX 4090 Doesn't Max-Out AD102, Ample Room Left for Future RTX 4090 Ti
Mhm, but TSMC's are much better.
Good job, nVIDIA.
And here we are seeing the same GDDR6X on TSMC with more memory alongside a smaller die with many more transistors at a relatively small increase of power budget :)
There seems to be some ambiguity around the ROP count. Is there an official number yet?
Refreshes require more mature manufacturing nodes and a build-up of harvested dies that yield more or less silicon to be activated. It’s a way to sell as many chips as possible given the reality of defects and poor yields near the beginning of a new product series.
As an aside, it's also easier to make one complete chip and then lock otherwise functioning parts to create lower SKUs. This only works up to a point: once the ‘dead’ or ‘locked’ silicon exceeds the working portion of the chip, you manufacture a smaller ‘native’ chip instead.
Edit: Oh, and sometimes later SKU refreshes are just added in response to competitors' product releases. A company might even hold such responses back from the beginning on purpose to see how the competition reacts.
...on launch day.
This is how tech advancement works.