As you well understood, my post was deliberately ignoring Nvidia there.
There is a reddit post in which detailed Turing die shots were analyzed. What the author came up with seems correct enough: Tensor cores and FP16 capabilities may be more nuanced, but RT Cores are distinguishable and straightforward. RT Cores make up about 6% of a TPC and about 3% of the total die size. The area added for Tensor cores and/or FP16 capability concurrent with FP32 has more/most of its uses outside RT, and the same goes for cache. The implementation for AMD and Intel should not be much different in terms of transistor and area cost, possibly less.
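The two percentages together also say how much of the die the TPCs take (3% / 6% = about half). A quick sketch of that arithmetic; the 754 mm² die size is my assumption (a TU102-sized die), used only to put the 3% into mm²:

```python
# Shares from the die-shot analysis (approximate).
RT_SHARE_OF_TPC = 0.06  # RT cores as a fraction of one TPC
RT_SHARE_OF_DIE = 0.03  # RT cores as a fraction of the whole die

# Assumption for illustration only: a TU102-sized die.
ASSUMED_DIE_AREA_MM2 = 754.0


def tpc_share_of_die(rt_of_tpc: float, rt_of_die: float) -> float:
    """Fraction of the die covered by TPCs, implied by the two RT shares."""
    return rt_of_die / rt_of_tpc


def rt_area_mm2(die_area_mm2: float, rt_of_die: float) -> float:
    """Absolute RT-core area for a die of the given size."""
    return die_area_mm2 * rt_of_die


if __name__ == "__main__":
    share = tpc_share_of_die(RT_SHARE_OF_TPC, RT_SHARE_OF_DIE)
    area = rt_area_mm2(ASSUMED_DIE_AREA_MM2, RT_SHARE_OF_DIE)
    print(f"TPCs cover ~{share:.0%} of the die")
    print(f"RT cores on a {ASSUMED_DIE_AREA_MM2:.0f} mm^2 die: ~{area:.1f} mm^2")
```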
I wish there were good/readable enough die shots for RDNA2 and Ampere, but apparently there are none so far. We would also need comparisons without RT, and in the case of RDNA2, where RT capability is part of some other block (TMU?), it is probably impossible to read.
3090 is on Samsung 8N, 6900XT is on TSMC N7:
- 3090 die is 28.3B transistors on 628 mm² - 45 MTr/mm²
- 6900XT die is 26.8B transistors on 520 mm² - 51 MTr/mm²
This highlights the differences in manufacturing processes more than anything.
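The density figures are just transistor count over die area; a minimal sketch to reproduce them (1B transistors = 1000 MTr):

```python
def density_mtr_per_mm2(transistors_billion: float, area_mm2: float) -> float:
    """Transistor density in millions of transistors per mm^2."""
    return transistors_billion * 1000.0 / area_mm2


# (transistors in billions, die area in mm^2), as quoted above
DIES = {
    "3090 (Samsung 8N)": (28.3, 628.0),
    "6900XT (TSMC N7)": (26.8, 520.0),
}

if __name__ == "__main__":
    for name, (tr, area) in DIES.items():
        print(f"{name}: {density_mtr_per_mm2(tr, area):.1f} MTr/mm^2")
```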
In terms of the transistor/area cost of the latest improvements, RDNA2 spends a huge number of transistors on the Infinity Cache (at least 6.4B, about 24% of total transistors, plus some control logic), while Ampere no doubt spends a lot of transistors on the doubled ALUs in the shaders.
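Where the 6.4B comes from: 128 MB of Infinity Cache at 6 transistors per bit. The 6T SRAM cell is my assumption (it is the classic cell); tags, redundancy and control logic are deliberately left out, which is why this is a lower bound:

```python
CACHE_BYTES = 128 * 1024 * 1024  # 128 MB Infinity Cache on the 6900XT
TRANSISTORS_PER_BIT = 6          # assumption: classic 6T SRAM cell, array only
TOTAL_TRANSISTORS = 26.8e9       # 6900XT die


def sram_array_transistors(capacity_bytes: int, t_per_bit: int = TRANSISTORS_PER_BIT) -> int:
    """Transistors in the bit-cell array alone (no tags, ECC or control logic)."""
    return capacity_bytes * 8 * t_per_bit


if __name__ == "__main__":
    t = sram_array_transistors(CACHE_BYTES)
    print(f"~{t / 1e9:.2f}B transistors, ~{t / TOTAL_TRANSISTORS:.0%} of the die")
```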
More cache has been the go-to improvement for a few generations before RDNA and Turing. More likely than not, adding more and more cache (at different levels) would have happened with or without RT.
Assuming the same transistor density as the 6900XT, the 3090 on N7 would be about 549 mm², roughly 5.6% (about 30 mm²) larger than the 6900XT.
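The estimate above, as a sketch: borrow the 6900XT's density and divide the 3090's transistor count by it. Since both dies then share the same density, the size ratio is simply the transistor ratio (28.3 / 26.8):

```python
def area_at_density(transistors_billion: float, density_mtr_mm2: float) -> float:
    """Die area in mm^2 for a transistor count at a given density."""
    return transistors_billion * 1000.0 / density_mtr_mm2


N7_DENSITY_6900XT = 26.8 * 1000.0 / 520.0  # ~51.5 MTr/mm^2
AREA_6900XT = 520.0

if __name__ == "__main__":
    area = area_at_density(28.3, N7_DENSITY_6900XT)
    print(f"3090 on N7: ~{area:.0f} mm^2, "
          f"+{area - AREA_6900XT:.0f} mm^2 ({area / AREA_6900XT - 1:.1%}) vs 6900XT")
```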
That assumption is obviously suspect, though. Without the Infinity Cache, the 6900XT die would be noticeably less dense. On the other hand, there is A100 on TSMC's N7 with 54.2B transistors on 826 mm², which works out to a density of 65.6 MTr/mm².
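To show how much the choice of reference density swings the estimate, the same arithmetic with A100's density instead. This is only an illustration of the spread, not a claim about what a 3090 on N7 would actually measure:

```python
def density_mtr_per_mm2(transistors_billion: float, area_mm2: float) -> float:
    """Transistor density in millions of transistors per mm^2."""
    return transistors_billion * 1000.0 / area_mm2


def area_at_density(transistors_billion: float, density_mtr_mm2: float) -> float:
    """Die area in mm^2 for a transistor count at a given density."""
    return transistors_billion * 1000.0 / density_mtr_mm2


A100_DENSITY = density_mtr_per_mm2(54.2, 826.0)   # ~65.6 MTr/mm^2
NAVI_DENSITY = density_mtr_per_mm2(26.8, 520.0)   # ~51.5 MTr/mm^2

if __name__ == "__main__":
    for label, d in (("6900XT density", NAVI_DENSITY), ("A100 density", A100_DENSITY)):
        print(f"3090 at {label} ({d:.1f} MTr/mm^2): ~{area_at_density(28.3, d):.0f} mm^2")
```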