NVIDIA RTX Logic Increases TPC Area by 22% Compared to Non-RTX Turing

Raevenlord · Apr 8, 2019

Public perception of NVIDIA's new RTX series of graphics cards was sometimes marred by an impression that NVIDIA had allocated its resources poorly. The argument went that NVIDIA had greatly increased chip area by adding RTX functionality (in both its Tensor and RT cores), and that this area could have been better used for performance gains in shader-based, non-raytracing workloads. While the merits of ray tracing as it stands (in terms of uptake from developers) are certainly worthy of discussion, it seems that NVIDIA didn't dedicate that much more die area to its RTX functionality - at least not to the extent public perception suggests.

After analyzing full, high-res images of NVIDIA's TU106 and TU116 chips, reddit user @Qesa examined the TPC structure of NVIDIA's Turing chips and concluded that the difference between the RTX-capable TU106 and the RTX-stripped TU116 amounts to a mere 1.95 mm² of additional logic per TPC - a 22% area increase. Of this, 1.25 mm² is reserved for the Tensor logic (which accelerates both DLSS and de-noising on ray-traced workloads), while only 0.7 mm² is used for the RT cores.

According to the math, this means that the TU102 chip used for the RTX 2080 Ti, which in its full configuration has a 754 mm² area, could have done with a 684 mm² die instead. It seems that most of the area increase compared to the Pascal architecture actually comes from larger, faster caches and larger instruction sets on Turing rather than from RTX functionality. Not accounting for the density gained in the transition from 16 nm to 12 nm, a TU106 chip powering an RTX 2060 delivers around the same performance as the GP104 chip powering the GTX 1080 (445 mm² on the TU106 against 314 mm² on GP104), whilst carrying only 75% of the shader count (1920 versus 2560 CUDA cores).
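
The die-level estimate can be reproduced from the per-TPC figure. The following is a minimal sketch, assuming the full TU102 has 36 TPCs (72 SMs at two SMs per TPC; this count is not stated in the analysis above) and that removing the RTX logic would shrink the die by exactly the per-TPC delta, with no other layout changes:

```python
# Back-of-the-envelope check of the 754 mm² -> 684 mm² estimate.
RTX_AREA_PER_TPC_MM2 = 1.95   # from the analysis: Tensor (1.25) + RT (0.70)
TU102_DIE_MM2 = 754.0         # full TU102 die area quoted in the article
TU102_TPC_COUNT = 36          # assumption: 72 SMs, 2 SMs per TPC


def die_without_rtx(die_mm2: float, tpc_count: int, rtx_per_tpc_mm2: float) -> float:
    """Die area if the per-TPC RTX logic were removed and nothing else changed."""
    return die_mm2 - tpc_count * rtx_per_tpc_mm2


stripped = die_without_rtx(TU102_DIE_MM2, TU102_TPC_COUNT, RTX_AREA_PER_TPC_MM2)
saving_pct = 100.0 * (TU102_DIE_MM2 - stripped) / TU102_DIE_MM2

print(f"RTX logic total: {TU102_DIE_MM2 - stripped:.1f} mm²")
print(f"Die without RTX: {stripped:.1f} mm²")
print(f"Share of die:    {saving_pct:.1f} %")
```

Under these assumptions the RTX logic comes to roughly 70 mm², a little over 9% of the die, which is consistent with the 684 mm² figure.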


_Flare · Apr 8, 2019

Did @Qesa mention anything about the area for the dedicated FP16-Cores in the TU116?

londiste · Apr 8, 2019

Source: https://www.reddit.com/r/hardware/comments/baajes I believe in another comment he mentioned FP16 logic in TU116 is smaller than Tensor cores in TU106 but not negligible.

sergionography · Apr 8, 2019

I remember this coming up in the comments previously. I pointed out how a GTX 1660 ti at 284mm2 is about 10-15% slower than a GTX 1080 at 314mm2 (also 10-15% smaller in size)
It makes me question whether Turing actually improved on Pascal at all. Similar performance for a given footprint. I'm guessing where it comes in handy is when you scale it to larger chips, where having fewer, more powerful cores is more manageable and scales better. Perhaps NVIDIA's architecture also has a core count cap similar to how GCN caps at 4096?

And as for tensor cores; they are the most useless things in a gaming card as far as I am concerned.

cucker tarlson · Apr 8, 2019

I'm more interested in power that rtx logic requires.

londiste · Apr 8, 2019

cucker tarlson said:
I'm more interested in power that rtx logic requires.

If it is not used, it gets power gated and does not use any significant amount of power.

TheoneandonlyMrK · Apr 8, 2019

Only 22%? How is that an "only"? For a start, that's a lot of shader space.

That simple figure doesn't account for the extra caches needed to feed them; they have likely grown in size to accommodate them, so 25-27% if we throw in the FP16 hardware too, I would think.

Seems odd anyway. I like chip pics as much as anyone, but it's a bit late to be pushing this angle IMHO.

danbert2000 · Apr 8, 2019

The big change for Turing was being able to do INT and FP operations at the same time. I'm sure that cost some transistors. The perf per clock and clock speeds are higher too, I'm not surprised at all that the chips are bigger. Same thing happened with Maxwell I believe, the GPUs got bigger to support the higher clocks.

bug · Apr 8, 2019

That is actually good news.

Vya Domus · Apr 8, 2019

A lot of people think shaders take up a lot of space and that the relationship between how many shaders a GPU has and its size is linear. But it really isn't. For example, GP106 had 1280 shaders and was 200 mm², while GP104 has twice the shaders and was only ~60% bigger. That's because caches, crossbars, memory controllers, etc. don't scale at the same rate. That being said, 22% is a lot when everything is put into perspective.
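
To put rough numbers on that non-linearity, here is a crude sketch that fits a "fixed area plus area per shader" line through the two Pascal dies. The two-point linear model and the function name `two_point_fit` are assumptions for illustration only; real dies also differ in memory bus width, ROPs and cache, so the "fixed" part is not truly constant between the two chips:

```python
def two_point_fit(shaders_a: int, area_a: float, shaders_b: int, area_b: float):
    """Fit area = fixed + per_shader * shaders through two (shaders, area) points."""
    per_shader = (area_b - area_a) / (shaders_b - shaders_a)
    fixed = area_a - per_shader * shaders_a
    return fixed, per_shader


# GP106: 1280 shaders, 200 mm²; GP104: 2560 shaders, 314 mm²
fixed_mm2, per_shader_mm2 = two_point_fit(1280, 200.0, 2560, 314.0)
area_ratio = 314.0 / 200.0

print(f"Implied non-scaling area: {fixed_mm2:.0f} mm²")
print(f"Implied area per shader:  {per_shader_mm2:.4f} mm²")
print(f"GP104 / GP106 area ratio: {area_ratio:.2f}")
```

With these inputs, doubling the shader count adds 114 mm², and the model attributes the remaining 86 mm² of GP106 to parts that do not scale with shaders.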

Even more concerning is how much, in percentage terms, these things take from the power budget. 12nm didn't bring any notable efficiency gains and Turing only uses slightly more power than its Pascal equivalent does. But we now know RT and Tensor cores use about a fifth of the silicon. I have a suspicion that when this silicon is in use it doesn't just cause the GPU to use more power, but may actually eat away at the power budget the other parts of the chip would otherwise get when performing traditional shading.

Fiendish · Apr 9, 2019

Don't those numbers add up to like a 10% increase, where is 22% coming from?

Edit: I see now.

Sora · Apr 9, 2019

damnit, now i need to go on reddit and beat this information over the heads of the fools that kept repeating the RT is 1/3 the die crap.

Crackong · Apr 9, 2019

22% is A LOT.
With a slight overclock, we could have the performance of 1080ti with the price of a 1080.
All of that was lost because some RTX holy grail "Just Works".
It works so well that after 6 months of launch, nobody can utilize all of the RTX features in a single game.
And DLSS is such a gimmick feature that have to beg The leather jacket himself to fire up his multi-billion AI computer to train for your multi-million AAAAA loot box micro - transaction "game" in the first place.

Thank you Leather Jacket.
Praise the Leather Jacket.

MuhammedAbdo · Apr 9, 2019

Crackong said:
22% is A LOT.
With a slight overclock, we could have the performance of 1080ti with the price of a 1080.
All of that was lost because some RTX holy grail "Just Works".
It works so well that after 6 months of launch, nobody can utilize all of the RTX features in a single game.
And DLSS is such a gimmick feature that have to beg The leather jacket himself to fire up his multi-billion AI computer to train for your multi-million AAAAA loot box micro - transaction "game" in the first place.

It only amounts to 10% die increase

According to the math, this means that a TU102 chip used for the RTX 2080 Ti, which in its full configuration, has a 754 mm² area, could have done with a 684 mm² chip instead.

Crackong · Apr 9, 2019

MuhammedAbdo said:
It only amounts to 10% die increase

TPC units do not populate the whole die.
Just like an Intel CPU had almost half of its die size populated by the i-GPU.

Therefore decreasing TPC size by 22% only results in a 10% reduction in overall die size.
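
The gap between the two percentages can be worked through with the thread's own numbers. This is a rough sketch, assuming the 22% is measured relative to the non-RTX TPC and that the 754 mm² and 684 mm² die areas from the article hold; the TPC share of the die it prints is merely implied by those figures, not measured:

```python
RTX_DELTA_MM2 = 1.95      # extra logic per TPC (from the article)
TPC_INCREASE = 0.22       # increase relative to the non-RTX TPC (from the article)
DIE_RTX_MM2 = 754.0       # full TU102 (from the article)
DIE_NO_RTX_MM2 = 684.0    # hypothetical TU102 without RTX logic (from the article)

tpc_no_rtx = RTX_DELTA_MM2 / TPC_INCREASE      # implied non-RTX TPC area
tpc_rtx = tpc_no_rtx + RTX_DELTA_MM2           # implied RTX TPC area
tpc_reduction = RTX_DELTA_MM2 / tpc_rtx        # shrink as seen from the RTX TPC
die_reduction = (DIE_RTX_MM2 - DIE_NO_RTX_MM2) / DIE_RTX_MM2
implied_tpc_share = die_reduction / tpc_reduction

print(f"TPC area: {tpc_no_rtx:.2f} -> {tpc_rtx:.2f} mm²")
print(f"TPC reduction when removing RTX: {100 * tpc_reduction:.1f} %")
print(f"Die reduction:                   {100 * die_reduction:.1f} %")
print(f"Implied TPC share of the die:    {100 * implied_tpc_share:.0f} %")
```

Note that a 22% increase corresponds to only about an 18% reduction when going the other way, and that these figures imply TPCs make up roughly half of the die.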

kastriot · Apr 9, 2019

Maybe RTX cores work in tandem with normal cores so it's some kind of compromise?

Kaapstad · Apr 9, 2019

It does not matter how it is dressed up, RTX cards are NVidia's worst products in a very long time.

NVidia have totally screwed up on price/performance and there is no getting away from that.

It would be nice if NVidia could give us some cards that "just work for the price" next time they launch a new architecture.

Epic fail NVidia, it is enough to make Alan Turing eat an apple.

londiste · Apr 9, 2019

Vya Domus said:
Even more concerning is how much, in percentage terms, these things take from the power budget. 12nm didn't bring any notable efficiency gains and Turing only uses slightly more power than its Pascal equivalent does. But we now know RT and Tensor cores use about a fifth of the silicon. I have a suspicion that when this silicon is in use it doesn't just cause the GPU to use more power, but may actually eat away at the power budget the other parts of the chip would otherwise get when performing traditional shading.

While RT Cores are used, they definitely eat into power budget but the extent of it is unknown. I do have a feeling it is less than we would expect though, using DXR stuff in games like SoTR - that is otherwise heavy load and causes a bit lower clocks than normal due to power limit - does not decrease the clocks noticeably. Unfortunately, I don't think anyone (besides Nvidia) has even a good idea for how to test this. RT Cores, while separate units, always have the rest of the chip feeding data into them effectively being just another ALU in CUDA core.

Vayra86 · Apr 9, 2019

MuhammedAbdo said:
It only amounts to 10% die increase

People forget that the shader itself and the L2 cache was also expanded a bit. Turing always contains part of the RTX logic even in the 1660ti.

This news article says nothing against the claim that about 17-20% of the die is needed for real-time ray tracing with Turing, which, given the die schematic, still looks plausible to me. All things considered, the architecture does not perform much better (per watt) than Pascal in regular loads; it's hit or miss in that sense. The real comparison here is die size vs absolute performance on non-RT workloads for Pascal versus Turing. The rest only serves to make matters complicated for no benefit.

cucker tarlson said:
I'm more interested in power that rtx logic requires.

Makes two (and probably many more). Its a pretty complicated test I think, but I think the best way to get a handle on it, is to put Turing RTX and non-RTX next to Pascal and test at fixed FPS all cards can manage, then measure power consumption. That is with the assumption that we accept Pascal and Turing to have a ballpark equal perf/watt figure; though you could probably apply a formula for any deviation as well; the problem here is that its not linear per game.

londiste · Apr 9, 2019

The comparison is made based on TU106 vs TU116 - focused on RT Cores and Tensor cores. The shader changes from Pascal (to Volta) to Turing are not being looked at. This is where a considerable amount of additional transistors went.

bug · Apr 9, 2019

Vayra86 said:
Makes two (and probably many more). Its a pretty complicated test I think, but I think the best way to get a handle on it, is to put Turing RTX and non-RTX next to Pascal and test at fixed FPS all cards can manage, then measure power consumption. That is with the assumption that we accept Pascal and Turing to have a ballpark equal perf/watt figure; though you could probably apply a formula for any deviation as well; the problem here is that its not linear per game.

I think this one is tricky to measure (but I'd still like to know). Turn off RTX and the card will draw more frames. Draw more frames, and the power draw goes up.

You can mitigate that by locking the FPS to a set value, but then you're not stressing the hardware enough.

londiste · Apr 9, 2019

Vayra86 said:
Makes two (and probably many more). Its a pretty complicated test I think, but I think the best way to get a handle on it, is to put Turing RTX and non-RTX next to Pascal and test at fixed FPS all cards can manage, then measure power consumption. That is with the assumption that we accept Pascal and Turing to have a ballpark equal perf/watt figure; though you could probably apply a formula for any deviation as well; the problem here is that its not linear per game.

It is not that simple. There are no cards that are directly comparable in terms of resources.
- RTX2070 (2304:144:64 and 8GB GDDR6 on 256-bit bus) vs GTX1070Ti (2432:125:64 and 8GB GDDR5 on 256-bit bus) is the closest comparison we can make. And even here, shaders, TMUs and memory type are different and we cannot make an exact correction for it. Memory we could account for roughly but more shaders and less TMUs on GTX is tough.
- RTX2080 (2944:184:64 and 8GB GDDR6 on 256-bit bus) vs GTX1080Ti (3584:224:88 and 11GB GDDR5X on 352-bit bus) is definitely closer in performance at stock but discrepancy in individual resources is larger, mostly due to larger memory controller along with associated ROPs.

The other question is which games/tests to run. Anything that is able to utilize new features in Turing either inherently (some concurrent INT+FP) or with dev support (RPM) will do better and is likely to better justify the additional cost.

Tom's Hardware Germany did try running RTX2080Ti at lower power limits and comparison in Metro Last Light: Redux. It ties to GTX 1080Ti (2GHz and 280W) at around 160W.
https://www.tomshw.de/2019/04/04/nv...effizienz-test-von-140-bis-340-watt-igorslab/
However, this is a very flawed comparison as RTX2080Ti has 21% more shaders and TMUs along with 27% more memory bandwidth. This guy overclocked the memory on 2080Ti from 1750MHz to 2150MHz making the memory bandwidth difference 56%. Lowering the power limit lowers the core clock (slightly above 1GHz at 160W) but does not reduce memory bandwidth.

Edit: Actually, in that Tom's Hardware Germany comparison, RTX2080Ti runs at roughly 2GHz at 340W. Considering that it has 21% more shaders than GTX 1080Ti, we can roughly calculate the power consumption for comparison - 340W / 1.21 ≈ 281W, which is very close to GTX1080Ti's 280W number. This means shaders are consuming roughly the same amount of power. At the same time, performance is up 47% in average FPS and 57% in min FPS. Turing does appear to be more efficient even in old games, but not by very much.
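
A quick sketch of that normalization, using only the numbers quoted in this post. It assumes the 47% average FPS gain was measured with the RTX 2080 Ti at 340 W against the GTX 1080 Ti at 280 W, and that power scales linearly with shader count, which is a simplification:

```python
RTX_2080TI_POWER_W = 340.0
GTX_1080TI_POWER_W = 280.0
SHADER_RATIO = 1.21      # RTX 2080 Ti has 21% more shaders
AVG_FPS_RATIO = 1.47     # average FPS up 47% (assumed to be at 340 W vs 280 W)

# Power of the RTX 2080 Ti scaled down to the GTX 1080 Ti's shader count.
normalized_power_w = RTX_2080TI_POWER_W / SHADER_RATIO

# Performance per watt of the RTX 2080 Ti relative to the GTX 1080 Ti.
power_ratio = RTX_2080TI_POWER_W / GTX_1080TI_POWER_W
perf_per_watt_ratio = AVG_FPS_RATIO / power_ratio

print(f"Shader-normalized power: {normalized_power_w:.0f} W")
print(f"Perf/W vs GTX 1080 Ti:   {perf_per_watt_ratio:.2f}x")
```

Under these assumptions the shader-normalized power works out to about 281 W, and average performance per watt comes out around 1.2x that of the GTX 1080 Ti.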

ppn · Apr 9, 2019

445/284 sq. mm means the die is about 157% the size, and the 2070 has 150% of the 1660 Ti's shaders, so clearly the RT cores only take about 4% of the space. Not much.
It would be nice to mark the different functional parts on the infrared photo. I think the RT is still there, in no way RT can be 4%.
