
NVIDIA RTX Logic Increases TPC Area by 22% Compared to Non-RTX Turing

Raevenlord

News Editor
Joined
Aug 12, 2016
Messages
3,755 (1.24/day)
Location
Portugal
System Name The Ryzening
Processor AMD Ryzen 9 5900X
Motherboard MSI X570 MAG TOMAHAWK
Cooling Lian Li Galahad 360mm AIO
Memory 32 GB G.Skill Trident Z F4-3733 (4x 8 GB)
Video Card(s) Gigabyte RTX 3070 Ti
Storage Boot: Transcend MTE220S 2TB, Kintson A2000 1TB, Seagate Firewolf Pro 14 TB
Display(s) Acer Nitro VG270UP (1440p 144 Hz IPS)
Case Lian Li O11DX Dynamic White
Audio Device(s) iFi Audio Zen DAC
Power Supply Seasonic Focus+ 750 W
Mouse Cooler Master Masterkeys Lite L
Keyboard Cooler Master Masterkeys Lite L
Software Windows 10 x64
Public perception of NVIDIA's new RTX series of graphics cards was sometimes marred by an impression that NVIDIA had allocated its resources poorly. The argument went that NVIDIA had greatly increased chip area by adding RTX functionality (in both its Tensor and RT cores) that could have been better used for performance gains in shader-based, non-raytracing workloads. While the merits of ray tracing as it stands (in terms of uptake from developers) are certainly worthy of discussion, it seems that NVIDIA didn't dedicate that much more die area to RTX functionality - at least not to the tune of public perception.

After analyzing full, high-res images of NVIDIA's TU106 and TU116 chips, reddit user @Qesa did some analysis on the TPC structure of NVIDIA's Turing chips, and arrived at the conclusion that the difference between NVIDIA's RTX-capable TU106 and their RTX-stripped TU116 amounts to a mere 1.95 mm² of additional logic per TPC - a 22% area increase. Of this, 1.25 mm² is reserved for the Tensor logic (which accelerates both DLSS and de-noising in ray-traced workloads), while only 0.7 mm² is used for the RT cores.

According to the math, this means that a TU102 chip as used in the RTX 2080 Ti, which in its full configuration has a 754 mm² area, could have made do with a 684 mm² die instead. It seems that most of the area increase compared to the Pascal architecture actually comes from the increased performance (and size) of caches and larger instruction sets on Turing, rather than from RTX functionality. Not accounting for the density gains from the transition from 16 nm to 12 nm, the TU106 chip powering the RTX 2060 delivers around the same performance as the GP104 chip powering the GTX 1080 (410 mm² for TU106 against 314 mm² for GP104), whilst carrying only 75% of the shader count (1920 versus 2560 CUDA cores).
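As a sanity check, the figures above can be reproduced with a few lines of arithmetic. The 36-TPC count is the published full TU102 configuration (72 SMs, two per TPC); everything else comes straight from the numbers quoted above:

```python
# Rough sanity check of the die-area figures quoted in the article.
TPC_COUNT_TU102 = 36       # full TU102: 72 SMs = 36 TPCs
RTX_LOGIC_PER_TPC = 1.95   # mm^2 per TPC (1.25 Tensor + 0.70 RT, per @Qesa)
TU102_AREA = 754.0         # mm^2, full die

rtx_total = TPC_COUNT_TU102 * RTX_LOGIC_PER_TPC   # total RTX logic on the die
hypothetical = TU102_AREA - rtx_total             # die size without RTX logic

print(f"RTX logic across the die: {rtx_total:.1f} mm^2")      # ~70.2 mm^2
print(f"Hypothetical non-RTX TU102: {hypothetical:.1f} mm^2") # ~683.8 mm^2
```

The result rounds to the 684 mm² quoted in the article.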

View at TechPowerUp Main Site
 
Joined
Jun 19, 2010
Messages
409 (0.08/day)
Location
Germany
Processor Ryzen 5600X
Motherboard MSI A520
Cooling Thermalright ARO-M14 orange
Memory 2x 8GB 3200
Video Card(s) RTX 3050 (ROG Strix Bios)
Storage SATA SSD
Display(s) UltraHD TV
Case Sharkoon AM5 Window red
Audio Device(s) Headset
Power Supply beQuiet 400W
Mouse Mountain Makalu 67
Keyboard MS Sidewinder X4
Software Windows, Vivaldi, Thunderbird, LibreOffice, Games, etc.
Did @Qesa mention anything about the area for the dedicated FP16-Cores in the TU116?
 
Joined
Feb 3, 2017
Messages
3,746 (1.32/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
Joined
Feb 13, 2012
Messages
523 (0.11/day)
I remember this coming up in the comments previously. I pointed out how a GTX 1660 Ti at 284 mm² is about 10-15% slower than a GTX 1080 at 314 mm² (also 10-15% smaller in size).
It makes me question whether Turing actually improved on Pascal at all - similar performance for a given footprint. I'm guessing where it comes in handy is when you scale it to larger chips, where having a smaller number of more powerful cores is more manageable and scales better. Perhaps NVIDIA's architecture also has a core count cap, similar to how GCN caps at 4096?

And as for Tensor cores: they are the most useless things on a gaming card, as far as I am concerned.
 
Joined
Aug 6, 2017
Messages
7,412 (2.78/day)
Location
Poland
System Name Purple rain
Processor 10.5 thousand 4.2G 1.1v
Motherboard Zee 490 Aorus Elite
Cooling Noctua D15S
Memory 16GB 4133 CL16-16-16-31 Viper Steel
Video Card(s) RTX 2070 Super Gaming X Trio
Storage SU900 128,8200Pro 1TB,850 Pro 512+256+256,860 Evo 500,XPG950 480, Skyhawk 2TB
Display(s) Acer XB241YU+Dell S2716DG
Case P600S Silent w. Alpenfohn wing boost 3 ARGBT+ fans
Audio Device(s) K612 Pro w. FiiO E10k DAC,W830BT wireless
Power Supply Superflower Leadex Gold 850W
Mouse G903 lightspeed+powerplay,G403 wireless + Steelseries DeX + Roccat rest
Keyboard HyperX Alloy SilverSpeed (w.HyperX wrist rest),Razer Deathstalker
Software Windows 10
Benchmark Scores A LOT
I'm more interested in the power that the RTX logic requires.
 
Joined
Feb 3, 2017
Messages
3,746 (1.32/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
Joined
Mar 10, 2010
Messages
11,878 (2.21/day)
Location
Manchester uk
System Name RyzenGtEvo/ Asus strix scar II
Processor Amd R5 5900X/ Intel 8750H
Motherboard Crosshair hero8 impact/Asus
Cooling 360EK extreme rad+ 360$EK slim all push, cpu ek suprim Gpu full cover all EK
Memory Corsair Vengeance Rgb pro 3600cas14 16Gb in four sticks./16Gb/16GB
Video Card(s) Powercolour RX7900XT Reference/Rtx 2060
Storage Silicon power 2TB nvme/8Tb external/1Tb samsung Evo nvme 2Tb sata ssd/1Tb nvme
Display(s) Samsung UAE28"850R 4k freesync.dell shiter
Case Lianli 011 dynamic/strix scar2
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup, corsair void pro headset
Power Supply corsair 1200Hxi/Asus stock
Mouse Roccat Kova/ Logitech G wireless
Keyboard Roccat Aimo 120
VR HMD Oculus rift
Software Win 10 Pro
Benchmark Scores 8726 vega 3dmark timespy/ laptop Timespy 6506
Only 22%? How is that an "only"? For a start, that's a lot of shader space.

That simple figure doesn't account for the extra caches needed to feed them - those likely increased in size to accommodate them too, so more like 25-27% if we throw the FP16 hardware in as well, I would think.

Seems odd anyway. I like chip pics as much as anyone, but,

It's a bit late to be pushing this angle IMHO.
 
Joined
Jun 16, 2016
Messages
409 (0.13/day)
System Name Baxter
Processor Intel i7-5775C @ 4.2 GHz 1.35 V
Motherboard ASRock Z97-E ITX/AC
Cooling Scythe Big Shuriken 3 with Noctua NF-A12 fan
Memory 16 GB 2400 MHz CL11 HyperX Savage DDR3
Video Card(s) EVGA RTX 2070 Super Black @ 1950 MHz
Storage 1 TB Sabrent Rocket 2242 NVMe SSD (boot), 500 GB Samsung 850 EVO, and 4TB Toshiba X300 7200 RPM HDD
Display(s) Vizio P65-F1 4KTV (4k60 with HDR or 1080p120)
Case Raijintek Ophion
Audio Device(s) HDMI PCM 5.1, Vizio 5.1 surround sound
Power Supply Corsair SF600 Platinum 600 W SFX PSU
Mouse Logitech MX Master 2S
Keyboard Logitech G613 and Microsoft Media Keyboard
The big change for Turing was being able to do INT and FP operations at the same time. I'm sure that cost some transistors. The perf per clock and clock speeds are higher too, so I'm not surprised at all that the chips are bigger. The same thing happened with Maxwell, I believe - the GPUs got bigger to support the higher clocks.
 

bug

Joined
May 22, 2015
Messages
13,747 (3.96/day)
Processor Intel i5-12600k
Motherboard Asus H670 TUF
Cooling Arctic Freezer 34
Memory 2x16GB DDR4 3600 G.Skill Ripjaws V
Video Card(s) EVGA GTX 1060 SC
Storage 500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 2TB Crucial MX500
Display(s) Dell U3219Q + HP ZR24w
Case Raijintek Thetis
Audio Device(s) Audioquest Dragonfly Red :D
Power Supply Seasonic 620W M12
Mouse Logitech G502 Proteus Core
Keyboard G.Skill KM780R
Software Arch Linux + Win10
That is actually good news.
 
Joined
Jan 8, 2017
Messages
9,425 (3.28/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
A lot of people think shaders take up a lot of space and that the relationship between how many shaders a GPU has and its size is linear. It really isn't: for example, GP106 had 1280 shaders at 200 mm², while GP104 had twice the shaders and was only ~60% bigger. That's because caches, crossbars, memory controllers, etc. don't scale at the same rate. That being said, 22% is a lot when everything is put into perspective.
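The non-linear scaling described above can be illustrated with a simple two-point fit against the GP106/GP104 figures from the post. The fixed/per-shader split below is a toy model for illustration, not a real floorplan breakdown:

```python
# Illustrating why shader count vs. die area is not linear, using the
# GP106 (1280 shaders, 200 mm^2) and GP104 (2560 shaders, 314 mm^2)
# figures quoted in the post above.
gp106 = {"shaders": 1280, "area": 200.0}
gp104 = {"shaders": 2560, "area": 314.0}

# Naive linear model: doubling shaders would double the area.
linear = gp106["area"] * (gp104["shaders"] / gp106["shaders"])
print(f"Linear prediction for GP104: {linear:.0f} mm^2 (actual: 314 mm^2)")

# Two-term model area = fixed + per_shader * n, solved from both chips.
per_shader = (gp104["area"] - gp106["area"]) / (gp104["shaders"] - gp106["shaders"])
fixed = gp106["area"] - per_shader * gp106["shaders"]
print(f"Implied fixed overhead: {fixed:.0f} mm^2")            # ~86 mm^2
print(f"Per 1000 shaders: {per_shader * 1000:.0f} mm^2")      # ~89 mm^2
```

Under this toy model roughly 86 mm² of GP106 is "uncore" (caches, crossbars, memory controllers) that does not grow with shader count, which is exactly the point the post makes.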

Even more concerning is what percentage of the power budget these things take. 12 nm didn't bring any notable efficiency gains, and Turing only uses slightly more power than its Pascal equivalents do. But we now know RT and Tensor cores use about a fifth of the silicon; I have a suspicion that when this silicon is in use it doesn't just cause the GPU to draw more power, but may actually eat away at the power budget the other parts of the chip would otherwise get for traditional shading.
 
Last edited:
Joined
Mar 8, 2013
Messages
20 (0.00/day)
Don't those numbers add up to like a 10% increase, where is 22% coming from?

Edit: I see now.
 
Last edited:
Joined
Feb 15, 2019
Messages
1,658 (0.79/day)
System Name Personal Gaming Rig
Processor Ryzen 7800X3D
Motherboard MSI X670E Carbon
Cooling MO-RA 3 420
Memory 32GB 6000MHz
Video Card(s) RTX 4090 ICHILL FROSTBITE ULTRA
Storage 4x 2TB Nvme
Display(s) Samsung G8 OLED
Case Silverstone FT04
22% is A LOT.
With a slight overclock, we could have had the performance of a 1080 Ti at the price of a 1080.
All of that was lost because some RTX holy grail "Just Works".
It works so well that six months after launch, nobody can utilize all of the RTX features in a single game.
And DLSS is such a gimmick of a feature that you have to beg the Leather Jacket himself to fire up his multi-billion-dollar AI computer to train it for your multi-million-dollar AAAAA loot-box micro-transaction "game" in the first place.


Thank you Leather Jacket.
Praise the Leather Jacket.
 
Joined
Mar 26, 2009
Messages
176 (0.03/day)
22% is A LOT.
With a slight overclock, we could have the performance of 1080ti with the price of a 1080.
All of that was lost because some RTX holy grail "Just Works".
It works so well that after 6 months of launch, nobody can utilize all of the RTX features in a single game.
And DLSS is such a gimmick feature that have to beg The leather jacket himself to fire up his multi-billion AI computer to train for your multi-million AAAAA loot box micro - transaction "game" in the first place.
It only amounts to 10% die increase
According to the math, this means that a TU102 chip used for the RTX 2080 Ti, which in its full configuration, has a 754 mm² area, could have done with a 684 mm² chip instead.
 
Joined
Feb 15, 2019
Messages
1,658 (0.79/day)
System Name Personal Gaming Rig
Processor Ryzen 7800X3D
Motherboard MSI X670E Carbon
Cooling MO-RA 3 420
Memory 32GB 6000MHz
Video Card(s) RTX 4090 ICHILL FROSTBITE ULTRA
Storage 4x 2TB Nvme
Display(s) Samsung G8 OLED
Case Silverstone FT04
It only amounts to 10% die increase

TPC units do not populate the whole die - just like an Intel CPU can have almost half of its die taken up by the iGPU.

Therefore decreasing TPC size by 22% only results in a ~10% reduction in overall die size.
 
Joined
Jun 15, 2016
Messages
1,042 (0.34/day)
Location
Pristina
System Name My PC
Processor 4670K@4.4GHz
Motherboard Gryphon Z87
Cooling CM 212
Memory 2x8GB+2x4GB @2400GHz
Video Card(s) XFX Radeon RX 580 GTS Black Edition 1425MHz OC+, 8GB
Storage Intel 530 SSD 480GB + Intel 510 SSD 120GB + 2x500GB hdd raid 1
Display(s) HP envy 32 1440p
Case CM Mastercase 5
Audio Device(s) Sbz ZXR
Power Supply Antec 620W
Mouse G502
Keyboard G910
Software Win 10 pro
Maybe RTX cores work in tandem with the normal cores, so it's some kind of compromise?
 
Joined
Jan 14, 2014
Messages
75 (0.02/day)
System Name HAL
Processor i7 975
Motherboard Asus Rampage III
Cooling Corsair H50
Memory Patriot Viper 2000mhz 8-8-8-24
Video Card(s) HD 5970 x 2
Storage C300 256gb x 2
Case Corsair 800D
Power Supply Corsair AX1200
It does not matter how it is dressed up, RTX cards are NVidia's worst products in a very long time.

NVidia have totally screwed up on price/performance and there is no getting away from that.

It would be nice if NVidia could give us some cards that "just work for the price" next time they launch a new architecture.

Epic fail NVidia, it is enough to make Alan Turing eat an apple.
 
Joined
Feb 3, 2017
Messages
3,746 (1.32/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
Even more concerning is how much in percentages these things take from the power budget. 12nm didn't bring any notable efficiency gains and Turing only uses slightly more power than their Pascal equivalent does. But we now know RT and Tensor cores uses about a fifth of the silicon, I have a suspicion that when this silicon is in use it doesn't just cause the GPU to use more power but it may actually eat away at the power budget the other parts of the chip would otherwise get performing traditional shading.
While RT cores are in use, they definitely eat into the power budget, but the extent of it is unknown. I have a feeling it is less than we would expect, though: using DXR in games like SoTR - which is otherwise a heavy load and causes slightly lower clocks than normal due to the power limit - does not decrease clocks noticeably. Unfortunately, I don't think anyone (besides Nvidia) even has a good idea of how to test this. RT cores, while separate units, always have the rest of the chip feeding data into them, effectively being just another ALU in the CUDA core.
 
Joined
Sep 17, 2014
Messages
22,417 (6.03/day)
Location
The Washing Machine
Processor 7800X3D
Motherboard MSI MAG Mortar b650m wifi
Cooling Thermalright Peerless Assassin
Memory 32GB Corsair Vengeance 30CL6000
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Lexar NM790 4TB + Samsung 850 EVO 1TB + Samsung 980 1TB + Crucial BX100 250GB
Display(s) Gigabyte G34QWC (3440x1440)
Case Lian Li A3 mATX White
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse Steelseries Aerox 5
Keyboard Lenovo Thinkpad Trackpoint II
Software W11 IoT Enterprise LTSC
Benchmark Scores Over 9000
It only amounts to 10% die increase

People forget that the shader itself and the L2 cache were also expanded a bit. Turing always contains part of the RTX logic, even in the 1660 Ti.

This news article says nothing against the claim that about 17-20% of the die is needed for RTRT with Turing, which, given the die schematic, still looks plausible to me. All things considered, the architecture does not perform much better (per watt) than Pascal in regular loads; it's hit or miss in that sense. The real comparison here is die size vs absolute performance in non-RT workloads for Pascal versus Turing. The rest only serves to make matters complicated for no benefit.

I'm more interested in power that rtx logic requires.

That makes two of us (and probably many more). It's a pretty complicated test, I think, but the best way to get a handle on it is to put Turing RTX and non-RTX cards next to Pascal and test at a fixed FPS all cards can manage, then measure power consumption. That is with the assumption that we accept Pascal and Turing to have ballpark-equal perf/watt figures; you could probably apply a formula for any deviation as well, but the problem is that it's not linear per game.
 
Last edited:
Joined
Feb 3, 2017
Messages
3,746 (1.32/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
The comparison is made based on TU106 vs TU116 - focused on RT Cores and Tensor cores. The shader changes from Pascal (to Volta) to Turing are not being looked at. This is where a considerable amount of additional transistors went.
 

bug

Joined
May 22, 2015
Messages
13,747 (3.96/day)
Processor Intel i5-12600k
Motherboard Asus H670 TUF
Cooling Arctic Freezer 34
Memory 2x16GB DDR4 3600 G.Skill Ripjaws V
Video Card(s) EVGA GTX 1060 SC
Storage 500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 2TB Crucial MX500
Display(s) Dell U3219Q + HP ZR24w
Case Raijintek Thetis
Audio Device(s) Audioquest Dragonfly Red :D
Power Supply Seasonic 620W M12
Mouse Logitech G502 Proteus Core
Keyboard G.Skill KM780R
Software Arch Linux + Win10
Makes two (and probably many more). Its a pretty complicated test I think, but I think the best way to get a handle on it, is to put Turing RTX and non-RTX next to Pascal and test at fixed FPS all cards can manage, then measure power consumption. That is with the assumption that we accept Pascal and Turing to have a ballpark equal perf/watt figure; though you could probably apply a formula for any deviation as well; the problem here is that its not linear per game.
I think this one is tricky to measure (but I'd still like to know). Turn off RTX and the card will draw more frames; draw more frames, and the power draw goes up :(
You can mitigate that by locking the FPS to a set value, but then you're not stressing the hardware enough :(
 
Joined
Feb 3, 2017
Messages
3,746 (1.32/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
Makes two (and probably many more). Its a pretty complicated test I think, but I think the best way to get a handle on it, is to put Turing RTX and non-RTX next to Pascal and test at fixed FPS all cards can manage, then measure power consumption. That is with the assumption that we accept Pascal and Turing to have a ballpark equal perf/watt figure; though you could probably apply a formula for any deviation as well; the problem here is that its not linear per game.
It is not that simple. There are no cards that are directly comparable in terms of resources.
- RTX 2070 (2304:144:64 and 8GB GDDR6 on a 256-bit bus) vs GTX 1070 Ti (2432:152:64 and 8GB GDDR5 on a 256-bit bus) is the closest comparison we can make. And even here, shader counts, TMUs and memory type differ, and we cannot make an exact correction for them. Memory we could account for roughly, but the extra shaders and TMUs on the GTX are tough.
- RTX 2080 (2944:184:64 and 8GB GDDR6 on a 256-bit bus) vs GTX 1080 Ti (3584:224:88 and 11GB GDDR5X on a 352-bit bus) is definitely closer in performance at stock, but the discrepancy in individual resources is larger, mostly due to the larger memory controller and the associated ROPs.

The other question is which games/tests to run. Anything that is able to utilize new features in Turing either inherently (some concurrent INT+FP) or with dev support (RPM) will do better and is likely to better justify the additional cost.

Tom's Hardware Germany did try running an RTX 2080 Ti at lower power limits for a comparison in Metro: Last Light Redux. It ties the GTX 1080 Ti (2 GHz and 280 W) at around 160 W.
https://www.tomshw.de/2019/04/04/nv...effizienz-test-von-140-bis-340-watt-igorslab/
However, this is a very flawed comparison, as the RTX 2080 Ti has 21% more shaders and TMUs along with 27% more memory bandwidth. They also overclocked the memory on the 2080 Ti from 1750 MHz to 2150 MHz, making the memory bandwidth difference 56%. Lowering the power limit lowers the core clock (to slightly above 1 GHz at 160 W) but does not reduce memory bandwidth.

Edit: Actually, in that Tom's Hardware Germany comparison, the RTX 2080 Ti runs at roughly 2 GHz at 340 W. Considering that it has 21% more shaders than the GTX 1080 Ti, we can roughly normalize the power consumption for comparison: 340 W / 1.21 = ~281 W, which is very close to the GTX 1080 Ti's 280 W figure. This means the shaders are consuming roughly the same amount of power. At the same time, performance is up 47% in average FPS and 57% in minimum FPS. Turing does appear to be more efficient even in old games, though not by much.
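The normalization in the edit above is a one-liner, but it is worth spelling out the assumption baked into it - that power scales linearly with shader count at equal clocks, which is only a first-order approximation:

```python
# Shader-count power normalization from the Tom's Hardware Germany numbers
# quoted above. Assumption: power scales linearly with shader count at
# equal clocks (a rough first-order model only).
rtx2080ti_power = 340.0   # W at ~2 GHz
gtx1080ti_power = 280.0   # W at ~2 GHz
shader_ratio = 1.21       # 2080 Ti has ~21% more shaders than 1080 Ti

normalized = rtx2080ti_power / shader_ratio
print(f"Shader-normalized 2080 Ti power: {normalized:.0f} W "
      f"vs GTX 1080 Ti's {gtx1080ti_power:.0f} W")  # ~281 W vs 280 W
```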
 
Last edited:

ppn

Joined
Aug 18, 2015
Messages
1,231 (0.36/day)
445/284 mm² is 157%, and the 2070 is 150% of the 1660 Ti, so clearly the RT core only takes about 4% of the space. Not much.
It would be nice to mark the different functional parts on the infrared photo. I think the RT hardware is still there; there is no way RT can be only 4%.
 