
Ampere v/s Turing

Joined
Dec 17, 2011
Messages
359 (0.08/day)
So I was bored and I was looking through the GPU database (I am weird), and look what I found: Turing TU104 (used in the 2070 Super, 2080 and 2080 Super) has a transistor count of 13.6 billion. GA106 (used in the 3060) has a transistor count of 13.25 billion. My! How similar!

So I thought, you know what, let's do a comparison. In the Ampere corner we have the RTX 3060 (duh!), and in the Turing corner we need a cut-down TU104 part, just as the RTX 3060 is a cut-down GA106. So I went with the RTX 2070 Super. The RTX 2070 Super uses 2560/3072 = 83% of TU104's shaders, compared to the RTX 3060's 3584/3840 = 93% of GA106's, but TU104 also has about 3% more transistors, so I thought it should be a fair match. Anyway, here are the numbers!

Metric | RTX 2070 Super | RTX 3060 | 2070 Super advantage over 3060
Pixel fill rate | 113.3 GPixel/s | 85.30 GPixel/s | +33%
Texture fill rate | 283.2 GTexel/s | 199.0 GTexel/s | +42%
Half precision (FP16) | 18.12 TFLOPS | 12.74 TFLOPS | +42%
Full precision (FP32) | 9.06 TFLOPS | 12.74 TFLOPS | -29% (or 3060 has +40%)
Double precision (FP64) | 283.2 GFLOPS | 199.0 GFLOPS | +42%
Memory bandwidth | 448 GB/s | 360 GB/s | +24%
RT cores (thanks cvaldes) | 40 | 28 (but 2x faster) | -29% (or 3060 has +40%)
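The theoretical rows above are just unit counts multiplied by clock speed. As a sanity check, here is a small sketch that reproduces them; the shader/TMU/ROP counts, boost clocks, memory bus widths, memory speeds and FP16/FP64 ratios are my assumptions taken from public spec sheets, not numbers stated in this post.

```python
# Theoretical throughput = (number of units) x (boost clock).
# All specs below are assumptions from public spec sheets.
SPECS = {
    "RTX 2070 Super": dict(shaders=2560, tmus=160, rops=64, boost_ghz=1.770,
                           bus_bits=256, mem_gbps=14.0,
                           fp16_ratio=2.0, fp64_ratio=1 / 32),
    "RTX 3060": dict(shaders=3584, tmus=112, rops=48, boost_ghz=1.777,
                     bus_bits=192, mem_gbps=15.0,
                     fp16_ratio=1.0, fp64_ratio=1 / 64),
}

def theoretical(s):
    """Paper throughput figures for one card at its rated boost clock."""
    fp32_tflops = 2 * s["shaders"] * s["boost_ghz"] / 1000  # FMA = 2 FLOPs/clock
    return {
        "pixel_gpix": s["rops"] * s["boost_ghz"],
        "texel_gtex": s["tmus"] * s["boost_ghz"],
        "fp16_tflops": fp32_tflops * s["fp16_ratio"],
        "fp32_tflops": fp32_tflops,
        "fp64_gflops": fp32_tflops * s["fp64_ratio"] * 1000,
        "bandwidth_gbs": s["bus_bits"] / 8 * s["mem_gbps"],
    }

turing = theoretical(SPECS["RTX 2070 Super"])
ampere = theoretical(SPECS["RTX 3060"])
for metric in turing:
    adv = (turing[metric] / ampere[metric] - 1) * 100
    print(f"{metric:14s} {turing[metric]:8.2f} {ampere[metric]:8.2f} {adv:+5.0f}%")
```

With these assumed specs it lands on the same figures and percentages as the first six rows of the table. Real cards rarely sit exactly at the rated boost clock, so these stay paper numbers.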

So... the RTX 3060 seems to have about 40% more FP32 TFLOPS and (assuming the 2x RT core speedup) RT performance, while the 2070 Super has 33-42% more pixel/texture fill rate and FP16/FP64 throughput, plus 24% more memory bandwidth. Of course, this doesn't take into account architectural efficiencies/inefficiencies. But it makes you wonder how such a drastic rebalancing changes gaming performance.

On RT cores, AnandTech says:
The ray tracing (RT) cores have also been beefed up (for Ampere) ...... the individual RT cores are said to be up to 2x faster, with NVIDIA specifically quoting ray/triangle intersection performance.

Metric | RTX 2070 Super | RTX 3060 | 2070 Super advantage over 3060
Average FPS, 1080p | 124.5 | 113.9 | +9%
Average FPS, 4K | 53.4 | 47.8 | +12%
Average FPS, 1080p RT (mean of the per-game 1080p RT rows below) | 81.37 | 79.35 | +2%
Control, 1080p RT | 46.8 | 45.3 | +3%
Control, 4K RT | 14.7 | 13.4 | +10%
Cyberpunk, 1080p RT | 31.5 | 31.4 | -
Cyberpunk, 4K RT | 9.4 | 9.3 | -
Doom Eternal, 1080p RT | 134.2 | 128.1 | +5%
Doom Eternal, 4K RT | 14.6 | 54.2 | N/A, as the 2070 Super runs into its VRAM limit
F1 2021, 1080p RT | 127.6 | 124.1 | +3%
F1 2021, 4K RT | 43.7 | 41.7 | +5%
Far Cry 6, 1080p RT | 75.2 | 76.8 | -2%
Far Cry 6, 4K RT | 37 | 35 | +6%
Metro Exodus, 1080p RT | 72.9 | 70.4 | +3%
Metro Exodus, 4K RT | 26.6 | 22.1 | +20%
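The 1080p RT mean row is a plain arithmetic mean of the per-game rows. A minimal sketch of that calculation using only the table's numbers; the geometric mean of per-game ratios is my addition, not what the table uses, since an arithmetic mean of raw FPS lets high-FPS titles like Doom Eternal and F1 2021 dominate.

```python
import math

# Per-game 1080p RT averages from the table: (RTX 2070 Super, RTX 3060).
RT_1080P = {
    "Control": (46.8, 45.3),
    "Cyberpunk": (31.5, 31.4),
    "Doom Eternal": (134.2, 128.1),
    "F1 2021": (127.6, 124.1),
    "Far Cry 6": (75.2, 76.8),
    "Metro Exodus": (72.9, 70.4),
}

def advantage_pct(a: float, b: float) -> float:
    """Percent by which a exceeds b."""
    return (a / b - 1) * 100

n = len(RT_1080P)
mean_turing = sum(t for t, _ in RT_1080P.values()) / n
mean_ampere = sum(a for _, a in RT_1080P.values()) / n

# Geometric mean of per-game ratios: every game gets equal weight.
geo_ratio = math.exp(sum(math.log(t / a) for t, a in RT_1080P.values()) / n)

print(f"arithmetic mean: {mean_turing:.2f} vs {mean_ampere:.2f} fps "
      f"({advantage_pct(mean_turing, mean_ampere):+.1f}%)")
print(f"geometric mean of ratios: {(geo_ratio - 1) * 100:+.1f}%")
for game, (t, a) in RT_1080P.items():
    print(f"{game:13s} {advantage_pct(t, a):+.1f}%")
```

The arithmetic mean reproduces the 81.37 vs 79.35 fps row; unrounded, the gap comes out closer to 2.5% than 2%.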

So despite the rebalancing, the 2070 Super does look faster in non-RT games, and its advantage almost disappears with ray tracing turned on. What a weird but interesting result. I think the RTX 3060 could benefit a lot from a higher texture fill rate and maybe from a higher pixel fill rate. I doubt the lower FP16 performance is hurting the 3060, and FP64 is irrelevant to gaming anyway.

I wonder what Nvidia's rationale was for focusing on FP32 + RT performance over everything else. What do you think?
 
Joined
Jun 21, 2021
Messages
3,100 (2.49/day)
System Name daily driver Mac mini M2 Pro
Processor Apple proprietary M2 Pro (6 p-cores, 4 e-cores)
Motherboard Apple proprietary
Cooling Apple proprietary
Memory Apple proprietary 16GB LPDDR5 unified memory
Video Card(s) Apple proprietary M2 Pro (16-core GPU)
Storage Apple proprietary onboard 512GB SSD + various external HDDs
Display(s) LG UltraFine 27UL850W (4K@60Hz IPS)
Case Apple proprietary
Audio Device(s) Apple proprietary
Power Supply Apple proprietary
Mouse Apple Magic Trackpad 2
Keyboard Keychron K1 tenkeyless (Gateron Reds)
VR HMD Oculus Rift S (hosted on a different PC)
Software macOS Sonoma 14.7
Benchmark Scores (My Windows daily driver is a Beelink Mini S12 Pro. I'm not interested in benchmarking.)
The type of transistors matters.

2070 Super: 40 RT cores
3060: 28 RT cores
 
Joined
Dec 17, 2011
Messages
359 (0.08/day)
2070 Super: 40 RT cores
3060: 28 RT cores



Ampere's RT cores are up to 2x faster than Turing's RT cores (NVIDIA specifically quotes ray/triangle intersection performance). Source - https://www.anandtech.com/show/1605...re-for-gaming-starting-with-rtx-3080-rtx-3090
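How the 40 vs 28 RT core counts turn into the "-29% (or 3060 has +40%)" row depends entirely on the per-core speedup. A rough sketch, assuming RT throughput scales linearly with core count times per-core speed and ignoring the near-identical clocks; NVIDIA's figure is "up to 2x", so the 1.5x case below is purely a hypothetical middle ground.

```python
def rt_equivalent(rt_cores: int, per_core_speedup: float) -> float:
    """RT throughput in units of one Turing RT core (idealized linear model)."""
    return rt_cores * per_core_speedup

turing_rt = rt_equivalent(40, 1.0)  # RTX 2070 Super: 40 Turing RT cores

# 1.0 = no per-core gain, 2.0 = NVIDIA's "up to 2x" best case, 1.5 = hypothetical.
for speedup in (1.0, 1.5, 2.0):
    ampere_rt = rt_equivalent(28, speedup)  # RTX 3060: 28 Ampere RT cores
    print(f"{speedup}x per core: RTX 3060 = {ampere_rt:.0f} Turing-core equivalents, "
          f"{ampere_rt / turing_rt - 1:+.0%} vs 2070 Super")
```

Only in the full 2x case does the 3060 come out 40% ahead; with no per-core gain it would be 30% behind.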
 
Joined
Aug 20, 2007
Messages
21,434 (3.40/day)
System Name Pioneer
Processor Ryzen R9 9950X
Motherboard GIGABYTE Aorus Elite X670 AX
Cooling Noctua NH-D15 + A whole lotta Sunon and Corsair Maglev blower fans...
Memory 64GB (4x 16GB) G.Skill Flare X5 @ DDR5-6000 CL30
Video Card(s) XFX RX 7900 XTX Speedster Merc 310
Storage Intel 905p Optane 960GB boot, +2x Crucial P5 Plus 2TB PCIe 4.0 NVMe SSDs
Display(s) 55" LG 55" B9 OLED 4K Display
Case Thermaltake Core X31
Audio Device(s) TOSLINK->Schiit Modi MB->Asgard 2 DAC Amp->AKG Pro K712 Headphones or HDMI->B9 OLED
Power Supply FSP Hydro Ti Pro 850W
Mouse Logitech G305 Lightspeed Wireless
Keyboard WASD Code v3 with Cherry Green keyswitches + PBT DS keycaps
Software Gentoo Linux x64 / Windows 11 Enterprise IoT 2024
Joined
Dec 17, 2011
Messages
359 (0.08/day)
Or you know, they aren't and that's just marketing. You seem to be finding that.
I don't think they are lying about their RT cores' performance. The RTX 2070 Super has a 9% lead in 1080p non-RT games, but that shrinks to a 2% lead in RT games. Turing's RT performance isn't as good as Ampere's.
 
Joined
Jan 27, 2015
Messages
1,065 (0.30/day)
System Name loon v4.0
Processor i7-11700K
Motherboard asus Z590TUF+wifi
Cooling Custom Loop
Memory ballistix 3600 cl16
Video Card(s) eVga 3060 xc
Storage WD sn570 1tb(nvme) SanDisk ultra 2tb(sata)
Display(s) cheap 1080&4K 60hz
Case Roswell Stryker
Power Supply eVGA supernova 750 G6
Mouse eats cheese
Keyboard warrior!
Benchmark Scores https://www.3dmark.com/spy/21765182 https://www.3dmark.com/pr/1114767
I don't think they are lying about their RT cores' performance. The RTX 2070 Super has a 9% lead in 1080p non-RT games, but that shrinks to a 2% lead in RT games. Turing's RT performance isn't as good as Ampere's.
As a 3060 owner, I want to believe, but objectively (or maybe not) I don't trust slides with an NVIDIA watermark.

I put more stock in your OP (the first post, not #3/your second post).

But thanks for taking the time. :)
 
Joined
May 20, 2020
Messages
1,370 (0.83/day)
Interesting comparison indeed. Despite the 2070S's small lead in most cases, one needs to consider that the 3060 is a smaller chip (on a smaller node) and thus more economical to run, with almost all the performance of the 2070S, and is therefore preferable.
 
Joined
Nov 11, 2016
Messages
3,398 (1.16/day)
System Name The de-ploughminator Mk-III
Processor 9800X3D
Motherboard Gigabyte X870E Aorus Master
Cooling DeepCool AK620
Memory 2x32GB G.SKill 6400MT Cas32
Video Card(s) Asus RTX4090 TUF
Storage 4TB Samsung 990 Pro
Display(s) 48" LG OLED C4
Case Corsair 5000D Air
Audio Device(s) KEF LSX II LT speakers + KEF KC62 Subwoofer
Power Supply Corsair HX850
Mouse Razor Death Adder v3
Keyboard Razor Huntsman V3 Pro TKL
Software win11
So I was bored and I was looking through the GPU database (I am weird), and look what I found: Turing TU104 (used in the 2070 Super, 2080 and 2080 Super) has a transistor count of 13.6 billion. GA106 (used in the 3060) has a transistor count of 13.25 billion. My! How similar!

I wonder what Nvidia's rationale was for focusing on FP32 + RT performance over everything else. What do you think?

My guess is that the Ampere SM, RT and Tensor cores were redesigned to squeeze more transistors together, i.e. to save die space, which is the only thing chip makers care about.

With TSMC 7nm:
GA100: 65.6M/mm2
RDNA2: 51.5M/mm2

Transistors don't cost money, die size does :D
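The density figures are simply transistor count divided by die area. A quick sketch; the transistor counts and die areas are my assumptions from public spec sheets, and TU104 is added only for comparison with the first post.

```python
# Transistor counts and die areas (mm2) are assumptions from public spec
# sheets, not numbers stated in this thread.
DIES = {
    "GA100 (Ampere, TSMC 7nm)": (54.2e9, 826),
    "Navi 21 (RDNA2, TSMC 7nm)": (26.8e9, 520),
    "TU104 (Turing, TSMC 12nm)": (13.6e9, 545),
}

def density_mtr_per_mm2(transistors: float, area_mm2: float) -> float:
    """Million transistors per square millimetre."""
    return transistors / area_mm2 / 1e6

for die, (transistors, area) in DIES.items():
    print(f"{die:26s} {density_mtr_per_mm2(transistors, area):5.1f} M/mm2")
```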
 
Joined
Dec 17, 2011
Messages
359 (0.08/day)
Interesting comparison indeed. Despite the 2070S's small lead in most cases, one needs to consider that the 3060 is a smaller chip (on a smaller node) and thus more economical to run, with almost all the performance of the 2070S, and is therefore preferable.
Agreed. My intention was to do a performance-per-transistor comparison.
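A performance-per-transistor comparison is one division per card. A minimal sketch using only the first post's numbers; note that it charges each card for its whole die, even though the 2070 Super has 83% and the 3060 has 93% of their shaders enabled.

```python
# 1080p average FPS and transistor counts (billions) as given in the first post.
CARDS = {
    "RTX 2070 Super (TU104)": (124.5, 13.6),
    "RTX 3060 (GA106)": (113.9, 13.25),
}

fps_per_btr = {name: fps / btr for name, (fps, btr) in CARDS.items()}
for name, value in fps_per_btr.items():
    print(f"{name:24s} {value:.2f} fps per billion transistors")

ratio = fps_per_btr["RTX 2070 Super (TU104)"] / fps_per_btr["RTX 3060 (GA106)"]
print(f"2070 Super advantage: {ratio - 1:+.1%}")
```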

With TSMC 7nm:
GA100: 65.6M/mm2
RDNA2: 51.5M/mm2

RDNA2 runs at much higher clock speeds (about +40%), though. That might necessitate less dense, higher-frequency transistors.
 
Joined
Sep 17, 2014
Messages
22,403 (6.03/day)
Location
The Washing Machine
Processor 7800X3D
Motherboard MSI MAG Mortar b650m wifi
Cooling Thermalright Peerless Assassin
Memory 32GB Corsair Vengeance 30CL6000
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Lexar NM790 4TB + Samsung 850 EVO 1TB + Samsung 980 1TB + Crucial BX100 250GB
Display(s) Gigabyte G34QWC (3440x1440)
Case Lian Li A3 mATX White
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse Steelseries Aerox 5
Keyboard Lenovo Thinkpad Trackpoint II
Software W11 IoT Enterprise LTSC
Benchmark Scores Over 9000
I think architectures advance based on predictions. The sacrifices in Ampere are apparently the right ones, as they allow other areas to move forward, specifically RT performance, without a big sacrifice in net performance, while still taking advantage of a die shrink. Turing was really still testing those waters.
 
Joined
Nov 11, 2016
Messages
3,398 (1.16/day)
System Name The de-ploughminator Mk-III
Processor 9800X3D
Motherboard Gigabyte X870E Aorus Master
Cooling DeepCool AK620
Memory 2x32GB G.SKill 6400MT Cas32
Video Card(s) Asus RTX4090 TUF
Storage 4TB Samsung 990 Pro
Display(s) 48" LG OLED C4
Case Corsair 5000D Air
Audio Device(s) KEF LSX II LT speakers + KEF KC62 Subwoofer
Power Supply Corsair HX850
Mouse Razor Death Adder v3
Keyboard Razor Huntsman V3 Pro TKL
Software win11
RDNA2 runs at much higher clock speeds (about +40%), though. That might necessitate less dense, higher-frequency transistors.

Not necessarily: RDNA1 has a transistor density of 41M/mm2 while running at ~2100 MHz.
RDNA2 also has Infinity Cache, which helps squeeze in more transistors (L3 cache is 3-5x denser than compute cores).

So yeah, predicting performance based on transistor count and frequency is only academic. Maybe average FPS per die area is a more meaningful metric :D The 3060 is about half the die size of the 2070 Super.
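Average FPS per die area can be sketched the same way. The FPS numbers come from the first post; the die areas (TU104 about 545 mm2, GA106 about 276 mm2) are my assumptions from public spec sheets, and the two chips are on different processes (TSMC 12nm vs Samsung 8nm), so this mixes architecture and node gains.

```python
# 1080p average FPS from the first post; die areas (mm2) are assumptions
# from public spec sheets.
CARDS = {
    "RTX 2070 Super (TU104)": (124.5, 545),
    "RTX 3060 (GA106)": (113.9, 276),
}

fps_per_mm2 = {name: fps / area for name, (fps, area) in CARDS.items()}
for name, value in fps_per_mm2.items():
    print(f"{name:24s} {value:.3f} fps per mm2")

area_ratio = CARDS["RTX 3060 (GA106)"][1] / CARDS["RTX 2070 Super (TU104)"][1]
print(f"GA106 / TU104 die area: {area_ratio:.2f}")
```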
 
Joined
Feb 1, 2019
Messages
3,562 (1.68/day)
Location
UK, Midlands
System Name Main PC
Processor 13700k
Motherboard Asrock Z690 Steel Legend D4 - Bios 13.02
Cooling Noctua NH-D15S
Memory 32 Gig 3200CL14
Video Card(s) 4080 RTX SUPER FE 16G
Storage 1TB 980 PRO, 2TB SN850X, 2TB DC P4600, 1TB 860 EVO, 2x 3TB WD Red, 2x 4TB WD Red
Display(s) LG 27GL850
Case Fractal Define R4
Audio Device(s) Soundblaster AE-9
Power Supply Antec HCG 750 Gold
Software Windows 10 21H2 LTSC
You didn't add memory capacity to the comparison, so it's hard for me to give an opinion.
 
Joined
Jan 8, 2017
Messages
9,424 (3.28/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
I wonder what Nvidia's rationale was for focusing on FP32 + RT performance over everything else. What do you think?
It's hard to explain here without going into great detail and writing entire pages, but the gist of it is that Nvidia chose to make an SM that is faster overall but less efficient per computational resource. That's why the 2070, which has a lot more SMs, wins despite the 3060 seemingly having a much bigger computational advantage.
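The "faster SM, less efficient per computational resource" point can be illustrated with the thread's own numbers. A crude sketch: it treats FPS as if it scaled linearly with SM count and TFLOPS, which it does not, and relies on the two cards having almost the same boost clock (an assumption: roughly 1770 vs 1777 MHz). SM counts equal the RT core counts posted earlier, since there is one RT core per SM.

```python
# 1080p average FPS and FP32 TFLOPS from the first post; SM counts equal the
# RT core counts (one RT core per SM).
CARDS = {
    "RTX 2070 Super": {"fps": 124.5, "sms": 40, "fp32_tflops": 9.06},
    "RTX 3060": {"fps": 113.9, "sms": 28, "fp32_tflops": 12.74},
}

for name, c in CARDS.items():
    print(f"{name:15s} {c['fps'] / c['sms']:.2f} fps/SM  "
          f"{c['fps'] / c['fp32_tflops']:.2f} fps/TFLOP")

t, a = CARDS["RTX 2070 Super"], CARDS["RTX 3060"]
per_sm_gain = (a["fps"] / a["sms"]) / (t["fps"] / t["sms"]) - 1
per_tflop_gain = (a["fps"] / a["fp32_tflops"]) / (t["fps"] / t["fp32_tflops"]) - 1
print(f"3060 vs 2070 Super: {per_sm_gain:+.0%} per SM, "
      f"{per_tflop_gain:+.0%} per FP32 TFLOP")
```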
 
Joined
Jan 14, 2019
Messages
12,337 (5.78/day)
Location
Midlands, UK
System Name Nebulon B
Processor AMD Ryzen 7 7800X3D
Motherboard MSi PRO B650M-A WiFi
Cooling be quiet! Dark Rock 4
Memory 2x 24 GB Corsair Vengeance DDR5-4800
Video Card(s) AMD Radeon RX 6750 XT 12 GB
Storage 2 TB Corsair MP600 GS, 2 TB Corsair MP600 R2
Display(s) Dell S3422DWG, 7" Waveshare touchscreen
Case Kolink Citadel Mesh black
Audio Device(s) Logitech Z333 2.1 speakers, AKG Y50 headphones
Power Supply Seasonic Prime GX-750
Mouse Logitech MX Master 2S
Keyboard Logitech G413 SE
Software Bazzite (Fedora Linux) KDE
Interesting comparison. I've got another one from the TPU database:

GTX 1080: 180 W TDP, 100% average performance,
RTX 2070: 175 W TDP, 116% average performance,
RTX 3060: 170 W TDP, 119% average performance.

My conclusion is that performance per watt has increased by 26% since Pascal (over two generational gaps).

A bonus feature:

The GTX 980 Ti has a TDP of 250 W and 76% of the performance of the 1080. That's a performance-per-watt increase of about 83% within a single generational gap.

Something really went wrong in modern GPU design.
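The perf-per-watt figures follow from dividing relative performance by TDP. A sketch using the numbers above; keep in mind that TDP is a rated board power, not measured power draw in games, so this is an approximation.

```python
# (TDP in watts, relative performance with GTX 1080 = 100), as quoted above.
CARDS = {
    "GTX 980 Ti": (250, 76),
    "GTX 1080": (180, 100),
    "RTX 2070": (175, 116),
    "RTX 3060": (170, 119),
}

def perf_per_watt(name: str) -> float:
    tdp_w, rel_perf = CARDS[name]
    return rel_perf / tdp_w

def efficiency_gain(new: str, old: str) -> float:
    """Fractional perf-per-watt improvement of `new` over `old`."""
    return perf_per_watt(new) / perf_per_watt(old) - 1

print(f"GTX 980 Ti -> GTX 1080: {efficiency_gain('GTX 1080', 'GTX 980 Ti'):+.1%}")
print(f"GTX 1080 -> RTX 2070:  {efficiency_gain('RTX 2070', 'GTX 1080'):+.1%}")
print(f"GTX 1080 -> RTX 3060:  {efficiency_gain('RTX 3060', 'GTX 1080'):+.1%}")
```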

It's hard to explain here without going into great detail and writing entire pages, but the gist of it is that Nvidia chose to make an SM that is faster overall but less efficient per computational resource. That's why the 2070, which has a lot more SMs, wins despite the 3060 seemingly having a much bigger computational advantage.
My gist is that nvidia changed the definition of a CUDA core with Ampere. Before Ampere, only the full FP32 units counted as CUDA cores; Turing's separate INT32 units were not counted. With Ampere, the INT32 units can also do FP32 operations, so they now count as CUDA cores too, even though they may be busy with INT32 operations part of the time.

Edited: TLDR: 3072 Ampere cores equal to somewhere between 3072 and 6144 Turing cores. Where exactly depends on the situation.
 
Joined
Dec 17, 2011
Messages
359 (0.08/day)
3072 Ampere cores equal to somewhere between 3072 and 6144 Turing cores.
If I am understanding you correctly, don't you mean the opposite? That 6144 Ampere cores are equal to somewhere between 3072 and 6144 Turing cores?

Or to put it simply, Ampere has an inflated core count compared to Turing?

Ninja Edit - Found some images. Now I get what you mean.



So the RTX 2070 has 2560 FP32 cores, i.e. 2560 CUDA cores, at any given time. But the RTX 3060 can have anywhere between 1792 and 3584 FP32 cores available depending on the task (because half of them do double duty as INT32 cores and can be unavailable), and it is advertised as having 3584 CUDA cores.
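The 1792-to-3584 range falls out of a simple lane-counting model. A sketch, assuming an idealized SM that issues on every lane every clock: Turing SMs get 64 FP32-only and 64 INT32-only lanes, Ampere SMs get 64 FP32-only lanes plus 64 lanes that do either FP32 or INT32. Real scheduling, register and memory limits are ignored, and the INT32 shares used are hypothetical.

```python
# Idealized per-SM lane counts (assumption: every lane issues every clock).
TURING_SM = {"fp_only": 64, "int_only": 64, "shared": 0}
AMPERE_SM = {"fp_only": 64, "int_only": 0, "shared": 64}

def sm_throughput(sm: dict, int_fraction: float) -> tuple[float, float]:
    """Peak (FP32, INT32) operations per clock for one SM, when `int_fraction`
    of the instruction stream is INT32 and the rest is FP32."""
    f = int_fraction
    fp_cap = sm["fp_only"] + sm["shared"]    # lanes that can run FP32
    int_cap = sm["int_only"] + sm["shared"]  # lanes that can run INT32
    total = sm["fp_only"] + sm["int_only"] + sm["shared"]
    limits = [total]                         # a shared lane serves one type at a time
    if f > 0:
        limits.append(int_cap / f)
    if f < 1:
        limits.append(fp_cap / (1 - f))
    ops = min(limits)                        # total instructions per clock
    return (1 - f) * ops, f * ops

def gpu_fp32_lanes(sm: dict, sm_count: int, int_fraction: float) -> float:
    """FP32 'cores' effectively busy per clock across the whole GPU."""
    return sm_count * sm_throughput(sm, int_fraction)[0]

for f in (0.0, 0.25, 0.5):  # share of INT32 instructions (hypothetical mixes)
    print(f"INT32 share {f:.0%}: RTX 2070 Super {gpu_fp32_lanes(TURING_SM, 40, f):.0f}, "
          f"RTX 3060 {gpu_fp32_lanes(AMPERE_SM, 28, f):.0f} FP32 lanes per clock")
```

With no INT32 work the 3060 gets all 3584, at a 50/50 mix it drops to 1792, while the 2070 Super stays at 2560 throughout because its INT32 work runs on separate lanes.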
 
Joined
Jan 14, 2019
Messages
12,337 (5.78/day)
Location
Midlands, UK
System Name Nebulon B
Processor AMD Ryzen 7 7800X3D
Motherboard MSi PRO B650M-A WiFi
Cooling be quiet! Dark Rock 4
Memory 2x 24 GB Corsair Vengeance DDR5-4800
Video Card(s) AMD Radeon RX 6750 XT 12 GB
Storage 2 TB Corsair MP600 GS, 2 TB Corsair MP600 R2
Display(s) Dell S3422DWG, 7" Waveshare touchscreen
Case Kolink Citadel Mesh black
Audio Device(s) Logitech Z333 2.1 speakers, AKG Y50 headphones
Power Supply Seasonic Prime GX-750
Mouse Logitech MX Master 2S
Keyboard Logitech G413 SE
Software Bazzite (Fedora Linux) KDE
So the RTX 2070 has 2304 FP32 cores or 2304 CUDA cores at any given time. But the RTX 3060 can have anywhere between 1792 and 3584 FP32 cores available depending on the task (because half of them do double duty as INT32 cores and can be unavailable) and are advertised as 3584 CUDA cores.
Exactly (with the minor correction). Not to mention that in addition to the 2304 full FP32 cores, the 2070 also has the same number of INT32 cores, while the 3060 shares half of its cores between INT32 and FP32 tasks. So it can have either 1792 INT32 and 1792 FP32 cores, or 3584 FP32 cores with no INT32. If you take the former situation as an example (a 50/50 split between INT/FP), then the 2070 really has 2x2304=4608 cores. The truth isn't that extreme, though; it's always somewhere in between, and that is why the two cards offer relatively similar performance in real life, despite having massively different numbers of CUDA cores on paper.

This is why I'm mad at nvidia with their cuda core naming convention change in Ampere. It's a good way to trick gamers into believing that it's a massively superior architecture compared to Turing when real world performance data show otherwise. In my eyes, it's only a mild refresh at best.
 
Joined
Dec 17, 2011
Messages
359 (0.08/day)
This is why I'm mad at nvidia with their cuda core naming convention change in Ampere. It's a good way to trick gamers into believing that it's a massively superior architecture compared to Turing when real world performance data show otherwise.
It also explains why AMD's 5120 core 6900XT is able to go toe to toe with Nvidia's 10240 core RTX 3080 Ti. Nvidia has the advantage of having effectively more than 5120 but less than 10240 FP32 cores and AMD has the advantage of running its 5120 FP32 cores at a greater clock speed so it evens out.
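The "it evens out" argument can be put into rough numbers with peak FP32 = 2 x lanes x clock. A sketch assuming reference boost clocks of 2250 MHz for the 6900 XT and 1665 MHz for the 3080 Ti (my assumptions from public spec sheets), and ignoring that RDNA2 also has to run INT32 work on its own ALUs, so it only brackets the paper figures rather than proving parity.

```python
def peak_fp32_tflops(fp32_lanes: int, boost_ghz: float) -> float:
    """Peak FP32 throughput: one FMA (2 FLOPs) per lane per clock."""
    return 2 * fp32_lanes * boost_ghz / 1000

# Reference boost clocks are assumptions from public spec sheets.
rx_6900xt = peak_fp32_tflops(5120, 2.250)
rtx_3080ti_best = peak_fp32_tflops(10240, 1.665)        # every lane doing FP32
rtx_3080ti_worst = peak_fp32_tflops(10240 // 2, 1.665)  # shared lanes all busy with INT32

print(f"RX 6900 XT:  {rx_6900xt:.1f} TFLOPS")
print(f"RTX 3080 Ti: {rtx_3080ti_worst:.1f} to {rtx_3080ti_best:.1f} TFLOPS")
```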
 
Joined
Jan 14, 2019
Messages
12,337 (5.78/day)
Location
Midlands, UK
System Name Nebulon B
Processor AMD Ryzen 7 7800X3D
Motherboard MSi PRO B650M-A WiFi
Cooling be quiet! Dark Rock 4
Memory 2x 24 GB Corsair Vengeance DDR5-4800
Video Card(s) AMD Radeon RX 6750 XT 12 GB
Storage 2 TB Corsair MP600 GS, 2 TB Corsair MP600 R2
Display(s) Dell S3422DWG, 7" Waveshare touchscreen
Case Kolink Citadel Mesh black
Audio Device(s) Logitech Z333 2.1 speakers, AKG Y50 headphones
Power Supply Seasonic Prime GX-750
Mouse Logitech MX Master 2S
Keyboard Logitech G413 SE
Software Bazzite (Fedora Linux) KDE
It also explains why AMD's 5120 core 6900XT is able to go toe to toe with Nvidia's 10240 core RTX 3080 Ti. Nvidia has the advantage of having effectively more than 5120 but less than 10240 FP32 cores and AMD has the advantage of running its 5120 FP32 cores at a greater clock speed so it evens out.
Yes, though I think AMD runs either FP or INT on essentially any core at any time, so that's an even more complicated story, not to mention the huge cache advantage there.

By the way, I've just read back that you were originally comparing the 3060 to the 2070 Super, which really has 2560 cores. My bad. :ohwell:
 
Joined
Dec 17, 2011
Messages
359 (0.08/day)
By the way, I've just read back that you were originally comparing the 3060 to the 2070 Super, which really has 2560 cores. My bad
Haha no that's on me for not being clear in my post.

Yes, though I think AMD runs either FP or INT on essentially any core at any time, so that's an even more complicated story.
Are you saying that while Nvidia has 10240 active cores on its 3080 Ti (running any combination of FP + INT), AMD has only 5120 cores active in 1 cycle?
 
Joined
Feb 6, 2021
Messages
2,894 (2.10/day)
Location
Germany
Processor AMD Ryzen 7 7800X3D
Motherboard ASRock B650E Steel Legend Wifi
Cooling Arctic Liquid Freezer III 280
Memory 2x16GB Corsair Vengeance RGB 6000 CL30 (A-Die)
Video Card(s) RTX 4090 Gaming X Trio
Storage 1TB Samsung 990 PRO, 4TB Corsair MP600 PRO XT, 1TB WD SN850X, 4x4TB Crucial MX500
Display(s) Alienware AW2725DF, LG 27GR93U, LG 27GN950-B
Case Streacom BC1 V2 Black
Audio Device(s) Bose Companion Series 2 III, Sennheiser GSP600 and HD599 SE - Creative Soundblaster X4
Power Supply bequiet! Dark Power Pro 12 1500w Titanium
Mouse Razer Deathadder V3
Keyboard Razer Black Widow V3 TKL
VR HMD Oculus Rift S
Software ~2000 Video Games
NVIDIA has two different implementations of "CUDA cores".

For example, in a 3080 Ti with 10240 cores:

it has 5120 dedicated FP32 cores and 5120 Pascal-like cores that do FP32 OR INT32.
The die actually has 10240 FP32 units, but only 5120 are the "proper" ones; the others form a cluster that does either INT32 in one cycle or FP32 in another (per SM).
 
Joined
Apr 30, 2020
Messages
985 (0.59/day)
System Name S.L.I + RTX research rig
Processor Ryzen 7 5800X 3D.
Motherboard MSI MEG ACE X570
Cooling Corsair H150i Cappellx
Memory Corsair Vengeance pro RGB 3200mhz 32Gbs
Video Card(s) 2x Dell RTX 2080 Ti in S.L.I
Storage Western digital Sata 6.0 SDD 500gb + fanxiang S660 4TB PCIe 4.0 NVMe M.2
Display(s) HP X24i
Case Corsair 7000D Airflow
Power Supply EVGA G+1600watts
Mouse Corsair Scimitar
Keyboard Cosair K55 Pro RGB
Have you tried using a dedicated PhysX GPU in Metro Exodus (Enhanced Edition), since it has real-time ray tracing added to it?
I'm wondering if this game still supports PhysX alongside RT. I'd like to see some tests done with it on vs. off.
 
Joined
Jan 6, 2016
Messages
69 (0.02/day)
Location
Algeria
System Name HyPerioN
Processor 5900X
Motherboard Asus X570 Strix-F
Cooling Arctic Freezer II 360 Rev 5
Memory 32GB 3000Mhz G.Skill Rijaw V
Video Card(s) GTX 2070
Storage Intel 760p+Samsung 860 EVO+2TB Toshiba 7200rpm HDD+2TB Seagate Barracuda
Display(s) LG UltraGear GL850
Case Asus Tuf G501
Power Supply EVGA 850W G6
Mouse Razer Viper Mini
Keyboard Corsair K70
This is a pretty interesting topic, I love it.
 