Ampere v/s Turing

Joined
Dec 17, 2011
Messages
364 (0.07/day)
So I was bored and I was looking through the GPU database (I am weird) and look what I found - Turing TU104 (used in 2070 Super, 2080 and 2080 Super) has a transistor count of 13.6 billion. GA106 (used in 3060) has a transistor count of 13.25 billion. My! How similar!

So I thought, you know what, let's do a comparison. In the Ampere corner we have the RTX 3060 (duh!) and in the Turing corner we need a cut-down TU104 part, like the RTX 3060 is for GA106. So I went with the RTX 2070 Super. Now the RTX 2070 Super uses 2560/3072 = 83% of TU104's shaders compared to the RTX 3060's 3584/3840 = 93% of GA106's, but TU104 has more transistors too (3% more), so I thought it should be a fair match. Anyways, here are the numbers!
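If anyone wants to double check the percentages above, here is a quick sketch. It only uses the shader and transistor counts quoted in this post; the function and variable names are made up for illustration.

```python
def enabled_fraction(active_shaders: int, full_die_shaders: int) -> float:
    """Fraction of the full die's shaders that are enabled on a given card."""
    return active_shaders / full_die_shaders

# Shader counts as quoted in the post.
rtx_2070_super = enabled_fraction(2560, 3072)  # cut-down TU104
rtx_3060 = enabled_fraction(3584, 3840)        # cut-down GA106

# Transistor counts in billions, as quoted in the post.
tu104_transistors = 13.6
ga106_transistors = 13.25
extra_transistors = tu104_transistors / ga106_transistors - 1

print(f"2070 Super enables {rtx_2070_super:.0%} of TU104's shaders")
print(f"3060 enables {rtx_3060:.0%} of GA106's shaders")
print(f"TU104 has {extra_transistors:.1%} more transistors than GA106")
```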

| Metric | RTX 2070 Super | RTX 3060 | 2070 Super advantage over 3060 |
| --- | --- | --- | --- |
| Pixel Fill Rate | 113.3 GigaPixels/sec | 85.30 GigaPixels/sec | +33% |
| Texture Fill Rate | 283.2 GigaTexels/sec | 199.0 GigaTexels/sec | +42% |
| Half Precision (FP16) FLOPs | 18.12 TFLOPs | 12.74 TFLOPs | +42% |
| Full Precision (FP32) FLOPs | 9.06 TFLOPs | 12.74 TFLOPs | -29% (or 3060 has +40%) |
| Double Precision (FP64) FLOPs | 283.2 GFLOPs | 199.0 GFLOPs | +42% |
| Memory bandwidth | 448 GB/sec | 360 GB/sec | +24% |
| RT cores (thanks cvaldes) | 40 | 28 (but 2x faster) | -29% (or 3060 has +40%) |

So... the RTX 3060 seems to have about 40% more FP32 TFLOPs and RT performance, while the 2070 Super has 33-42% more of everything else, plus 24% more memory bandwidth. Of course, this doesn't take into account architectural efficiencies/inefficiencies. But it makes you wonder how such a drastic rebalancing changes the gaming performance.
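For reference, the advantage column can be reproduced with something like the sketch below. The spec numbers are copied from the table in this post; `advantage_pct` and `specs` are just names picked for the example, and the RT row assumes the "up to 2x faster per core" claim holds fully, which is an assumption rather than a measurement.

```python
def advantage_pct(card_a: float, card_b: float) -> float:
    """Percent by which card_a exceeds card_b (negative if it trails)."""
    return (card_a / card_b - 1) * 100

# (RTX 2070 Super, RTX 3060) values taken from the table in the post.
specs = {
    "Pixel fill rate (GPixel/s)": (113.3, 85.30),
    "Texture fill rate (GTexel/s)": (283.2, 199.0),
    "FP16 (TFLOPs)": (18.12, 12.74),
    "FP32 (TFLOPs)": (9.06, 12.74),
    "FP64 (GFLOPs)": (283.2, 199.0),
    "Memory bandwidth (GB/s)": (448, 360),
    # Assumption: each Ampere RT core counts as 2 Turing RT cores.
    "RT cores (Turing-equivalent)": (40, 28 * 2),
}

for metric, (super_2070, rtx_3060) in specs.items():
    print(f"{metric}: {advantage_pct(super_2070, rtx_3060):+.0f}%")
```

Swapping the two arguments gives the 3060's advantage instead, which is why -29% one way shows up as roughly +40% the other way.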

On RT cores - Anandtech says
The ray tracing (RT) cores have also been beefed up (for Ampere) ...... the individual RT cores are said to be up to 2x faster, with NVIDIA specifically quoting ray/triangle intersection performance.

| Metric | RTX 2070 Super | RTX 3060 | 2070 Super advantage over 3060 |
| --- | --- | --- | --- |
| Average FPS - 1080p | 124.5 | 113.9 | +9% |
| Average FPS - 4k | 53.4 | 47.8 | +12% |
| Average FPS - 1080p RT (mean data of below) | 81.37 | 79.35 | +2% |
| Control - 1080p RT | 46.8 | 45.3 | +3% |
| Control - 4k RT | 14.7 | 13.4 | +10% |
| Cyberpunk - 1080p RT | 31.5 | 31.4 | - |
| Cyberpunk - 4k RT | 9.4 | 9.3 | - |
| Doom Eternal - 1080p RT | 134.2 | 128.1 | +5% |
| Doom Eternal - 4k RT | 14.6 | 54.2 | N/A as 2070 Super runs into VRAM limit |
| F1 2021 - 1080p RT | 127.6 | 124.1 | +3% |
| F1 2021 - 4k RT | 43.7 | 41.7 | +5% |
| Far Cry 6 - 1080p RT | 75.2 | 76.8 | -2% |
| Far Cry 6 - 4k RT | 37 | 35 | +6% |
| Metro Exodus - 1080p RT | 72.9 | 70.4 | +3% |
| Metro Exodus - 4k RT | 26.6 | 22.1 | +20% |

So despite the rebalancing it does look like the 2070 Super is faster in non-RT games. The 2070 Super's advantage almost disappears with ray tracing turned on. What a weird but interesting result. I think the RTX 3060 could benefit a lot from greater Texture Fill Rate and maybe from more Pixel Fill Rate. I doubt the lower FP16 performance is harming the 3060. FP64 is irrelevant to gaming anyway.
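The 1080p RT average is just the plain mean of the six 1080p RT results in the table, so it can be rechecked with a few lines. This is a rough sketch using only the FPS numbers posted here; the variable names are made up for the example.

```python
from statistics import mean

# 1080p RT results from the table: game -> (RTX 2070 Super FPS, RTX 3060 FPS)
rt_1080p = {
    "Control": (46.8, 45.3),
    "Cyberpunk": (31.5, 31.4),
    "Doom Eternal": (134.2, 128.1),
    "F1 2021": (127.6, 124.1),
    "Far Cry 6": (75.2, 76.8),
    "Metro Exodus": (72.9, 70.4),
}

avg_2070s = mean(fps_2070s for fps_2070s, _ in rt_1080p.values())
avg_3060 = mean(fps_3060 for _, fps_3060 in rt_1080p.values())

# Lead of the 2070 Super with RT on, versus the non-RT 1080p average.
rt_lead = (avg_2070s / avg_3060 - 1) * 100
raster_lead = (124.5 / 113.9 - 1) * 100

print(f"1080p RT average: {avg_2070s:.2f} vs {avg_3060:.2f} FPS")
print(f"2070 Super lead: {raster_lead:.1f}% without RT, {rt_lead:.1f}% with RT")
```

A plain mean lets the high-FPS games (Doom Eternal, F1 2021) dominate; a geometric mean would weight each game equally, but the table above uses the arithmetic one.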

I wonder what was Nvidia's rationale for focusing on FP32 + RT performance over everything else. What do you think?
 
The type of transistors matters.

2070 Super: 40 RT cores
3060: 28 RT cores
 
2070 Super: 40 RT cores
3060: 28 RT cores

AmpereRT.jpg


Ampere's RT cores are 2x faster than Turing's RT cores. Source - https://www.anandtech.com/show/1605...re-for-gaming-starting-with-rtx-3080-rtx-3090
 
Or you know, they aren't and that's just marketing. You seem to be finding that.
I don't think they are lying about their RT core performance. The RTX 2070 Super has a 9% lead in 1080p non-RT games but that shrinks to a 2% lead in RT games. Turing's RT performance isn't as good as Ampere's.
 
I don't think they are lying about their RT core performance. The RTX 2070 Super has a 9% lead in 1080p non-RT games but that shrinks to a 2% lead in RT games. Turing's RT performance isn't as good as Ampere's.
as a 3060 owner, i want to believe but objectively (or maybe not) i don't trust slides w/nvidia watermark.

more of your OP (first post. not #3/second post.)

but thanks for taking the time. :)
 
Interesting comparison indeed - despite the 2070S's small lead in most cases, one needs to consider that the 3060 is a smaller chip (smaller node) and thus more economical to run, with almost all the performance of the 2070S, and thus preferred.
 
So I was bored and I was looking through the GPU database (I am weird) and look what I found - Turing TU104 (used in 2070 Super, 2080 and 2080 Super) has a transistor count of 13.6 billion. GA106 (used in 3060) has a transistor count of 13.25 billion. My! How similar!

I wonder what was Nvidia's rationale for focusing on FP32 + RT performance over everything else. What do you think?

My guess is that Ampere's SM, RT and Tensor cores were redesigned in order to squeeze more transistors together = save die space, which is the only thing chip makers care about.

With TSMC 7nm:
GA100: 65.6M/mm2
RDNA2: 51.5M/mm2

Transistors don't cost money, die size does :D
 
Interesting comparison indeed - despite the 2070S's small lead in most cases, one needs to consider that the 3060 is a smaller chip (smaller node) and thus more economical to run, with almost all the performance of the 2070S, and thus preferred.
agreed. my intention was to do a performance per transistor comparison.

With TSMC 7nm:
GA100: 65.6M/mm2
RDNA2: 51.5M/mm2

RDNA2 runs at much faster (+40%) frequency though. That might necessitate less dense, higher frequency transistors.
 
I think architecture advances based on predictions, the sacrifices in Ampere are apparently the right ones as they allow other areas to move forward, specifically RT perf while not making a big sacrifice in net performance. And still taking advantage of a shrink. Turing was still testing those waters really.
 
RDNA2 runs at much faster (+40%) frequency though. That might necessitate less dense, higher frequency transistors.

Not necessarily, RDNA1 has a transistor density of 41M/mm2 while its frequency is ~2100 MHz.
RDNA2 also has Infinity Cache, which helps squeeze in more transistors (L3 cache is 3-5x denser than compute cores).

So yeah, predicting performance based on transistor count and frequency is only academic, maybe avg FPS/die size is a more meaningful metric :D, the 3060 is like 1/2 the die size of the 2070 Super
 
You didn't add memory capacity to the comparison, so it's hard for me to give an opinion.
 
I wonder what was Nvidia's rationale for focusing on FP32 + RT performance over everything else. What do you think?
It's hard to explain here without going into great detail and writing entire pages, but the gist of it is that Nvidia chose to make a faster SM overall but also one that is less efficient per computational resource. That's why the 2070, which has a lot more SMs, wins despite the fact that the 3060 seemingly has a much bigger computational advantage.
 
Interesting comparison. I've got another one from the TPU database:

GTX 1080: 180 W TDP, 100% average performance,
RTX 2070: 175 W TDP, 116% average performance,
RTX 3060: 170 W TDP, 119% average performance.

My conclusion is that performance per power consumption has increased by 26% since Pascal (2 generational gaps).

A bonus feature:

The GTX 980 Ti has a TDP of 250 W, and 76% of the performance of the 1080. That's a performance per power increase of 82% within a single generational gap.

Something really went wrong in modern GPU design.
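Here is a small sketch of the arithmetic behind those two percentages. It uses only the TDP and relative performance figures listed in this post, and it treats TDP as a stand-in for actual power draw, which is an assumption (measured consumption can differ from TDP). The names are made up for the example.

```python
# card -> (TDP in watts, average performance relative to GTX 1080 = 100)
cards = {
    "GTX 980 Ti": (250, 76),
    "GTX 1080": (180, 100),
    "RTX 2070": (175, 116),
    "RTX 3060": (170, 119),
}

def perf_per_watt(card: str) -> float:
    """Relative performance points per watt of TDP (TDP used as a power proxy)."""
    tdp_watts, relative_perf = cards[card]
    return relative_perf / tdp_watts

def efficiency_gain_pct(newer: str, older: str) -> float:
    """Percent improvement in performance per watt from older to newer."""
    return (perf_per_watt(newer) / perf_per_watt(older) - 1) * 100

print(f"GTX 1080 -> RTX 3060: {efficiency_gain_pct('RTX 3060', 'GTX 1080'):.1f}%")
print(f"GTX 980 Ti -> GTX 1080: {efficiency_gain_pct('GTX 1080', 'GTX 980 Ti'):.1f}%")
```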

It's hard to explain here without going into great detail and writing entire pages, but the gist of it is that Nvidia chose to make a faster SM overall but also one that is less efficient per computational resource. That's why the 2070, which has a lot more SMs, wins despite the fact that the 3060 seemingly has a much bigger computational advantage.
My gist is that Nvidia changed the definition of CUDA cores with Ampere. Before Ampere, only a full FP32 core counted as a CUDA core. With Ampere, half of the FP32 cores can also do INT32 operations, and they still count as CUDA cores despite the fact that they may be busy with INT32 operations part of the time.

Edited: TLDR: 3072 Ampere cores equal to somewhere between 3072 and 6144 Turing cores. Where exactly depends on the situation.
 
3072 Ampere cores equal to somewhere between 3072 and 6144 Turing cores.
If I am understanding you correctly, don't you mean the opposite? That 6144 Ampere cores are equal to somewhere between 3072 and 6144 Turing cores?

Or to put it simply, Ampere has an inflated core count compared to Turing?

Ninja Edit - Found some images. Now I get what you mean.
TURING-SM.jpg
NVIDIA-Ampere-GPU-SM-Block-Diagram.png



So the RTX 2070 has 2560 FP32 cores or 2560 CUDA cores at any given time. But the RTX 3060 can have anywhere between 1792 and 3584 FP32 cores available depending on the task (because half of them do double duty as INT32 cores and can be unavailable) and are advertised as 3584 CUDA cores.
 
So the RTX 2070 has 2304 FP32 cores or 2304 CUDA cores at any given time. But the RTX 3060 can have anywhere between 1792 and 3584 FP32 cores available depending on the task (because half of them do double duty as INT32 cores and can be unavailable) and are advertised as 3584 CUDA cores.
Exactly (with the minor correction). Not to mention that in addition to the 2304 full FP32 cores, the 2070 also has the same number of INT32 cores, while the 3060 shares half of its cores between INT32 and FP32 tasks. So it can have either 1792 INT32 and 1792 FP32 cores, or 3584 FP32 cores with no INT32. If you take the former situation as an example (a 50/50 split between INT/FP), then the 2070 really has 2x2304=4608 cores. Though the truth isn't that extreme, it's always somewhere in between, and that is why the two cards offer relatively similar performance in real life, despite having a massively different number of CUDA cores on paper.
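To put rough numbers on that, here is a toy model of the core counting described above. It is a simplification for illustration only, not how the hardware actually schedules work: it assumes half of an Ampere card's advertised CUDA cores are FP32-only and the other half switch between FP32 and INT32, and that Turing pairs every FP32 core with a separate INT32 core. The function names and the `int_share` parameter are made up.

```python
def ampere_fp32_lanes(cuda_cores: int, int_share: float) -> float:
    """FP32 lanes left on an Ampere card when a fraction `int_share`
    (0.0 to 1.0) of the shared FP32/INT32 lanes is busy with INT32 work."""
    if not 0.0 <= int_share <= 1.0:
        raise ValueError("int_share must be between 0.0 and 1.0")
    dedicated_fp32 = cuda_cores / 2   # always available for FP32
    shared_lanes = cuda_cores / 2     # FP32 or INT32, depending on workload
    return dedicated_fp32 + shared_lanes * (1.0 - int_share)

def turing_total_lanes(cuda_cores: int) -> int:
    """FP32 cores plus the equal number of separate INT32 cores on Turing."""
    return cuda_cores * 2

# RTX 3060: 3584 advertised CUDA cores.
for share in (0.0, 0.5, 1.0):
    lanes = ampere_fp32_lanes(3584, share)
    print(f"3060 with {share:.0%} of shared lanes on INT32: {lanes:.0f} FP32 lanes")

# RTX 2070: 2304 FP32 cores plus 2304 INT32 cores.
print(f"2070 FP32 + INT32 lanes: {turing_total_lanes(2304)}")
```

The two extremes match the 1792 and 3584 figures above; real games land somewhere in between depending on their INT/FP instruction mix.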

This is why I'm mad at nvidia with their cuda core naming convention change in Ampere. It's a good way to trick gamers into believing that it's a massively superior architecture compared to Turing when real world performance data show otherwise. In my eyes, it's only a mild refresh at best.
 
This is why I'm mad at nvidia with their cuda core naming convention change in Ampere. It's a good way to trick gamers into believing that it's a massively superior architecture compared to Turing when real world performance data show otherwise.
It also explains why AMD's 5120 core 6900XT is able to go toe to toe with Nvidia's 10240 core RTX 3080 Ti. Nvidia has the advantage of having effectively more than 5120 but less than 10240 FP32 cores and AMD has the advantage of running its 5120 FP32 cores at a greater clock speed so it evens out.
 
It also explains why AMD's 5120 core 6900XT is able to go toe to toe with Nvidia's 10240 core RTX 3080 Ti. Nvidia has the advantage of having effectively more than 5120 but less than 10240 FP32 cores and AMD has the advantage of running its 5120 FP32 cores at a greater clock speed so it evens out.
Yes, though I think AMD runs either FP or INT on essentially any core at any time, so that's an even more complicated story, not to mention the huge cache advantage there.

By the way, I've just read back that you were originally comparing the 3060 to the 2070 Super, which really has 2560 cores. My bad. :ohwell:
 
By the way, I've just read back that you were originally comparing the 3060 to the 2070 Super, which really has 2560 cores. My bad
Haha no that's on me for not being clear in my post.

Yes, though I think AMD runs either FP or INT on essentially any core at any time, so that's an even more complicated story.
Are you saying that while Nvidia has 10240 active cores on its 3080 Ti (running any combination of FP + INT), AMD has only 5120 cores active in 1 cycle?
 
Nvidia has two different implementations of "CUDA cores"


for example in a 3080 Ti with 10240 Cores:

it has 5120 traditional FP/INT Cores and 5120 Pascal Like FP OR INT Cores.
the Die actually has 10240 FPUs but only 5120 are the "proper ones" and the other ones are a cluster that either does INT in one cycle or FP32 in another one (Per SM)
 
Have you tried using a dedicated PhysX GPU on Metro Exodus (Enhanced Edition), since it has real-time ray tracing added to it?
I'm wondering if this game still supports PhysX alongside RT. I'd like to see some tests done with it on vs. off.
 
this is pretty interesting topic, i love it
 