Ampere v/s Turing

WhoDecidedThat · Jan 4, 2022

So I was bored and I was looking through the GPU database (I am weird) and look what I found - Turing TU104 (used in 2070 Super, 2080 and 2080 Super) has a transistor count of 13.6 billion. GA106 (used in 3060) has a transistor count of 13.25 billion. My! How similar!

So I thought, you know what. Let's do a comparison. In the Ampere corner we have the RTX 3060 (duh!) and in the Turing corner we need a cut-down TU104 part, like the RTX 3060 is for GA106. So I went with the RTX 2070 Super. Now the RTX 2070 Super utilizes 2560/3072 = 83% of TU104's shaders compared to the RTX 3060's 3584/3840 = 93% of GA106's, but TU104 has more transistors too (3% more) so I thought it should be good. Anyway, here are the numbers!
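For anyone who wants to check those ratios, here is a quick Python sketch. The function and constant names are mine, the counts are the ones quoted in this post, and the transistor figures are taken to be in billions (the 3% only works out that way).

```python
# Shader and transistor counts as quoted in the post (transistors in billions).
TU104_FULL_SHADERS = 3072
RTX_2070_SUPER_SHADERS = 2560
GA106_FULL_SHADERS = 3840
RTX_3060_SHADERS = 3584
TU104_TRANSISTORS_BN = 13.6
GA106_TRANSISTORS_BN = 13.25


def enabled_pct(active: int, full: int) -> float:
    """Percentage of the full die's shaders that are enabled on a given card."""
    return 100.0 * active / full


# How many more transistors TU104 has than GA106, in percent.
transistor_gap_pct = (TU104_TRANSISTORS_BN / GA106_TRANSISTORS_BN - 1.0) * 100.0

print(f"2070 Super uses {enabled_pct(RTX_2070_SUPER_SHADERS, TU104_FULL_SHADERS):.1f}% of TU104")
print(f"3060 uses {enabled_pct(RTX_3060_SHADERS, GA106_FULL_SHADERS):.1f}% of GA106")
print(f"TU104 has {transistor_gap_pct:.1f}% more transistors than GA106")
```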

Metric	RTX 2070 Super	RTX 3060	2070 Super advantage over 3060
Pixel Fill Rate	113.3 GigaPixels/sec	85.30 GigaPixels/sec	+33%
Texture Fill Rate	283.2 GigaTexels/sec	199.0 GigaTexels/sec	+42%
Half Precision (FP16) FLOPs	18.12 TFLOPs	12.74 TFLOPs	+42%
Full Precision (FP32) FLOPs	9.06 TFLOPs	12.74 TFLOPs	-29% (or 3060 has +40%)
Double Precision (FP64) FLOPs	283.2 GFLOPs	199.0 GFLOPs	+42%
Memory bandwidth	448 GB/sec	360 GB/sec	+24%
RT cores (thanks cvaldes)	40	28 (but 2x faster)	-29% (or 3060 has +40%)

So... the RTX 3060 seems to have 40% more FP32 TFLOPs and RT performance, but the 2070 Super has roughly 33-42% more of everything else plus 24% more memory bandwidth. Of course, this doesn't take into account architectural efficiencies/inefficiencies. But it makes you wonder how such a drastic rebalancing changes the gaming performance.
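The percentage column can be recomputed with a few lines of Python. This is only a sketch: `advantage_pct` and `SPECS` are names I made up, and the "2x" weighting on Ampere's RT cores is the vendor claim quoted in this thread, not something measured here.

```python
def advantage_pct(a: float, b: float) -> float:
    """Percent by which a exceeds b (negative when a is lower)."""
    return (a / b - 1.0) * 100.0


# (RTX 2070 Super, RTX 3060) pairs copied from the table above.
SPECS = {
    "Pixel fill rate (GPixel/s)": (113.3, 85.30),
    "Texture fill rate (GTexel/s)": (283.2, 199.0),
    "FP16 (TFLOPs)": (18.12, 12.74),
    "FP32 (TFLOPs)": (9.06, 12.74),
    "FP64 (GFLOPs)": (283.2, 199.0),
    "Memory bandwidth (GB/s)": (448, 360),
    # Assumption: each Ampere RT core counts double, per the quoted 2x claim.
    "RT cores (3060 weighted 2x)": (40, 28 * 2),
}

for name, (turing, ampere) in SPECS.items():
    print(f"{name:30s} {advantage_pct(turing, ampere):+6.1f}%")
```

The same function with the arguments swapped gives the "3060 has +40%" view of the FP32 row.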

On RT cores - Anandtech says

The ray tracing (RT) cores have also been beefed up (for Ampere) ...... the individual RT cores are said to be up to 2x faster, with NVIDIA specifically quoting ray/triangle intersection performance.

Metric	RTX 2070 Super	RTX 3060	2070 Super advantage over 3060
Average FPS - 1080p	124.5	113.9	+9%
Average FPS - 4k	53.4	47.8	+12%
Average FPS - 1080p RT (mean data of below)	81.37	79.35	+2%
Control - 1080p RT	46.8	45.3	+3%
Control - 4k RT	14.7	13.4	+10%
Cyberpunk - 1080p RT	31.5	31.4	-
Cyberpunk - 4k RT	9.4	9.3	-
Doom Eternal - 1080p RT	134.2	128.1	+5%
Doom Eternal - 4k RT	14.6	54.2	N/A as 2070 Super runs into VRAM limit
F1 2021 - 1080p RT	127.6	124.1	+3%
F1 2021 - 4k RT	43.7	41.7	+5%
Far Cry 6 - 1080p RT	75.2	76.8	-2%
Far Cry 6 - 4k RT	37	35	+6%
Metro Exodus - 1080p RT	72.9	70.4	+3%
Metro Exodus - 4k RT	26.6	22.1	+20%

So despite the rebalancing it does look like the 2070 Super is faster in non-RT games. The 2070 Super's advantage almost disappears with ray tracing turned on. What a weird but interesting result. I think the RTX 3060 could benefit a lot from a greater texture fill rate and maybe from more pixel fill rate. I doubt the lower FP16 performance is harming the 3060. FP64 is irrelevant to gaming anyway.
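The "mean data" row and the leads can be reproduced from the per-game 1080p RT numbers. A small sketch (names are mine; the per-game geometric mean is an extra I added as a sanity check, since an arithmetic mean of FPS is dominated by the high-FPS titles):

```python
from statistics import geometric_mean, mean

# (RTX 2070 Super, RTX 3060) average FPS at 1080p with RT, from the table above.
RT_1080P = {
    "Control": (46.8, 45.3),
    "Cyberpunk": (31.5, 31.4),
    "Doom Eternal": (134.2, 128.1),
    "F1 2021": (127.6, 124.1),
    "Far Cry 6": (75.2, 76.8),
    "Metro Exodus": (72.9, 70.4),
}

turing_mean = mean(t for t, _ in RT_1080P.values())
ampere_mean = mean(a for _, a in RT_1080P.values())

# Lead of the 2070 Super in percent, with and without RT.
rt_lead_pct = (turing_mean / ampere_mean - 1.0) * 100.0
raster_lead_pct = (124.5 / 113.9 - 1.0) * 100.0

# Alternative view: geometric mean of the per-game ratios.
geo_lead_pct = (geometric_mean(t / a for t, a in RT_1080P.values()) - 1.0) * 100.0

print(f"1080p RT means: {turing_mean:.2f} vs {ampere_mean:.2f}")
print(f"2070 Super lead: {raster_lead_pct:.1f}% (no RT), {rt_lead_pct:.1f}% (RT)")
print(f"Per-game geometric-mean lead with RT: {geo_lead_pct:.1f}%")
```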

I wonder what Nvidia's rationale was for focusing on FP32 + RT performance over everything else. What do you think?

cvaldes · Jan 4, 2022

The type of transistors matters.

2070 Super: 40 RT cores
3060: 28 RT cores

WhoDecidedThat · Jan 4, 2022

cvaldes said:
2070 Super: 40 RT cores
3060: 28 RT cores

Ampere's RT cores are 2x faster than Turing's RT cores. Source - https://www.anandtech.com/show/1605...re-for-gaming-starting-with-rtx-3080-rtx-3090

R-T-B · Jan 4, 2022

blanarahul said:
Ampere's RT cores are 2x faster than Turing's RT cores. Source - https://www.anandtech.com/show/1605...re-for-gaming-starting-with-rtx-3080-rtx-3090

Or you know, they aren't and that's just marketing. You seem to be finding that.

WhoDecidedThat · Jan 4, 2022

R-T-B said:
Or you know, they aren't and that's just marketing. You seem to be finding that.

I don't think they are lying about their RT cores' performance. The RTX 2070 Super has a 9% lead in 1080p non-RT games but that shrinks to a 2% lead in RT games. Turing's RT performance isn't as good as Ampere's.

looniam · Jan 4, 2022

blanarahul said:
I don't think they are lying about their RT cores' performance. The RTX 2070 Super has a 9% lead in 1080p non-RT games but that shrinks to a 2% lead in RT games. Turing's RT performance isn't as good as Ampere's.

as a 3060 owner, i want to believe but objectively (or maybe not) i don't trust slides w/nvidia watermark.

more of your OP (first post. not #3/second post.)

but thanks for taking the time.

pavle · Jan 4, 2022

Interesting comparison indeed - despite the 2070S's small lead in most cases, one needs to consider that the 3060 is a smaller chip (smaller node) and thus more economical to run, with almost all the performance of the 2070S, and thus preferred.

nguyen · Jan 4, 2022

blanarahul said:
So I was bored and I was looking through the GPU database (I am weird) and look what I found - Turing TU104 (used in 2070 Super, 2080 and 2080 Super) has a transistor count of 13.6 billion. GA106 (used in 3060) has a transistor count of 13.25 billion. My! How similar!

I wonder what was Nvidia's rationale for focusing on FP32 + RT performance over everything else. What do you think?

My guess is that Ampere's SM, RT and Tensor cores were redesigned to squeeze more transistors together = save die space, which is the only thing chip makers care about.

With TSMC 7nm:
GA100: 65.6M/mm2
RDNA2: 51.5M/mm2

Transistors don't cost money, die size does

WhoDecidedThat · Jan 4, 2022

pavle said:
Interesting comparison indeed - despite the 2070S's small lead in most cases, one needs to consider that the 3060 is a smaller chip (smaller node) and thus more economical to run, with almost all the performance of the 2070S, and thus preferred.

agreed. my intention was to do a performance per transistor comparison.

nguyen said:
With TSMC 7nm:
GA100: 65.6M/mm2
RDNA2: 51.5M/mm2

RDNA2 runs at a much higher (+40%) frequency though. That might necessitate less dense, higher-frequency transistors.

Vayra86 · Jan 4, 2022

I think architecture advances based on predictions, the sacrifices in Ampere are apparently the right ones as they allow other areas to move forward, specifically RT perf while not making a big sacrifice in net performance. And still taking advantage of a shrink. Turing was still testing those waters really.

nguyen · Jan 4, 2022

blanarahul said:
RDNA2 runs at a much higher (+40%) frequency though. That might necessitate less dense, higher-frequency transistors.

Not necessarily, RDNA1 has a transistor density of 41M/mm2 while its frequency is ~2100MHz.
RDNA2 also has Infinity Cache that helps squeeze in more transistors (L3 cache is 3-5x denser than compute cores).

So yeah, predicting performance based on transistor count and frequency is only academic. Maybe avg FPS/die size is a more meaningful metric: the 3060 is like 1/2 the die size of the 2070 Super.
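To put a rough number on the FPS/die size idea, here is a sketch using the 1080p averages from the first post. The die areas are not given anywhere in this thread: the 545 mm2 (TU104) and 276 mm2 (GA106) figures below are commonly listed values that I am assuming, so treat the result as illustrative only.

```python
# Average 1080p FPS from the first post's results table.
FPS_1080P = {"RTX 2070 Super": 124.5, "RTX 3060": 113.9}

# ASSUMPTION: die areas in mm^2, not stated in this thread.
DIE_AREA_MM2 = {"RTX 2070 Super": 545.0, "RTX 3060": 276.0}


def fps_per_mm2(card: str) -> float:
    """Average FPS divided by die area, a crude 'performance per silicon' metric."""
    return FPS_1080P[card] / DIE_AREA_MM2[card]


area_ratio = DIE_AREA_MM2["RTX 3060"] / DIE_AREA_MM2["RTX 2070 Super"]
density_ratio = fps_per_mm2("RTX 3060") / fps_per_mm2("RTX 2070 Super")

for card in FPS_1080P:
    print(f"{card}: {fps_per_mm2(card):.3f} FPS per mm^2")
print(f"3060 die is {area_ratio:.2f}x the size, with {density_ratio:.2f}x the FPS per mm^2")
```

Note that the two dies are built on different nodes, so this compares area, not cost per mm2.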

chrcoluk · Jan 4, 2022

You didn't add memory capacity to the comparison, so it's hard for me to give an opinion.

Vya Domus · Jan 4, 2022

blanarahul said:
I wonder what was Nvidia's rationale for focusing on FP32 + RT performance over everything else. What do you think?

It's hard to explain here without going into great detail and writing entire pages, but the gist of it is that Nvidia chose to make a faster SM overall but also one that is less efficient per computational resource. That's why the 2070, which has a lot more SMs, wins despite the fact that the 3060 seemingly has a much bigger computational advantage.

AusWolf · Jan 4, 2022

Interesting comparison. I've got another one from the TPU database:

GTX 1080: 180 W TDP, 100% average performance,
RTX 2070: 175 W TDP, 116% average performance,
RTX 3060: 170 W TDP, 119% average performance.

My conclusion is that performance per power consumption has increased by 26% since Pascal (2 generational gaps).

A bonus feature:

The GTX 980 Ti has a TDP of 250 W, and 76% of the performance of the 1080. That's a performance per power increase of 82% within a single generational gap.

Something really went wrong in modern GPU design.
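Here is the arithmetic behind those numbers as a short Python sketch. The names are mine, and it uses TDP as a stand-in for actual power draw, which is an approximation at best.

```python
# (TDP in watts, relative average performance in % of a GTX 1080), as listed above.
CARDS = {
    "GTX 980 Ti": (250, 76),
    "GTX 1080": (180, 100),
    "RTX 2070": (175, 116),
    "RTX 3060": (170, 119),
}


def perf_per_watt(card: str) -> float:
    """Relative performance points per watt of TDP."""
    tdp_w, perf = CARDS[card]
    return perf / tdp_w


def gain_pct(new: str, old: str) -> float:
    """Percent improvement in performance per watt going from `old` to `new`."""
    return (perf_per_watt(new) / perf_per_watt(old) - 1.0) * 100.0


print(f"980 Ti -> 1080:   {gain_pct('GTX 1080', 'GTX 980 Ti'):+.1f}%")
print(f"1080   -> 2070:   {gain_pct('RTX 2070', 'GTX 1080'):+.1f}%")
print(f"2070   -> 3060:   {gain_pct('RTX 3060', 'RTX 2070'):+.1f}%")
print(f"1080   -> 3060:   {gain_pct('RTX 3060', 'GTX 1080'):+.1f}%")
```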

Vya Domus said:
It's hard to explain here without going into great detail and writing entire pages, but the gist of it is that Nvidia chose to make a faster SM overall but also one that is less efficient per computational resource. That's why the 2070, which has a lot more SMs, wins despite the fact that the 3060 seemingly has a much bigger computational advantage.

My gist is that nvidia changed the definition of cuda cores with Ampere. Before Ampere, a full INT32 core counted as a cuda core. With Ampere, half of the FP32 cores can also do INT32 operations, so they also count as cuda cores, despite the fact that they may be busy with FP32 operations half of the time.

Edited: TLDR: 3072 Ampere cores equal to somewhere between 3072 and 6144 Turing cores. Where exactly depends on the situation.

WhoDecidedThat · Jan 4, 2022

AusWolf said:
3072 Ampere cores equal to somewhere between 3072 and 6144 Turing cores.

If I am understanding you correctly, don't you mean the opposite? That 6144 Ampere cores are equal to somewhere between 3072 and 6144 Turing cores?

Or to put it simply, Ampere has an inflated core count compared to Turing?

Ninja Edit - Found some images. Now I get what you mean.

So the RTX 2070 has 2560 FP32 cores or 2560 CUDA cores at any given time. But the RTX 3060 can have anywhere between 1792 and 3584 FP32 cores available depending on the task (because half of them do double duty as INT32 cores and can be unavailable) and are advertised as 3584 CUDA cores.

AusWolf · Jan 4, 2022

blanarahul said:
So the RTX 2070 has 2304 FP32 cores or 2304 CUDA cores at any given time. But the RTX 3060 can have anywhere between 1792 and 3584 FP32 cores available depending on the task (because half of them do double duty as INT32 cores and can be unavailable) and are advertised as 3584 CUDA cores.

Exactly (with the minor correction). Not to mention that in addition to the 2304 full FP32 cores, the 2070 also has the same number of INT32 cores, while the 3060 shares half of its cores between INT32 and FP32 tasks. So it can have either 1792 INT32 and 1792 FP32 cores, or 3584 FP32 cores with no INT32. If you take the former situation as an example (a 50/50 split between INT/FP), then the 2070 really has 2x2304=4608 cores. Though the truth isn't that extreme, it's always somewhere in between, and that is why the two cards offer relatively similar performance in real life, despite having a massively different number of CUDA cores on paper.
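A toy model of that counting argument, as a sketch only: it assumes exactly half of the advertised Ampere cores are FP32-only and the other half are shared FP32/INT32, and it ignores clocks, scheduling and memory entirely. The function names and the `int_share` parameter are made up for illustration.

```python
def ampere_fp32_lanes(cuda_cores: int, int_share: float) -> float:
    """FP32 lanes left when `int_share` (0..1) of the shared lanes run INT32."""
    if not 0.0 <= int_share <= 1.0:
        raise ValueError("int_share must be between 0 and 1")
    dedicated = cuda_cores // 2        # FP32-only lanes
    shared = cuda_cores - dedicated    # lanes that do FP32 or INT32
    return dedicated + shared * (1.0 - int_share)


def break_even_int_share(ampere_cores: int, turing_cores: int) -> float:
    """INT share at which Ampere's free FP32 lanes equal Turing's FP32 lanes.

    Turing has separate INT32 lanes, so all of its CUDA cores stay FP32.
    """
    dedicated = ampere_cores // 2
    shared = ampere_cores - dedicated
    return 1.0 - (turing_cores - dedicated) / shared


for share in (0.0, 0.25, 0.5, 1.0):
    print(f"INT share {share:.2f}: 3060 has {ampere_fp32_lanes(3584, share):.0f} FP32 lanes")

# Compared against the 2070 Super's 2560 cores from the first post.
print(f"Break-even vs 2560 Turing cores: INT share = {break_even_int_share(3584, 2560):.2f}")
```

In this model the 3060 only drops to the 2070 Super's FP32 lane count when a bit more than half of the shared lanes are busy with INT32 work.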

This is why I'm mad at nvidia with their cuda core naming convention change in Ampere. It's a good way to trick gamers into believing that it's a massively superior architecture compared to Turing when real world performance data show otherwise. In my eyes, it's only a mild refresh at best.

WhoDecidedThat · Jan 4, 2022

AusWolf said:
This is why I'm mad at nvidia with their cuda core naming convention change in Ampere. It's a good way to trick gamers into believing that it's a massively superior architecture compared to Turing when real world performance data show otherwise.

It also explains why AMD's 5120 core 6900XT is able to go toe to toe with Nvidia's 10240 core RTX 3080 Ti. Nvidia has the advantage of having effectively more than 5120 but less than 10240 FP32 cores and AMD has the advantage of running its 5120 FP32 cores at a greater clock speed so it evens out.

AusWolf · Jan 4, 2022

blanarahul said:
It also explains why AMD's 5120 core 6900XT is able to go toe to toe with Nvidia's 10240 core RTX 3080 Ti. Nvidia has the advantage of having effectively more than 5120 but less than 10240 FP32 cores and AMD has the advantage of running its 5120 FP32 cores at a greater clock speed so it evens out.

Yes, though I think AMD runs either FP or INT on essentially any core at any time, so that's an even more complicated story, not to mention the huge cache advantage there.

By the way, I've just read back that you were originally comparing the 3060 to the 2070 Super, which really has 2560 cores. My bad.

WhoDecidedThat · Jan 4, 2022

AusWolf said:
By the way, I've just read back that you were originally comparing the 3060 to the 2070 Super, which really has 2560 cores. My bad

Haha no that's on me for not being clear in my post.

Yes, though I think AMD runs either FP or INT on essentially any core at any time, so that's an even more complicated story.

Are you saying that while Nvidia has 10240 active cores on its 3080 Ti (running any combination of FP + INT), AMD has only 5120 cores active in 1 cycle?

GerKNG · Jan 4, 2022

Nvidia has two different implementations of "CUDA cores".

For example, in a 3080 Ti with 10240 cores:

It has 5120 traditional FP/INT cores and 5120 Pascal-like FP OR INT cores.
The die actually has 10240 FPUs, but only 5120 are the "proper ones"; the others are a cluster that does either INT in one cycle or FP32 in another (per SM).

DemonicRyzen666 · Jan 26, 2022

Have you tried using a dedicated PhysX GPU on Metro Exodus (Enhanced Edition), since it has real-time ray tracing added to it?
I'm wondering if this game still supports PhysX alongside RT. I'd like to see some tests done with it on vs with it off.

Al Chafai · Apr 8, 2022

this is pretty interesting topic, i love it
