
Rumor: NVIDIA RTX 3080, 3070, 3060 Mobile Specifications Detailed

Raevenlord

News Editor
Joined
Aug 12, 2016
Messages
3,755 (1.23/day)
Location
Portugal
System Name The Ryzening
Processor AMD Ryzen 9 5900X
Motherboard MSI X570 MAG TOMAHAWK
Cooling Lian Li Galahad 360mm AIO
Memory 32 GB G.Skill Trident Z F4-3733 (4x 8 GB)
Video Card(s) Gigabyte RTX 3070 Ti
Storage Boot: Transcend MTE220S 2TB, Kintson A2000 1TB, Seagate Firewolf Pro 14 TB
Display(s) Acer Nitro VG270UP (1440p 144 Hz IPS)
Case Lian Li O11DX Dynamic White
Audio Device(s) iFi Audio Zen DAC
Power Supply Seasonic Focus+ 750 W
Mouse Cooler Master Masterkeys Lite L
Keyboard Cooler Master Masterkeys Lite L
Software Windows 10 x64
Apparently, specifications for NVIDIA's upcoming RTX 30-series mobile solutions have been made public. According to VideoCardz via Notebookcheck, NVIDIA will introduce three mobile versions of its RTX 30-series graphics cards: the RTX 3080, RTX 3070 and RTX 3060. Like past NVIDIA mobile solutions, these won't correspond directly, hardware-wise, to their desktop counterparts; NVIDIA has a habit of pairing its mobile SKUs with smaller chips than the equivalent desktop cards use. According to the leaked specifications, this means the mobile RTX 3080 will make use of the company's GA104 chip, instead of the GA102 silicon found on desktop versions of the card.

The mobile RTX 3080 should thus feature a total of 6,144 CUDA cores, as present in the fully-enabled GA104 chip (compare that to the 5,888 CUDA cores of the desktop RTX 3070 and the 8,704 CUDA cores of the desktop RTX 3080). These CUDA cores would be clocked at up to 1.7 GHz. The memory bus also sees a cut down to 256-bit, which would allow NVIDIA to distribute as many as four versions of the mobile RTX 3080: Max-Q (80-90 W TGP) and Max-P (115-150 W TGP), each with either 8 GB or 16 GB of GDDR6 memory. The mobile RTX 3070 keeps the GA104 chip, 256-bit bus and GDDR6 memory subsystem (apparently with only an 8 GB memory pool available), but further cuts the CUDA core count to 5,120 (Max-Q at 80-90 W TGP, Max-P at 115-150 W TGP). Finally, the mobile RTX 3060 should make use of the GA106 chip, set up with 3,072 enabled CUDA cores and a 192-bit memory bus across its 6 GB GDDR6 VRAM pool (Max-Q at 60-70 W TGP, Max-P at 80-115 W TGP). Expect these specs to be confirmed (or not) come January 12th.
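For a rough sense of scale, here is how the leaked figures translate into theoretical FP32 throughput. This is a back-of-the-envelope sketch, not part of the leak: it assumes the usual 2 FLOPs per CUDA core per clock (one FMA), takes the mobile RTX 3070's boost clock as equal to the 3080's (the leak doesn't state it), and uses the GPU-database estimate for the 3060.

    def fp32_tflops(cuda_cores, boost_ghz):
        # Theoretical peak FP32 = cores x clock (GHz) x 2 FLOPs per clock (one FMA)
        return cuda_cores * boost_ghz * 2 / 1000

    rumored = {
        "RTX 3080 Mobile (GA104)": (6144, 1.7),
        "RTX 3070 Mobile (GA104)": (5120, 1.7),    # boost clock assumed, not in the leak
        "RTX 3060 Mobile (GA106)": (3072, 1.425),  # boost clock per GPU-database estimate
    }

    for name, (cores, ghz) in rumored.items():
        print(f"{name}: ~{fp32_tflops(cores, ghz):.1f} TFLOPS FP32")

On those assumptions the mobile RTX 3080 would land around 21 TFLOPS, roughly desktop RTX 3070 territory on paper.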



View at TechPowerUp Main Site
 
Joined
Oct 27, 2020
Messages
65 (0.04/day)
Back to the Maxwell days, I see. Goes to show how efficient Pascal was: all the desktop and laptop variants used the same silicon (the 1070 being an exception, and it was even better specced than the desktop variant), with not that much gimping of clocks to meet the TDP requirements, and performance within about 15% of the desktop variants. Turing, on the other hand, was a mess: anything above the 1660 Ti was nowhere close to its desktop variant even though they used the same silicon, including the Super variants that came later. Ampere, as we all know, is a power-guzzling architecture, and there's no way they could fit anything above the GA104 silicon without melting the laptop chassis, even with ridiculous gimping of clocks, so they just had to evolve, but backwards this time.
 
Joined
Feb 20, 2019
Messages
8,339 (3.91/day)
System Name Bragging Rights
Processor Atom Z3735F 1.33GHz
Motherboard It has no markings but it's green
Cooling No, it's a 2.2W processor
Memory 2GB DDR3L-1333
Video Card(s) Gen7 Intel HD (4EU @ 311MHz)
Storage 32GB eMMC and 128GB Sandisk Extreme U3
Display(s) 10" IPS 1280x800 60Hz
Case Veddha T2
Audio Device(s) Apparently, yes
Power Supply Samsung 18W 5V fast-charger
Mouse MX Anywhere 2
Keyboard Logitech MX Keys (not Cherry MX at all)
VR HMD Samsung Oddyssey, not that I'd plug it into this though....
Software W10 21H1, barely
Benchmark Scores I once clocked a Celeron-300A to 564MHz on an Abit BE6 and it scored over 9000.
The problem with Ampere on Samsung 8nm is that it isn't really any more efficient than Turing on TSMC 12nm.

So you're paying this mad premium and all you're getting is the 3000-series name; it's still 100% constrained by its performance/Watt, which is distinctly 2018-level, and Turing's 2018 level of performance/Watt wasn't actually amazing in the RTX cards. Only the more efficient 16-series cards really outshone 2016's Pascal.
 
Joined
Jul 18, 2016
Messages
354 (0.11/day)
Location
Indonesia
System Name Nero Mini
Processor AMD Ryzen 7 5800X 4.7GHz-4.9GHz
Motherboard Gigabyte X570i Aorus Pro Wifi
Cooling Noctua NH-D15S+3x Noctua IPPC 3K
Memory Team Dark 3800MHz CL16 2x16GB 55ns
Video Card(s) Palit RTX 2060 Super JS Shunt Mod 2130MHz/1925MHz + 2x Noctua 120mm IPPC 3K
Storage Adata XPG Gammix S50 1TB
Display(s) LG 27UD68W
Case Lian-Li TU-150
Power Supply Corsair SF750 Platinum
Software Windows 10 Pro
Wow, if that's true, that's the most gimped set of mobile variants relative to their desktop counterparts in a while... not surprising, though, considering the power guzzler that is Ampere.
 
Joined
Feb 18, 2012
Messages
2,715 (0.58/day)
System Name MSI GP76
Processor intel i7 11800h
Cooling 2 laptop fans
Memory 32gb of 3000mhz DDR4
Video Card(s) Nvidia 3070
Storage x2 PNY 8tb cs2130 m.2 SSD--16tb of space
Display(s) 17.3" IPS 1920x1080 240Hz
Power Supply 280w laptop power supply
Mouse Logitech m705
Keyboard laptop keyboard
Software lots of movies and Windows 10 with win 7 shell
Benchmark Scores Good enough for me
Ampere is only a power guzzler at the 3080 and above. The 3070 is rated at 220 W with the performance level of a 2080 Ti that's rated at, I think, 250 or 275 W.
It's really hard to cut a 3080 down from its rated 320 W to 150 W for a laptop, or 200 W for some of the bigger laptops. Most top-end laptop GPUs are rated at 150 W max, some at 200 W if the cooling allows it.
 
Deleted member 185088

Guest
It's a mess again; it shows that Ampere is not as big of a jump as portrayed by some.
 
Joined
Jan 8, 2017
Messages
9,505 (3.27/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
Ouch, looks like the "M" parts are back. I wonder if they'll sell them under a different name compared to the desktop counterparts like they should, or if they'll be disingenuous about it like I'm expecting them to. "Max-Q" was already pretty bad and misleading.
 
Joined
Feb 15, 2019
Messages
1,666 (0.78/day)
System Name Personal Gaming Rig
Processor Ryzen 7800X3D
Motherboard MSI X670E Carbon
Cooling MO-RA 3 420
Memory 32GB 6000MHz
Video Card(s) RTX 4090 ICHILL FROSTBITE ULTRA
Storage 4x 2TB Nvme
Display(s) Samsung G8 OLED
Case Silverstone FT04
Cut down in both cores (CUDA count) and memory (non-X GDDR6).

Will the price get a cut-down too?
 
Joined
Mar 28, 2020
Messages
1,761 (1.02/day)
I recall Nvidia proudly mentioning that laptop GPU = desktop GPU back in the Pascal days, if I'm not mistaken. With Ampere, that's taken a step back. I think it's a sensible decision for now because of the low supply of GA102, not to mention that the power requirement is very high for a laptop part (which would be difficult to cool as well, even if they scaled back the clock speed).
 
Joined
Oct 10, 2018
Messages
148 (0.07/day)
If this is true, the RTX 3060 will be good for laptops. With 3,072 cores it should match the performance of the desktop RTX 2060 for $1,000 (I hope). Now consider the desktop version: since it carries the RTX 3060 name it must deliver RTX 2070 performance, especially as there is no GTX series this time. So an RTX 3060 6GB at about $230 would be good value, and an RTX 3060 12GB at about $299-329 with RTX 2070S-2080 levels of performance would be good value as well.

The problem with Ampere on Samsung 8nm is that it isn't really any more efficient than Turing on TSMC 12nm.
So you're paying this mad premium and all you're getting is the 3000-series name; it's still 100% constrained by its performance/Watt, which is distinctly 2018-level, and Turing's 2018 level of performance/Watt wasn't actually amazing in the RTX cards. Only the more efficient 16-series cards really outshone 2016's Pascal.

I believe the perf/Watt looks bad because of the core counts: Ampere's cores don't deliver their full performance yet, or else Ampere is simply a weak architecture; but over the years Nvidia will improve the architecture's per-core performance. Ampere is, after all, the first architecture with this many cores.
 

Havefun

New Member
Joined
Apr 16, 2020
Messages
3 (0.00/day)

NVIDIA GeForce GTX 1660 Ti Mobile
Graphics Processor: TU116 | Cores: 1536 | TMUs: 96 | ROPs: 48 | Memory Size: 6 GB | Bus Width: 192-bit
Base Clock: 1455 MHz | Boost Clock: 1590 MHz
Theoretical Performance: Pixel Rate 76.32 GPixel/s | Texture Rate 152.6 GTexel/s | FP16 (half) 9.769 TFLOPS (2:1) | FP32 (float) 4.884 TFLOPS | FP64 (double) 152.6 GFLOPS (1:32) | Bandwidth 288.0 GB/s

NVIDIA GeForce RTX 3060 Mobile
Graphics Processor: GA106 | Cores: 3072 | TMUs: 96 | ROPs: 48 | Memory Size: 6 GB | Bus Width: 192-bit
Tensor Cores: 96 | RT Cores: 24
Base Clock: 900 MHz | Boost Clock: 1425 MHz
Theoretical Performance: Pixel Rate 68.40 GPixel/s | Texture Rate 136.8 GTexel/s | FP16 (half) 8.755 TFLOPS (1:1) | FP32 (float) 8.755 TFLOPS | FP64 (double) 136.8 GFLOPS (1:64) | Bandwidth 336.0 GB/s


Power efficiency is so bad that they reduced the base clock to 900 MHz (555 MHz less than the 1660 Ti's). The 1660 Ti is ~11% ahead on the theoretical metrics except FP32 (where the 3060 is ~80% better thanks to its doubled FP32 units) and memory bandwidth. I will not pay double the price for 10 FPS more in games. Let's hope the mobile Radeons will be better.
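For reference, the "Theoretical Performance" lines in the database entries above are straight unit-count arithmetic; here's a minimal sketch that reproduces them from the quoted specs (the GDDR6 data rates of 12 and 14 Gbps are inferred from the 288 and 336 GB/s bandwidth figures, not stated in the entries):

    def theoretical(cores, tmus, rops, boost_mhz, bus_bits, gbps):
        # Each line is just unit count x boost clock (x2 for FMA on the FP32 figure)
        return {
            "pixel rate (GPixel/s)":   rops * boost_mhz / 1000,
            "texture rate (GTexel/s)": tmus * boost_mhz / 1000,
            "FP32 (TFLOPS)":           cores * boost_mhz * 2 / 1e6,
            "bandwidth (GB/s)":        bus_bits / 8 * gbps,
        }

    print(theoretical(1536, 96, 48, 1590, 192, 12))  # matches the GTX 1660 Ti Mobile figures
    print(theoretical(3072, 96, 48, 1425, 192, 14))  # matches the RTX 3060 Mobile figures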

 
Joined
Feb 20, 2019
Messages
8,339 (3.91/day)
System Name Bragging Rights
Processor Atom Z3735F 1.33GHz
Motherboard It has no markings but it's green
Cooling No, it's a 2.2W processor
Memory 2GB DDR3L-1333
Video Card(s) Gen7 Intel HD (4EU @ 311MHz)
Storage 32GB eMMC and 128GB Sandisk Extreme U3
Display(s) 10" IPS 1280x800 60Hz
Case Veddha T2
Audio Device(s) Apparently, yes
Power Supply Samsung 18W 5V fast-charger
Mouse MX Anywhere 2
Keyboard Logitech MX Keys (not Cherry MX at all)
VR HMD Samsung Oddyssey, not that I'd plug it into this though....
Software W10 21H1, barely
Benchmark Scores I once clocked a Celeron-300A to 564MHz on an Abit BE6 and it scored over 9000.

NVIDIA GeForce GTX 1660 Ti Mobile
Graphics Processor: TU116 | Cores: 1536 | TMUs: 96 | ROPs: 48 | Memory Size: 6 GB | Bus Width: 192-bit
Base Clock: 1455 MHz | Boost Clock: 1590 MHz
Theoretical Performance: Pixel Rate 76.32 GPixel/s | Texture Rate 152.6 GTexel/s | FP16 (half) 9.769 TFLOPS (2:1) | FP32 (float) 4.884 TFLOPS | FP64 (double) 152.6 GFLOPS (1:32) | Bandwidth 288.0 GB/s

NVIDIA GeForce RTX 3060 Mobile
Graphics Processor: GA106 | Cores: 3072 | TMUs: 96 | ROPs: 48 | Memory Size: 6 GB | Bus Width: 192-bit
Tensor Cores: 96 | RT Cores: 24
Base Clock: 900 MHz | Boost Clock: 1425 MHz
Theoretical Performance: Pixel Rate 68.40 GPixel/s | Texture Rate 136.8 GTexel/s | FP16 (half) 8.755 TFLOPS (1:1) | FP32 (float) 8.755 TFLOPS | FP64 (double) 136.8 GFLOPS (1:64) | Bandwidth 336.0 GB/s

Power efficiency is so bad that they reduced the base clock to 900 MHz (555 MHz less than the 1660 Ti's). The 1660 Ti is ~11% ahead on the theoretical metrics except FP32 (where the 3060 is ~80% better thanks to its doubled FP32 units) and memory bandwidth. I will not pay double the price for 10 FPS more in games. Let's hope the mobile Radeons will be better.

The architecture isn't even the same. A Turing CUDA core has higher IPC than an Ampere CUDA core:
Both the 2080S and 3060 Ti boost to around 1900 MHz and have damn-near identical performance, but Turing achieves that with just 3,072 cores, whilst Ampere needs 4,864 to achieve the same thing.

The combination of reduced core counts and clocks on Ampere Mobile is going to be devastating. You can bet Nvidia will be pushing DLSS and RTX ON super hard in all of their press and "reviewer guides".

Ampere cores are Nvidia's 'Bulldozer architecture' mistake. They've tried to double up certain things, but whilst they've doubled the "core" count and power consumption, they haven't actually doubled performance at all. 4864/3072 means that Turing does about 60% more work per core in traditional (read: current) games; to rephrase, Nvidia's attempt to double the core count with Ampere only resulted in a 26% performance gain. That's pitiful, and all those extra cores waste die area and power consumption without providing the expected performance.
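A quick sanity check on that maths, as a sketch using only the core counts quoted above:

    # 2080S (Turing) and 3060 Ti (Ampere) deliver the same performance at ~1900 MHz
    turing_cores, ampere_cores = 3072, 4864

    ratio = ampere_cores / turing_cores
    print(f"Ampere needs {ratio:.2f}x the cores for the same work")       # ~1.58x
    print(f"Turing does ~{(ratio - 1) * 100:.0f}% more work per core")    # ~58%
    print(f"Doubling Ampere's cores buys ~{(2 / ratio - 1) * 100:.0f}%")  # ~26% perf gain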
 
Joined
Nov 11, 2016
Messages
3,459 (1.17/day)
System Name The de-ploughminator Mk-III
Processor 9800X3D
Motherboard Gigabyte X870E Aorus Master
Cooling DeepCool AK620
Memory 2x32GB G.SKill 6400MT Cas32
Video Card(s) Asus RTX4090 TUF
Storage 4TB Samsung 990 Pro
Display(s) 48" LG OLED C4
Case Corsair 5000D Air
Audio Device(s) KEF LSX II LT speakers + KEF KC62 Subwoofer
Power Supply Corsair HX850
Mouse Razor Death Adder v3
Keyboard Razor Huntsman V3 Pro TKL
Software win11
The architecture isn't even the same. A Turing CUDA core has higher IPC than an Ampere CUDA core:
Both the 2080S and 3060 Ti boost to around 1900 MHz and have damn-near identical performance, but Turing achieves that with just 3,072 cores, whilst Ampere needs 4,864 to achieve the same thing.

The combination of reduced core counts and clocks on Ampere Mobile is going to be devastating. You can bet Nvidia will be pushing DLSS and RTX ON super hard in all of their press and "reviewer guides".

Ampere cores are Nvidia's 'Bulldozer architecture' mistake. They've tried to double up certain things, but whilst they've doubled the "core" count and power consumption, they haven't actually doubled performance at all. 4864/3072 means that Turing does about 60% more work per core in traditional (read: current) games; to rephrase, Nvidia's attempt to double the core count with Ampere only resulted in a 26% performance gain. That's pitiful, and all those extra cores waste die area and power consumption without providing the expected performance.

Ampere does appear to be more efficient at the lower performance tiers, though.

[Chart: TPU performance per Watt, FPS at 1920x1080]
At the same TGP as Max-Q Turing, Max-Q Ampere would be ~25% faster, which is somewhat wasted given how slow mobile CPUs currently are anyway.
I have an Intel 10875H + 2070 Super Max-Q laptop, and most of the time I run into a CPU bottleneck in games.
 
Joined
Jan 24, 2011
Messages
180 (0.04/day)
The architecture isn't even the same. A Turing CUDA core has higher IPC than an Ampere CUDA core:
Both the 2080S and 3060 Ti boost to around 1900 MHz and have damn-near identical performance, but Turing achieves that with just 3,072 cores, whilst Ampere needs 4,864 to achieve the same thing.

The combination of reduced core counts and clocks on Ampere Mobile is going to be devastating. You can bet Nvidia will be pushing DLSS and RTX ON super hard in all of their press and "reviewer guides".

Ampere cores are Nvidia's 'Bulldozer architecture' mistake. They've tried to double up certain things, but whilst they've doubled the "core" count and power consumption, they haven't actually doubled performance at all. 4864/3072 means that Turing does about 60% more work per core in traditional (read: current) games; to rephrase, Nvidia's attempt to double the core count with Ampere only resulted in a 26% performance gain. That's pitiful, and all those extra cores waste die area and power consumption without providing the expected performance.
Comparing IPC between Turing and Ampere based on a single FP32 "CUDA core" is pointless. Turing has, per SM, 64 FP32 (CUDA) units + 64 INT32 units; Ampere has, per SM, 64 FP32 units + 64 combined FP32/INT32 units.
You are comparing a 2080S with 48 SMs against a 3060 Ti which has only 38 SMs (streaming multiprocessors), so it's not surprising that the performance is pretty close.

A better comparison would be RTX 2080 vs RTX 3070: they have the same number of SMs, and the difference is 2x the CUDA cores and 50% more ROPs, with comparable clock speed and bandwidth.
The difference in performance is 28%, which doesn't look great considering the chip has 2x the CUDA cores and 50% more ROPs, but half of those CUDA cores are doing either INT or FP operations, and the number of SMs didn't change, so you can't really expect massive performance gains.

Now the question is whether it was worth it or not. The transistor count increased by 28% (17.4 vs 13.6 billion), the same as performance, and that increase isn't caused only by the extra CUDA cores when you also have more ROPs and new features. Average power consumption is 215 W (2080) vs 220 W (3070), so only a 2% (5 W) difference for 28% more performance (yes, I know the manufacturing process is different). Alternatively: the RTX 3070 performs like an RTX 2080 Ti while having fewer transistors and lower power consumption.
Ampere is not worse than Turing.
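To make the SM change concrete, here is a deliberately crude model; these are my assumptions, not anyone's measurements: only lane occupancy matters (no scheduling, bandwidth or ROP limits), and the ~26% INT share corresponds to the roughly 36 INT instructions per 100 FP that Nvidia has cited for games.

    def effective_fp32_per_sm(int_frac):
        # Turing SM: 64 dedicated FP32 lanes + 64 dedicated INT32 lanes
        # Ampere SM: 64 dedicated FP32 lanes + 64 lanes that run FP32 OR INT32
        turing = 64                          # FP32 lanes are always free for FP work
        ampere = 64 + 64 * (1 - int_frac)    # flexible lanes help only when not doing INT
        return turing, ampere

    for int_frac in (0.0, 0.26, 0.5):
        t, a = effective_fp32_per_sm(int_frac)
        print(f"INT share {int_frac:.0%}: Ampere SM ~{a / t:.2f}x a Turing SM on FP32")

That 1.5-2x per-SM ceiling is exactly why the real-world 28% looks modest: the rest of the chip didn't grow along with the FP32 lanes.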

NVIDIA GeForce GTX 1660 Ti Mobile
Graphics Processor: TU116 | Cores: 1536 | TMUs: 96 | ROPs: 48 | Memory Size: 6 GB | Bus Width: 192-bit
Base Clock: 1455 MHz | Boost Clock: 1590 MHz
Theoretical Performance: Pixel Rate 76.32 GPixel/s | Texture Rate 152.6 GTexel/s | FP16 (half) 9.769 TFLOPS (2:1) | FP32 (float) 4.884 TFLOPS | FP64 (double) 152.6 GFLOPS (1:32) | Bandwidth 288.0 GB/s

NVIDIA GeForce RTX 3060 Mobile
Graphics Processor: GA106 | Cores: 3072 | TMUs: 96 | ROPs: 48 | Memory Size: 6 GB | Bus Width: 192-bit
Tensor Cores: 96 | RT Cores: 24
Base Clock: 900 MHz | Boost Clock: 1425 MHz
Theoretical Performance: Pixel Rate 68.40 GPixel/s | Texture Rate 136.8 GTexel/s | FP16 (half) 8.755 TFLOPS (1:1) | FP32 (float) 8.755 TFLOPS | FP64 (double) 136.8 GFLOPS (1:64) | Bandwidth 336.0 GB/s


Power efficiency is so bad that they reduced the base clock to 900 MHz (555 MHz less than the 1660 Ti's). The 1660 Ti is ~11% ahead on the theoretical metrics except FP32 (where the 3060 is ~80% better thanks to its doubled FP32 units) and memory bandwidth. I will not pay double the price for 10 FPS more in games. Let's hope the mobile Radeons will be better.

Do you know in what kind of workload the RTX 3060 clocks as low as 900 MHz, and what power consumption or TDP it actually has at that clock speed? If your answer is that you don't know, then your conclusion is premature.
What's important is the actual clock speed during gaming; if it's comparable to the 1660 Ti's, then the performance difference should be >20%. If it's lower, performance will suffer, but Nvidia is positioning it above the 1660 Ti, so it should perform better.
BTW, where did you get those clocks for the mobile 3060?
 
Joined
Feb 20, 2019
Messages
8,339 (3.91/day)
Comparing IPC between Turing and Ampere based on a single FP32 "CUDA core" is pointless. Turing has, per SM, 64 FP32 (CUDA) units + 64 INT32 units; Ampere has, per SM, 64 FP32 units + 64 combined FP32/INT32 units.
You are comparing a 2080S with 48 SMs against a 3060 Ti which has only 38 SMs (streaming multiprocessors), so it's not surprising that the performance is pretty close.

A better comparison would be RTX 2080 vs RTX 3070: they have the same number of SMs, and the difference is 2x the CUDA cores and 50% more ROPs, with comparable clock speed and bandwidth.
The difference in performance is 28%, which doesn't look great considering the chip has 2x the CUDA cores and 50% more ROPs, but half of those CUDA cores are doing either INT or FP operations, and the number of SMs didn't change, so you can't really expect massive performance gains.
These are all valid points. Ampere is, on paper and in theoretical scenarios or synthetic tests, both faster and more efficient than Turing.

The problem is that the applications and games we have right now can't fully utilise it, and the relatively short window of advantage that mobile GPUs get before something better/more efficient comes along means I doubt those applications and games will exist during Ampere's window of relevance for premium DTR/gaming laptops.

The reason I picked the 2080S rather than the 2080 as a matchup is that it's an exact performance match for the 3060 Ti as tested by TPU across a wide range of current titles, right now. Clock-for-clock, Ampere needs 59% more 'cores' than Turing, even though those two definitions of a core aren't the same from a technological standpoint. Whether the underlying core is an FP, INT, or combined unit doesn't matter to today's games, even if it will probably scale differently in future applications.

For Ampere's 12-18 months of laptop shelf life, all that matters today is that 3,072 Turing cores do exactly the same work as 4,864 Ampere cores. For people hanging onto these laptops for 5+ years, it will probably make a big difference. Right now, today, it means diddly squat :p

Ampere does appear to be more efficient at the lower performance tiers, though.
I'm not sure it's fair to make that comparison; an xx60 vs xx80 comparison isn't picking two SKUs targeting the same thing. If we're allowed to mix SKUs, the Turing 1660 Ti matches the 3060 Ti almost perfectly for performance/Watt, and the 2080 is much better than the 3080!

For different models within a single product generation, performance/Watt is less about architectural efficiency and more about the target market for the product. Lower-end SKUs target more efficient operation for use with cheaper cooling/VRMs/PCB designs. Flagship models go all-out on cooling/VRMs/PCB and crank the power target as high as reasonably possible to be the best they can be for that generation. Same product generation and architecture, but opposite ends of the performance/Watt spectrum, which is why comparing them specifically on performance/Watt is so meaningless.

So yes, you can make the efficiency comparison, but it's only going to be fair in like-for-like examples aimed at the same performance segment and market point, so the closest we have to that is 3080 vs 2080, 3070 vs 2070, etc. It's not even as clean-cut as that, because you could argue that the 3080 is actually closer to the 2080 Ti since they both use xx102 silicon, and likewise the closest match for a 3070 is actually a 2080S because both of those represent the xx104 dies. I'm not suggesting that either die parity or SKU parity is 100% right, but they're definitely less wrong than comparisons that are neither the same die nor the same SKU.
 
Joined
Nov 11, 2016
Messages
3,459 (1.17/day)
I'm not sure it's fair to make that comparison. An xx60 card vs an xx80 card comparison is two different power and performance targets. The Turing 1660 Ti matches the 3060 Ti almost perfectly in that regard, and the 3080 is much worse than the 2080.

For different models within any single product generation, the performance/Watt is less about architectural efficiency and more about the target market for the product. Lower end SKUs target more efficient operation for use with cheaper cooling/VRMs/PCB design. Flagship models go all out on cooling/VRMs/PCB and crank the power target as high as reasonably possible to be the best they can be for that generation. Same product generation and architecture, but opposite ends of the performance/Watt spectrum.

So yes, you can make the efficiency comparison, but it's only going to be fair in like-for-like examples aimed at the same performance segment and market point, so the closest we have to that is 3080 vs 2080, 3070 vs 2070 etc.

You can't compare the efficiency of desktop GPUs and make conjectures about mobile GPUs anyway.

At the same TGP (80 W / 90 W / 115 W form factors), the stronger GPU will be the better performer.
When you look at the performance per watt of the desktop 2080 Super it's nothing special, but the mobile 2080 Super Max-Q is the efficiency king.

So yeah, the 3080 Max-Q will no doubt beat the 2080 Super Max-Q by at least 20%, when you're not being CPU-bottlenecked, that is.
IMHO, mobile RTX 3000 should be paired with Ryzen 5000 mobile and nothing less; it was kinda disappointing to see OEMs pair the AMD Renoir CPUs only with a mobile RTX 2060 or slower.
 
Joined
Feb 20, 2019
Messages
8,339 (3.91/day)
You can't compare the efficiency of desktop GPUs and make conjectures about mobile GPUs anyway.

At the same TGP (80 W / 90 W / 115 W form factors), the stronger GPU will be the better performer.
When you look at the performance per watt of the desktop 2080 Super it's nothing special, but the mobile 2080 Super Max-Q is the efficiency king.

So yeah, the 3080 Max-Q will no doubt beat the 2080 Super Max-Q by at least 20%, when you're not being CPU-bottlenecked, that is.
IMHO, mobile RTX 3000 should be paired with Ryzen 5000 mobile and nothing less; it was kinda disappointing to see OEMs pair the AMD Renoir CPUs only with a mobile RTX 2060 or slower.
Yeah, that's kind of my point: I'm not comparing GPUs, I'm comparing architectures. In today's software, Nvidia's definition of a Turing core is more efficient at any given clock speed than Nvidia's definition of an Ampere core.

I have been practising what you suggest for about a decade now: buying a higher-end SKU than I need and downclocking it to bring the efficiency up. That's all the mobile models really are anyway.
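A toy model of why that works; this is a rule of thumb, not measured data: dynamic power scales roughly with f·V², and voltage has to rise with clock near the top of the DVFS curve, so power grows close to cubically while performance grows only linearly.

    def downclock(perf_scale):
        # Crude DVFS model: perf ~ f, power ~ f * V^2 with V ~ f, so power ~ f^3
        power = perf_scale ** 3
        return power, perf_scale / power     # relative power, relative perf/W

    for f in (1.0, 0.9, 0.8, 0.7):
        power, eff = downclock(f)
        print(f"clock {f:.0%}: power ~{power:.0%}, perf/W ~{eff:.2f}x")

On that model, running at 80% clocks costs 20% performance but roughly halves the power draw, which is essentially what a Max-Q SKU does.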
 
Joined
Jan 24, 2011
Messages
180 (0.04/day)
These are all valid points. Ampere is, on paper and in theoretical scenarios or synthetic tests, both faster and more efficient than Turing.
Ampere is on average faster and more efficient in real-world games too, as shown in TPU reviews.
The reason I picked the 2080S rather than the 2080 as a matchup is that it's an exact performance match for the 3060 Ti as tested by TPU across a wide range of current titles, right now. Clock-for-clock, Ampere needs 59% more 'cores' than Turing, even though those two definitions of a core aren't the same from a technological standpoint. Whether the underlying core is an FP, INT, or combined unit doesn't matter to today's games, even if it will probably scale differently in future applications.
The 3060 Ti is a bit faster than both the 2080 and the 2080S. If you want to compare identical performance, then there is 2080 Ti vs 3070.
Ampere doesn't really have 2x the CUDA cores; the number of cores (units) per SM is the same, the difference is only that half of them can now do either FP32 or INT32 operations.
Even today it matters what kind of unit (FP, INT or combined) it is. If you have a combined unit and the game needs the INT32 capacity, then there is no advantage over Turing; it's effectively just 64 FP32 and 64 INT32 per SM. Ampere only has the advantage when no INT32 instruction is being executed and all 128 FP32 units are available.

Yeah, that's kind of my point: I'm not comparing GPUs, I'm comparing architectures. In today's software, Nvidia's definition of a Turing core is more efficient at any given clock speed than Nvidia's definition of an Ampere core.

I have been practising what you suggest for about a decade now: buying a higher-end SKU than I need and downclocking it to bring the efficiency up. That's all the mobile models really are anyway.
Comparing a Turing core to an Ampere core is simply not right, and you can't make any valid conclusion based on it.
Turing has a fixed 64 FP32 and 64 INT32 units per SM.
Ampere has a fixed 64 FP32 units plus 64 combined INT32/FP32 units per SM.
If it were a fixed 128 FP32 units + 64 INT32 units per SM, then so be it, but even that wouldn't be fair if the specs of the rest of the chip stayed the same.

I'm not sure it's fair to make that comparison; an xx60 vs xx80 comparison isn't picking two SKUs targeting the same thing. If we're allowed to mix SKUs, the Turing 1660 Ti matches the 3060 Ti almost perfectly for performance/Watt, and the 2080 is much better than the 3080!
xx60 vs xx80 is just a marketing name; comparing based on the actual GPU die (TU104 vs GA104) is better, but in my opinion the best comparison is based on SM count.
BTW, here is a link to the performance/W chart from TPU; at 4K it looks like this:
GTX 1660 Ti vs RTX 3060 Ti: 89% vs 108%
RTX 2080 vs RTX 3080: 90% vs 95%
RTX 2080S vs RTX 3080: 84% vs 95%
I will wait for actual reviews of mobile Ampere and won't make a final conclusion before that.
 
Joined
Jul 9, 2015
Messages
3,413 (0.99/day)
System Name M3401 notebook
Processor 5600H
Motherboard NA
Memory 16GB
Video Card(s) 3050
Storage 500GB SSD
Display(s) 14" OLED screen of the laptop
Software Windows 10
Benchmark Scores 3050 scores good 15-20% lower than average, despite ASUS's claims that it has uber cooling.
Power efficiency is so bad that they reduced the base clock to 900 MHz (555 MHz less than the 1660 Ti's). The 1660 Ti is ~11% ahead on the theoretical metrics except FP32 (where the 3060 is ~80% better thanks to its doubled FP32 units) and memory bandwidth. I will not pay double the price for 10 FPS more in games. Let's hope the mobile Radeons will be better.
The 3060 is likely better due to its CUDA cores (in reality half the claimed number of full units) supporting 2 FP32 ops in parallel.
Unless I'm mistaken and that doesn't cover FP32.
 
Joined
Feb 20, 2019
Messages
8,339 (3.91/day)
Ampere is on average faster and more efficient in real-world games too, as shown in TPU reviews.
Hey, don't quote me out of context! We're talking about PER-CORE performance here, and as shown in TPU reviews, Ampere is slower than Turing per core, with the 3,072 cores of a 2080S matching the performance of the 4,864 Ampere cores in a 3060 Ti.

The rest of your post seems to be a disagreement about what a core is, based on how many of each type are in an SM.

At the end of the day, you can theorise until you're blue in the face, but according to the official Nvidia definition of 'cores', Turing does more per core than Ampere across TPU's combined game benchmark suite. That's Nvidia's official numbers against TPU's independent real-world testing. If you disagree with either the definition of a core or W1zzard's benchmark results, take that up with them respectively. I'm not making those claims; they are.
 

Havefun

New Member
Joined
Apr 16, 2020
Messages
3 (0.00/day)
Do you know in what kind of workload the RTX 3060 clocks as low as 900 MHz, and what power consumption or TDP it actually has at that clock speed? If your answer is that you don't know, then your conclusion is premature.
What's important is the actual clock speed during gaming; if it's comparable to the 1660 Ti's, then the performance difference should be >20%. If it's lower, performance will suffer, but Nvidia is positioning it above the 1660 Ti, so it should perform better.
BTW, where did you get those clocks for the mobile 3060?
Yeah, there is still no mobile RTX 3060 released, so my conclusion is premature, same as the other conclusions here. I have this data from the TechPowerUp GPU database: the 3060's TDP = 80 W. Now I checked it again: in relative performance the 1660 Ti is 7% better than the 3060, and 18% better than the 3060 Max-Q. In real performance the 3060 will be better, but I expect a 10-15 FPS difference. Is it worth it?

The architecture isn't even the same. A Turing CUDA core has higher IPC than an Ampere CUDA core:
Both the 2080S and 3060 Ti boost to around 1900 MHz and have damn-near identical performance, but Turing achieves that with just 3,072 cores, whilst Ampere needs 4,864 to achieve the same thing.

The combination of reduced core counts and clocks on Ampere Mobile is going to be devastating. You can bet Nvidia will be pushing DLSS and RTX ON super hard in all of their press and "reviewer guides".

Ampere cores are Nvidia's 'Bulldozer architecture' mistake. They've tried to double up certain things, but whilst they've doubled the "core" count and power consumption, they haven't actually doubled performance at all. 4864/3072 means that Turing does about 60% more work per core in traditional (read: current) games; to rephrase, Nvidia's attempt to double the core count with Ampere only resulted in a 26% performance gain. That's pitiful, and all those extra cores waste die area and power consumption without providing the expected performance.
Yeah, exactly. When I saw 350 W for desktop Ampere I was curious how much they'd cut it down for laptops (50-100 W). Ampere is so terrible that before it was even released, they announced the next-gen Hopper. :)
DLSS and RTX ON are not enough in reviews; they'll also push 4K resolution, 'cos no laptops have that resolution :D
Producing 2x more cores also costs 2x more. Nvidia is also forcing people to pay for the huge die area of the RT cores; they should cut those down, at least for the mobile versions.
 
Joined
Nov 11, 2016
Messages
3,459 (1.17/day)
Yeah, there is still no mobile RTX 3060 released, so my conclusion is premature, same as the other conclusions here. I have this data from the TechPowerUp GPU database: the 3060's TDP = 80 W. Now I checked it again: in relative performance the 1660 Ti is 7% better than the 3060, and 18% better than the 3060 Max-Q. In real performance the 3060 will be better, but I expect a 10-15 FPS difference. Is it worth it?


Yeah, exactly. When I saw 350 W for desktop Ampere I was curious how much they'd cut it down for laptops (50-100 W). Ampere is so terrible that before it was even released, they announced the next-gen Hopper. :)
DLSS and RTX ON are not enough in reviews; they'll also push 4K resolution, 'cos no laptops have that resolution :D
Producing 2x more cores also costs 2x more. Nvidia is also forcing people to pay for the huge die area of the RT cores; they should cut those down, at least for the mobile versions.

GA104 is 392 mm² vs TU104's 545 mm²; more cores or not, GA104 is cheaper to produce than TU104.
Nvidia pays Samsung and TSMC per wafer, not per chip or per transistor count.
Even if GA104 is only 20% faster than TU104 at the same TGP, it's a success: it's enough of an upgrade that people will buy them, and Nvidia makes a higher profit margin in the process.
 
Joined
Jan 24, 2011
Messages
180 (0.04/day)
Hey, don't quote me out of context! We're talking about PER-CORE performance here, and as shown in TPU reviews, Ampere is slower than Turing per core, with the 3,072 cores of a 2080S matching the performance of the 4,864 Ampere cores in a 3060 Ti.

The rest of your post seems to be a disagreement about what a core is, based on how many of each type are in an SM.

At the end of the day, you can theorise until you're blue in the face, but according to the official Nvidia definition of 'cores', Turing does more per core than Ampere across TPU's combined game benchmark suite. That's Nvidia's official numbers against TPU's independent real-world testing. If you disagree with either the definition of a core or W1zzard's benchmark results, take that up with them respectively. I'm not making those claims; they are.
You are talking about PER-CORE performance, and I am trying to tell you that's not a good comparison.
Yeah, Nvidia uses "CUDA core" as a marketing name for FP32 units, and I don't really have a problem with that even if it's a bit misleading, nor with TPU's benchmark results; what I have a problem with is your comparison. Neither Nvidia, TPU nor other reviewers draw conclusions about Ampere vs Turing based on the number of CUDA cores; you are the only one.
Let's compare performance (IPC), power consumption and efficiency (performance/W) based on SMs and CUDA cores: RTX 2080 vs RTX 3070.
The RTX 3070, with 2x the CUDA cores, is only 28% faster at 4K and consumes 220 W, or 5 W more than the RTX 2080, which means it has a 25% better performance/W ratio.
Performance per SM: an Ampere SM is 28% faster and consumes 2.3% more power than a Turing SM.
Performance per CUDA core: an Ampere CUDA core is ~36% slower and consumes 49% less power than a Turing CUDA core, which is simply hilarious, because the CUDA core itself is the same in both Ampere and Turing; nothing changed there. The change happened a level higher, in the SM, where the original 64 INT32 units are now also capable of FP32 execution.
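Here is that normalisation as plain arithmetic, a sketch built on the desktop numbers used above (both cards have 46 SMs; the 28% gap and the 215 W / 220 W figures are from TPU's reviews):

    # RTX 2080 (Turing) vs RTX 3070 (Ampere): same SM count, double the "CUDA cores"
    turing_cuda, ampere_cuda = 2944, 5888    # 64 vs 128 CUDA cores per SM, 46 SMs each
    perf, power = 1.28, 220 / 215

    print(f"per SM:   {perf:.2f}x perf at {power:.3f}x power")
    print(f"per core: {perf * turing_cuda / ampere_cuda:.2f}x perf"
          f" at {power * turing_cuda / ampere_cuda:.2f}x power")   # 0.64x and 0.51x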

Yeah, there is still no mobile RTX 3060 released, so my conclusion is premature, same as the other conclusions here. I have this data from the TechPowerUp GPU database: the 3060's TDP = 80 W. Now I checked it again: in relative performance the 1660 Ti is 7% better than the 3060, and 18% better than the 3060 Max-Q. In real performance the 3060 will be better, but I expect a 10-15 FPS difference. Is it worth it?
There is no official info about performance or clock speeds, so it's just an estimate someone put there; that's why they write:
"This product is not released yet. Data on this page may change in the future."
My estimate for the RTX 3060 (Max-Q) is ~20-30% better performance than the 1660 Ti (Max-Q), if the clock speed is comparable. Whether it's worth it or not is something everyone has to answer for themselves.
 
Joined
Feb 20, 2019
Messages
8,339 (3.91/day)
You are talking about PER-CORE performance, and I am trying to tell you that's not a good comparison.
Yeah, Nvidia uses "CUDA core" as a marketing name for FP32 units, and I don't really have a problem with that even if it's a bit misleading, nor with TPU's benchmark results; what I have a problem with is your comparison. Neither Nvidia, TPU nor other reviewers draw conclusions about Ampere vs Turing based on the number of CUDA cores; you are the only one.
Let's compare performance (IPC), power consumption and efficiency (performance/W) based on SMs and CUDA cores: RTX 2080 vs RTX 3070.
The RTX 3070, with 2x the CUDA cores, is only 28% faster at 4K and consumes 220 W, or 5 W more than the RTX 2080, which means it has a 25% better performance/W ratio.
Performance per SM: an Ampere SM is 28% faster and consumes 2.3% more power than a Turing SM.
Performance per CUDA core: an Ampere CUDA core is ~36% slower and consumes 49% less power than a Turing CUDA core, which is simply hilarious, because the CUDA core itself is the same in both Ampere and Turing; nothing changed there. The change happened a level higher, in the SM, where the original 64 INT32 units are now also capable of FP32 execution.


There is no official info about performance or clock speeds, so it's just an estimate someone put there; that's why they write:

My estimate for the RTX 3060 (Max-Q) is ~20-30% better performance than the 1660 Ti (Max-Q), if the clock speed is comparable. Whether it's worth it or not is something everyone has to answer for themselves.
You're still trying to have an irrelevant and misdirected argument with me about how the cores aren't comparable between generations.

I'm not the one defining cores.
I'm not the one publishing data showing the 3060 Ti's performance parity with a 2080S.

If you don't like it, it's not me that needs convincing; you're preaching to the choir, and have been for some time in this thread. Nvidia is, whether you like it or not, marketing and selling its product on core count. This article is mostly about core count (@Raevenlord mentions it six times in a single paragraph), and when most people look at GPU specs, the two most important factors are the number of cores and the clocks those cores run at.

I get (I always got) the architectural dissimilarities between a Turing and an Ampere core. I know, and I don't care, that per-core performance isn't a good comparison; it's the comparison that is being made, that Nvidia themselves make, that reviewers make, and that many users will make too. Regardless of the comparison's future/architectural relevance, mobile Turing owners can multiply the number of cores they currently have by 1.59 (for the 59% per-core advantage over Ampere) and know that, in currently-benchmarked games, an Ampere purchase with fewer cores than that isn't going to be any faster. That's simple maths, backed up with clock-comparable empirical data.
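That rule of thumb, written out; the 1.59 factor is derived from the 2080S/3060 Ti matchup discussed above, and the Turing core counts below are illustrative mobile examples:

    def ampere_equivalent(turing_cores, factor=4864 / 3072):
        # Ampere cores needed today to match a given Turing core count (~1.58x)
        return round(turing_cores * factor)

    for cores in (1920, 2304, 2944):   # e.g. mobile RTX 2060 / 2070 / 2080 core counts
        print(f"{cores} Turing cores ~ {ampere_equivalent(cores)} Ampere cores")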
 
Joined
Jan 24, 2011
Messages
180 (0.04/day)
You're still trying to have an irrelevant and misdirected argument with me about how the cores aren't comparable between generations.

I'm not the one defining cores.
I'm not the one publishing data showing the 3060 Ti's performance parity with a 2080S.

If you don't like it, it's not me that needs convincing; you're preaching to the choir, and have been for some time in this thread. Nvidia is, whether you like it or not, marketing and selling its product on core count. This article is mostly about core count (@Raevenlord mentions it six times in a single paragraph), and when most people look at GPU specs, the two most important factors are the number of cores and the clocks those cores run at.

I get (I always got) the architectural dissimilarities between a Turing and an Ampere core. I know, and I don't care, that per-core performance isn't a good comparison; it's the comparison that is being made, that Nvidia themselves make, that reviewers make, and that many users will make too. Regardless of the comparison's future/architectural relevance, mobile Turing owners can multiply the number of cores they currently have by 1.59 (for the 59% per-core advantage over Ampere) and know that, in currently-benchmarked games, an Ampere purchase with fewer cores than that isn't going to be any faster. That's simple maths, backed up with clock-comparable empirical data.
OK, reading your last two sentences, now I get what you were pointing at.
So the conclusion based on the desktop models is that an Ampere GPU is more power-efficient (perf/W) than a Turing GPU, and as fast or faster depending on which models you compare (3060 Ti vs 2080S, 3070 vs 2080, 3070 vs 2080 Ti); but because of the architectural changes in the SM, you need to watch out for the CUDA core count even at the same clock speed, because it isn't representative of the performance gain over Turing and you could end up with much lower gaming performance than you wanted. :D
For example, if you want to upgrade from a mobile RTX 2060 with 1,920 CUDA cores, an Ampere GPU with 2,944-3,072 cores will perform similarly even though the difference in CUDA cores is 53-60%, so you need to choose an Ampere part with 3,840 CUDA cores or more if you want at least ~25% more performance.
I think this sums it up pretty nicely.


I have to wonder if the mobile RTX 3060 will really have only 3,072 CUDA cores (24 SMs) on a 192-bit GDDR6 bus.
An uncut GA104 has 6,144 CUDA cores (48 SMs) and a 256-bit GDDR6 bus. Based on that, even a 128-bit bus should be enough for 3,072 cores (24 SMs), and Nvidia would have to deactivate half the SMs to get this part out of GA104, which is too much of a waste.
I think this RTX 3060 is based on GA106, and there will be an RTX 3060 Ti or Super with 3,840 cores. The question is whether GA106 will have 24 or 30 SMs in its full configuration, but considering GA104 has 48 SMs, I think it will have 30 SMs and a 192-bit GDDR6 bus.
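The bandwidth arithmetic behind that speculation, assuming the 14 Gbps GDDR6 implied by the 336 GB/s figure quoted earlier in the thread:

    def bandwidth(bus_bits, gbps=14.0):
        # GB/s = bus width in bytes x per-pin data rate
        return bus_bits / 8 * gbps

    for sm_count, bus in ((48, 256), (30, 192), (24, 128)):
        bw = bandwidth(bus)
        print(f"{sm_count} SM / {bus}-bit: {bw:.0f} GB/s ({bw / sm_count:.1f} GB/s per SM)")

A 24-SM part on a 128-bit bus would get the same ~9.3 GB/s per SM as a full GA104 on 256-bit, which is why the 192-bit/24-SM combination looks generous and hints at a larger full GA106.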
 