
NVIDIA Announces GeForce Ampere RTX 3000 Series Graphics Cards: Over 10000 CUDA Cores

A 50% perf/watt uplift from Navi puts a 5700 XT-class card at 3080 performance for around 300 W, based on some back-of-the-envelope math.
It does, but I don't quite think 50% will be an average number, especially since, unlike Nvidia, AMD doesn't have the benefit of a node shrink. There's also the question of whether AMD will be willing to go big enough on their top-end die. Those decisions were likely made two years ago, so it'll be interesting to see where they placed their bets.
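For reference, the envelope math presumably runs something like this (taking the 5700 XT at its 225 W board power and the 3080 at roughly twice its performance; both are assumptions, not confirmed figures):

$$P \approx \frac{\text{performance ratio}}{\text{perf/W uplift}} \times P_{\text{5700 XT}} = \frac{2.0}{1.5} \times 225\,\text{W} = 300\,\text{W}$$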




On a different topic, after processing those massive CUDA core counts for a couple of hours, I'm now wondering if Ampere is the generation where Nvidia's gaming performance/TFLOP comes crashing down. No doubt they'll still be powerful, but doubling the ALUs and leaving everything else the same is bound to create heaps of bottlenecks.




DirectStorage aims to reduce IO overhead, not necessarily memory requirements; 1 GB of textures is still going to be 1 GB of textures, it'll just load more efficiently. Just because an engine no longer needs to load as many things ahead of time doesn't mean the memory won't fill up with something else; in fact that's the goal, to allow for an increase in the amount of assets used.
But that's the thing, isn't it? If you load textures more efficiently, i.e. you stop loading ones you don't actually need, you inherently reduce the memory footprint, as you are by default loading fewer textures. Sure, you can then load other things more aggressively, but wouldn't it then make sense to use the same JIT principle for those loads as well? And what other data is supposed to fill several GB of VRAM? Reducing the texture prefetch window from an assumed 1-2 s (HDD speed) to 0.1 s or even less (NVMe SSD speed) can lead to dramatic drops in the amount of texture data that needs to be in memory. I'm obviously not saying this will necessarily result in dramatic across-the-board drops in VRAM usage, but it's well documented that current VRAM usage is massively bloated and wasteful, and not actually necessary to sustain or even increase performance.
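To put illustrative numbers on that (the 2 GB/s demand rate is purely an assumption for the sake of the example): the texture data that must sit resident for prefetching is roughly the streaming demand rate times the prefetch window, so

$$2\,\text{GB/s} \times 2\,\text{s} = 4\,\text{GB (HDD-sized window)} \qquad \text{vs} \qquad 2\,\text{GB/s} \times 0.1\,\text{s} = 0.2\,\text{GB (NVMe-sized window)}$$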

Second, the 2080 Ti has more memory bandwidth than the 3070. That's why the 3070 needs a lot more CUDA cores.
That is literally the opposite of how this works. More cores necessitate more memory bandwidth for the cores to have data to work on. That would be like compensating for your car having no wheels by giving it a more powerful engine.
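You can quantify the imbalance as bytes of bandwidth available per FLOP (using the commonly listed 616 GB/s for the 2080 Ti and 448 GB/s for the 3070, with the TFLOPS figures discussed later in this thread):

$$\text{2080 Ti: } \frac{616\ \text{GB/s}}{13.4\ \text{TFLOPS}} \approx 0.046\ \text{B/FLOP} \qquad \text{3070: } \frac{448\ \text{GB/s}}{20.4\ \text{TFLOPS}} \approx 0.022\ \text{B/FLOP}$$

Each Ampere FLOP gets roughly half the memory bandwidth a Turing FLOP did, which is exactly why more cores need more bandwidth, not less.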
 
RTX 3080 is the sweet spot.
 
I think Nvidia is somewhat revising what the term CUDA core means with the introduction of these new shaders. I don't think it will be directly comparable to the CUDA cores of the previous generation.

Anyway, soon we should have it all dissected.
 
It does, but I don't quite think 50% will be an average number, especially since, unlike Nvidia, AMD doesn't have the benefit of a node shrink. There's also the question of whether AMD will be willing to go big enough on their top-end die. Those decisions were likely made two years ago, so it'll be interesting to see where they placed their bets.

That's fair, but AMD has been pretty honest about their projected performance under Su. Should be interesting!

On a different topic, after processing those massive CUDA core counts for a couple of hours, I'm now wondering if Ampere is the generation where Nvidia's gaming performance/TFLOP comes crashing down. No doubt they'll still be powerful, but doubling the ALUs and leaving everything else the same is bound to create heaps of bottlenecks.

There's definitely a big architectural change there that I'm interested to hear about. At a very high, naive level it seems like a move toward a more GCN-like layout, or rather like AMD and NVIDIA are converging a bit in terms of general shader design.
 
I think Nvidia is somewhat revising what the term CUDA core means with the introduction of these new shaders. I don't think it will be directly comparable to the CUDA cores of the previous generation.

Anyway, soon we should have it all dissected.
They are taking a page from AMD's Bulldozer and Piledriver days, obviously not to catch up, but to extend their lead even further. As someone already said, it's probably not easy to keep the extra ALUs completely fed, thereby losing some of the scaling.
 
When he got the 3090 out of the oven!

A nice nod to using an oven to fix the half-baked solder on the 8800 GTX.
 
You can always use TFLOPS for that: pick any two random GPUs and compare their TFLOPS ratings, then the actual performance. I'll bet you anything that probably 80% of the time the GPU with more TFLOPS will be faster in the real world as well. This time around Nvidia is doing something finicky with the way they count CUDA "cores", and I put that in quote marks because they were never real cores (same with AMD's stream processors); it's the SM/CU that's the real "core" of the GPU. But for some reason this time around they chose to be even more inconsistent as to what that means. Probably to make it look more impressive.
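(For reference, the paper figure being compared is just shader count × 2 flops per clock for a fused multiply-add × boost clock:

$$\text{FLOPS} = N_{\text{shaders}} \times 2 \times f_{\text{boost}}$$

so e.g. 5888 × 2 × 1.73 GHz ≈ 20.4 TFLOPS for the 3070.)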

Nvidia would sure like you to believe that. Shading languages don't run on specialized hardware, they can't, they need generic all-purpose processors.

The TFLOPS he was referring to was 20 TFLOPS of the 3070 compared to 13 TFLOPS of the 2080 ti. If these cards have equivalent performance, TFLOPS doesn't matter!!!

And of course they are using specialized hardware! Do you actually think they are going to waste general-purpose CPUs just to compute graphics??? And you probably know that CPUs aren't even that good at those kinds of computations. That's the reason we have GPUs in the first place. Your argument doesn't even make any sense for that reason. And just to add to that, a GPU has many thousands of little processing cores that are all the same, all doing pretty much the same exact matrix computations and manipulations for those graphics. That's a far cry from what a general-purpose CPU does, to say the least. How would Nvidia even hide something like this?

The part about shading language blatantly makes no sense.
 
Soo, when is the NDA up on reviews?

Release Day?
 
There are already a 3070 Super and a 3080 Ti listed on TPU, with 16 GB and 20 GB VRAM respectively! Also probably a significant performance boost. As if Nvidia already knew they'd have people complaining about the VRAM. I assume they'll probably cost a premium compared to these "low" VRAM versions, unfortunately... A big reason why the 3090 costs $1,500 is the 24 GB of VRAM. But who knows, NVIDIA hasn't even mentioned them yet.

I wonder where TPU gets this information.

https://www.techpowerup.com/gpu-specs/geforce-rtx-3070-super.c3675
https://www.techpowerup.com/gpu-specs/geforce-rtx-3080-ti.c3581
 
So the CUDA cores are doing double calculations, and marketing needed them to look good on paper, so they doubled the numbers?
 
So why then are they saying that the 3070 is for 1440p? It's interesting; reviews will tell all.
Because they are probably just being honest. The 2080 Ti was never truly a 4K card, especially now that we have something like the 3090 that will really handle 4K easily, I assume. It was even called a 4K/8K card in the presentation, but I'd be very skeptical about the 8K part. Honest on one side, but then dishonesty back again on the other. Classic marketing.
 
The TFLOPS he was referring to was 13 TFLOPS of the 3070 compared to 20 TFLOPS of the 2080 ti. If these cards have equivalent performance, TFLOPS doesn't matter!!!

The 2080 Ti has nowhere near 20 TFLOPS, it has about 13 TFLOPS. TFLOPS and performance are highly correlated; it's the most objective measure of performance possible, whether you like it or not. Rarely do you ever come across a counterexample to that general rule.

Do you actually think they are going to waste general-purpose CPUs just to compute graphics???

GPUs are general purpose. They have been since the early 2000s; that's why we have programmable shaders.

And just to add to that, a GPU has many thousands of little processing cores that are all the same, all doing pretty much the same exact matrix computations and manipulations for those graphics. That's a far cry from what a general-purpose CPU does, to say the least.

First of all, like I said, these things don't really have thousands of cores, but I'm not going to go into that; the point is that the analogue of a core is the SM. They do 4x4 matrix arithmetic if you choose to program that, but they might as well do something else, which they often do within shaders, because they're general purpose.

That's a far cry from what a general-purpose CPU does, to say the least.

The part about shading language blatantly makes no sense.


No, it makes perfect sense; you think that because you've probably never seen a shader and don't know what I'm talking about.

This is some random GLSL shader I found on the internet:

[attached image: GLSL shader source; screenshot not preserved]
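(The screenshot itself hasn't survived; below is a small fragment shader of my own in the same spirit, a hypothetical stand-in rather than the shader originally posted:)

Code:
#version 330 core
// A tiny but complete fragment shader: note the function call,
// the loop, and the per-pixel branch. This is ordinary structured
// code, not fixed-function matrix math.
out vec4 fragColor;
uniform vec2 resolution;
uniform float time;

float wave(vec2 p) {
    float acc = 0.0;
    // Sum a few sine harmonics, evaluated independently for every pixel.
    for (int i = 1; i <= 4; ++i) {
        acc += sin(p.x * float(i) + time) / float(i);
    }
    return acc;
}

void main() {
    vec2 uv = gl_FragCoord.xy / resolution;
    float v = wave(uv * 6.2831);
    // Data-dependent branch: different pixels take different paths.
    if (v > 0.0) {
        fragColor = vec4(0.2, 0.6, 1.0, 1.0) * v;
    } else {
        fragColor = vec4(1.0, 0.4, 0.1, 1.0) * -v;
    }
}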


A lot more than matrix multiplication, huh? It's basically C code, and you can't run C on special-purpose hardware; you need a fairly robust ISA and control logic, just like in a typical CPU. A GPU core is very similar to a CPU core, they're just optimized differently.

How would Nvidia even hide something like this?

I don't know what you are on about; you make it sound like it's some sort of conspiracy. It's really funny.
 
The 2080 Ti has nowhere near 20 TFLOPS, it has about 13 TFLOPS. TFLOPS and performance are highly correlated; it's the most objective measure of performance possible, whether you like it or not. Rarely do you ever come across a counterexample to that general rule.
It's the other way around. 3070 has 20 TFLOPS and 2080 ti has 13 TFLOPS... You could just read the original post about that and figure that out by now. Even if I miswrote the correct order, TFLOPS still don't predict performance, if these two cards have very similar performance.

First of all, like I said, these things don't really have thousands of cores, but I'm not going to go into that; the point is that the analogue of a core is the SM. They do 4x4 matrix arithmetic if you choose to program that, but they might as well do something else, which they often do within shaders, because they're general purpose.

No, it's not. When I (and most people) say cores on a GPU, I mean Shading Units. The 3080 has 8704 cores in this case. They all work in parallel, because a GPU makes use of parallel computing WAY more than a CPU. That is the difference that makes the whole GPU very different from a CPU.

And that C code runs purely on the GPU? Are you so sure of that? C is run on the CPU and the CPU eventually just controls the GPU...
 
TFLOPS still don't predict performance.

It predicts performance incredibly well, strikingly so. I know people get angry about that but it's the truth. Size matters, or in this case TFLOPS.

It's the other way around. 3070 has 20 TFLOPS and 2080 ti has 13 TFLOPS... You could just read the original post about that and figure that out by now.

You don't get it, even if you go by Nvidia's numbers, the GPU with more TFLOPS is the faster one.


5888 × 2 × 1730 MHz ≈ 20 TFLOPS

Nvidia claims the 3070 is faster than the 2080 Ti and guess what, the 3070 has more TFLOPS. Tada!
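Spelled out for both cards (taking the 2080 Ti at its reference 4352 shaders and 1545 MHz boost):

$$\text{3070: } 5888 \times 2 \times 1.730\,\text{GHz} \approx 20.4\ \text{TFLOPS} \qquad \text{2080 Ti: } 4352 \times 2 \times 1.545\,\text{GHz} \approx 13.4\ \text{TFLOPS}$$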
 
It predicts performance incredibly well, strikingly so. I know people get angry about that but it's the truth. Size matters, or in this case TFLOPS.



You don't get it, even if you go by Nvidia's numbers, the GPU with more TFLOPS is the faster one.


5888 × 2 × 1730 MHz ≈ 20 TFLOPS

Nvidia claims the 3070 is faster than the 2080 Ti and guess what, the 3070 has more TFLOPS. Tada!
It's not 50% faster as it should be if you just compare the TFLOPS! Your point still makes no sense. I'm pretty sure it'll maybe be just 10% faster if you're lucky. So yeah, if you're off by 40%, that's not predicting performance. All the people I've heard are saying it's going to be pretty much the same performance.
 
It's not 50% faster as it should be if you just compare the TFLOPS! Your point still makes no sense.

Did you hear me say it's exactly 50% or whatever? I said higher TFLOPS means higher performance in general, which is true. I don't know why you are so reluctant to accept it.

No, it's not. When I (and most people) say cores on a GPU, I mean Shading Units. The 3080 has 8704 cores in this case.

A core needs to fetch, decode and execute instructions on its own; CUDA cores or whatever Nvidia calls them don't do that, that's just marketing. Functionally speaking, the SM is the core in a GPU. Have you noticed how Nvidia never says "core" but always makes sure to write "CUDA core"? It's because they're not really cores, they're something else. They don't even do any shading; a CUDA core just means an FP32 unit.

And that C code runs purely on the GPU? Are you so sure of that? C is run on the CPU and the CPU eventually just controls the GPU...

Yes, it runs purely on the GPU, instruction by instruction, for each instance of the shader. Look man, you are clearly not knowledgeable about these things, and that's fine. You can either take my word for it or look all of this up on your own.
 
Did you hear me say it's exactly 50% or whatever ? I said higher TFLOPS means higher performance in general, which is true. I don't know why you are so reluctant to accept it.
Because no one even talked about "higher means higher". The poster I was referring to, before you interjected, was questioning the 20 TFLOPS vs 13 TFLOPS... You just don't seem to get it still.

A core needs to fetch, decode and execute instructions on its own; CUDA cores or whatever Nvidia calls them don't do that, that's just marketing. Functionally speaking, the SM is the core in a GPU. Have you noticed how Nvidia never says "core" but always makes sure to write "CUDA core"? It's because they're not really cores, they're something else. They don't even do any shading; a CUDA core just means an FP32 unit.
And I never talked about CUDA cores; you're just strawmanning me again. I'm talking about Shading Units, which do show the performance: by having more, you get faster GPUs. SMs aren't even the "cores", they are just arrays of Shading Units, which do the actual work.

Yes, it runs purely on the GPU. Look man, you are clearly not knowledgeable about these things, and that's fine. You can either take my word for it or look all of this up on your own.
That is garbage. The CPU always works together with the GPU; the CPU instructs the GPU to do things all the time. I think you are way less knowledgeable than you believe.
 
Because no one even talked about "higher means higher". The poster I was referring to, before you interjected, was questioning the 20 TFLOPS vs 13 TFLOPS... You just don't seem to get it still.

I'll lay it out as simply as I can:

You said that you can't predict performance with TFLOPS, except you can: given a value, you can tell with a fairly good degree of accuracy whether it will be faster or not than an existing GPU. How that doesn't qualify as a prediction, only you know.
 
There are already a 3070 Super and a 3080 Ti listed on TPU, with 16 GB and 20 GB VRAM respectively! Also probably a significant performance boost. As if Nvidia already knew they'd have people complaining about the VRAM. I assume they'll probably cost a premium compared to these "low" VRAM versions, unfortunately... A big reason why the 3090 costs $1,500 is the 24 GB of VRAM. But who knows, NVIDIA hasn't even mentioned them yet.

I wonder where TPU gets this information.

https://www.techpowerup.com/gpu-specs/geforce-rtx-3070-super.c3675
https://www.techpowerup.com/gpu-specs/geforce-rtx-3080-ti.c3581

Why is stuff like this even on the main page? The more I look at TPU the more its credibility takes a hit with me. Should Reddit speculation be siteworthy?
 
I think Navi is going to struggle to catch this 3080, to be honest. AMD has yet to surpass 2070S performance convincingly, and now they're making an 80% jump ahead? Not likely, unless they make something absolutely gargantuan. But let's not dive into the next pond of speculation... my heart... :p

By the by, do we have TDPs for these Ampere releases already? The real numbers?

I am going by pure data that's out there and what's rumored. If Big Navi has at minimum double the CUs of the 5700 XT, that will get right close to 3080 territory. There should also be other tweaks made to increase IPC, and Big Navi should get fairly high clock speeds, given the speeds on the Xbox Series X and how efficient that chip is as an APU.

I suspect them to compete with the 3080 at minimum. Nvidia seems to have done right here pricing the 3080 at $699.99. That does put AMD in a tough spot; they will have to undercut NVIDIA even at the same speed. They would have to be faster to sell close to $699.99-750.
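The CU math, for what it's worth (assuming perfect scaling and unchanged clocks, which is the optimistic case; the 5700 XT has 40 CUs):

$$\text{perf} \propto N_{\text{CU}} \times f_{\text{clk}} \;\Rightarrow\; \frac{80\,\text{CUs}}{40\,\text{CUs}} = 2.0\times \text{ 5700 XT}$$

which lands right around the 3080 if the earlier back-of-the-envelope estimate in this thread holds.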
 
I'll lay it out as simply as I can:

You said that you can't predict performance with TFLOPS, except you can: given a value, you can tell with a fairly good degree of accuracy whether it will be faster or not than an existing GPU. How that doesn't qualify as a prediction, only you know.
Because that wasn't even the thing in question... This is getting annoying to discuss with you, because you are obviously trying to completely mischaracterize what I was talking about. Just stop, you missed the point. It's OK; move on. The point is that a potential 3060 could also have many more TFLOPS than the 2080 Ti but still be slower. That's the whole point. It just works out in this case, but it's still a difference of 50% more TFLOPS for pretty much the same performance on the 3070, so TFLOPS, again, don't reflect the actual PERFORMANCE of the GPU, as I have repeated to you many times...

Why is stuff like this even on the main page? The more I look at TPU the more its credibility takes a hit with me. Should Reddit speculation be siteworthy?
Those pages were there for the 3070, 3080 and 3090, with all the details like this, for at least a week now. And I think that was all the correct information, too. I thought that was weird as well.
 
It predicts performance incredibly well, strikingly so. I know people get angry about that but it's the truth. Size matters, or in this case TFLOPS.



You don't get it, even if you go by Nvidia's numbers, the GPU with more TFLOPS is the faster one.


5888 × 2 × 1730 MHz ≈ 20 TFLOPS

Nvidia claims the 3070 is faster than the 2080 Ti and guess what, the 3070 has more TFLOPS. Tada!

I think what he meant is that it's not as fast as the TFLOPS show. 7 more TFLOPS is a lot if you are talking Turing TFLOPS. So with Ampere you are actually getting less performance per TFLOP, since the 3070 is not 1.7x the performance of the 2080 Ti. So it's almost like the TFLOPS of GCN, where you get less gaming performance.

So yes, it has higher TFLOPS, but it's not as fast as they suggest.
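Putting a number on it (using the ~20.4 and ~13.4 TFLOPS figures from this thread, and assuming the two cards really do land at roughly equal gaming performance, which is still unverified):

$$\frac{\text{gaming perf per TFLOP (Ampere)}}{\text{gaming perf per TFLOP (Turing)}} \approx \frac{13.4}{20.4} \approx 0.66$$

i.e. about two thirds of Turing's gaming performance per paper TFLOP, much like the GCN comparison suggests.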
 
The power and heat for the 3080/3090 is really bad, but what killed these cards for me is the VRAM size, just pathetic. The 3070 should have 12 GB and the 3080 16 GB at this point. Everyone can try to ignore the reality as much as they want, but these cards will severely lack memory, both now and even more in the next few years. Consoles are getting 16 GB of GDDR6 and should cost 500 USD for the entire thing, and people are going crazy for a 3070 with just 8 GB for the same price.

People got so used to the 2000 series' ridiculous pricing that they are now blind and just see the price drops. The 2000 series was nVidia's true colours when AMD couldn't compete; the current pricing is a direct response to RDNA2, the arch that will power the next 5 years of consoles and the arch that will receive the most optimizations from developers, again, because everything is made to/for consoles, where the bulk of gamers are, and then ported to PC. nVidia is scared of becoming another Intel, and I'm loving all of this, since the new Radeons will definitely be more power efficient, age better, apparently have more VRAM, and now are limited by nVidia prices!
 
how do you all like my new sig? never thought i'd see the day... LMAO
It's dumb. Plenty of us with 2080 Tis will just keep them and put them in our other rigs. Plenty of people who bought 2080 Tis can afford another video card that's just as expensive. Guess who will be buying 3090s? A lot of the same people who bought 2080 Tis. Personally, I'll probably exchange my 2080 Ti for a 3080, since I don't think I need a 3090.
 