Friday, April 22nd 2016

NVIDIA GP104 "Pascal" ASIC Pictured

Here are two of the first pictures of NVIDIA's upcoming "GP104" graphics processor. This chip will drive at least three new GeForce SKUs bound for a June 2016 launch, and succeeds the GM204 silicon that drives the current-generation GTX 980 and GTX 970. Based on the "Pascal" architecture, the GPU will be built on TSMC's latest 16 nm FinFET+ node. The chip appears to feature a 256-bit wide GDDR5 memory interface, and is rumored to run its memory at 8 Gbps, yielding a memory bandwidth of 256 GB/s.
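The quoted bandwidth follows directly from bus width and per-pin data rate; here is a quick sanity check of the rumored figures (a minimal sketch in Python, assuming the 256-bit / 8 Gbps rumor holds):

```python
# Peak memory bandwidth (GB/s) = bus width (bits) / 8 bits-per-byte * per-pin data rate (Gbps)
bus_width_bits = 256  # rumored GP104 memory interface
data_rate_gbps = 8    # rumored GDDR5 per-pin data rate
print(bus_width_bits / 8 * data_rate_gbps, "GB/s")  # prints: 256.0 GB/s
```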
Sources: ChipHell, AnandTech Forums

56 Comments on NVIDIA GP104 "Pascal" ASIC Pictured

#26
FordGT90Concept
"I go fast!1!11!1!"
AMD and NVIDIA are both releasing mid-to-upper level cards on 14/16 nm. They're saving the top tier for HBM2 next year. :cry: Prices aren't going to change much. You'll get better performance per watt, but that's pretty much it.
Posted on Reply
#27
vega22
same performance plateau since 2013....
Posted on Reply
#28
FordGT90Concept
"I go fast!1!11!1!"
But in this case, the plateau is intentional and internal (AMD/NVIDIA), not unintentional and external (TSMC). No one cares about HBM that much; they should be offering a GDDR5X monster.
Posted on Reply
#29
efikkan
Personally I don't care whether it's HBM2 or not. I just want more GPU performance, and I trust Nvidia will be able to find a way to supply enough memory bandwidth.

GP104 will be the new mid-range GPU from Nvidia, and will perform roughly on the level of today's high end. So for anyone currently owning a GTX 980 Ti this wouldn't be much of an upgrade.

I'm looking forward to the new high-end model (probably a new Titan first), which should arrive in late Q3 or Q4. There is a GP102 chip in the works, so this could be it. Even if it turns out that it uses GDDR5X, I will be satisfied as long as they fill it with many CUDA cores.
Posted on Reply
#30
Ruru
S.T.A.R.S.
efikkanPersonally I don't care whether it's HBM2 or not. I just want more GPU performance, and I trust Nvidia will be able to find a way to supply enough memory bandwidth.
512-bit GDDR5X would be cool. Feels like 512-bit is enough, so I don't even OC my R9 290's memory at all.
Posted on Reply
#31
efikkan
9700 Pro512-bit GDDR5X would be cool. Feels like 512-bit is enough, so I don't even OC my R9 290's memory at all.
Why would you need 512-bit? A 256-bit bus with GDDR5X will give up to 384 GB/s and a 384-bit bus up to 576 GB/s, which means a 384-bit bus with GDDR5X should really be more than enough for both Pascal and Vega.
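For reference, those figures imply a ~12 Gbps per-pin rate for GDDR5X (the rate is inferred from the numbers in the comment, not confirmed); a minimal sketch of the same arithmetic:

```python
# Peak memory bandwidth (GB/s) = bus width (bits) / 8 * per-pin data rate (Gbps)
def bandwidth_gbs(bus_width_bits, data_rate_gbps):
    return bus_width_bits / 8 * data_rate_gbps

for bus in (256, 384):
    print(f"{bus}-bit @ 12 Gbps GDDR5X: {bandwidth_gbs(bus, 12):.0f} GB/s")
# 256-bit @ 12 Gbps GDDR5X: 384 GB/s
# 384-bit @ 12 Gbps GDDR5X: 576 GB/s
```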
Posted on Reply
#32
BlueFalcon
efikkanWhy would you need 512-bit? A 256-bit bus with GDDR5X will give up to 384 GB/s and a 384-bit bus up to 576 GB/s, which means a 384-bit bus with GDDR5X should really be more than enough for both Pascal and Vega.
HBM provides 3 other major benefits. The first is reduced GPU die size. The memory controller of Fury X is smaller than 512-bit Hawaii's and Hawaii's was in turn smaller than Tahiti's. This can either help reduce manufacturing costs, or allow AMD/NV to use up the extra transistor/die space on making a more powerful flagship card. The second is reduced power usage. Even if HBM saves 15-30W of power over faster GDDR5X, that's 15-30W of power that can be used to improve perf/watt marketing standing or increase GPU clock speeds to improve performance. The third major benefit relates to PCB length. The compact nature of HBM will allow even smaller flagship cards or alternatively allow AMD/NV greater headroom to release flagship Vega Radeon Duo and Titan Z successors if they wanted to.

There is too much hate regarding HBM because people simply compare Fury X to 980Ti and discredit all the major benefits of HBM. Fury X's front-end bottlenecks, lack of sufficient ROPs, weak geometry engines and lower clock speeds exacerbated by lack of decent overclocking headroom, pump whine, etc. all contributed to its lackluster performance and standing against after-market 980Ti cards. If we were to isolate HBM and compare it head-to-head against GDDR5X, it's a far superior alternative, if costs permit its use.

Considering how NV was able to perform very well this generation with 256-bit/384-bit bus cards and not even rely on HBM, but now we know that GP100 uses HBM2, I am going to agree with AMD and NV engineers that if costs are not a factor, then HBM2 big Pascal and Vega are far superior options. Since NV already has GP100 with HBM2 and AMD has Vega with HBM2 on the road-map, it's pointless to debate a hypothetical 384-bit big Pascal or Vega.
Posted on Reply
#33
rruff
the54thvoidFor now, AMD is still clawing its way back up and Intel and Nvidia are pissing about on other work.
Even if Polaris and Zen are good, AMD lacks the funds to keep refining and supporting their products like Nvidia and Intel. They've dug themselves a big hole and it's going to take a lot to get out of it.
Posted on Reply
#34
PP Mguire
BlueFalconHBM provides 3 other major benefits. The first is reduced GPU die size. The memory controller of Fury X is smaller than 512-bit Hawaii's and Hawaii's was in turn smaller than Tahiti's. This can either help reduce manufacturing costs, or allow AMD/NV to use up the extra transistor/die space on making a more powerful flagship card. The second is reduced power usage. Even if HBM saves 15-30W of power over faster GDDR5X, that's 15-30W of power that can be used to improve perf/watt marketing standing or increase GPU clock speeds to improve performance. The third major benefit relates to PCB length. The compact nature of HBM will allow even smaller flagship cards or alternatively allow AMD/NV greater headroom to release flagship Vega Radeon Duo and Titan Z successors if they wanted to.

There is too much hate regarding HBM because people simply compare Fury X to 980Ti and discredit all the major benefits of HBM. Fury X's front-end bottlenecks, lack of sufficient ROPs, weak geometry engines and lower clock speeds exacerbated by lack of decent overclocking headroom, pump whine, etc. all contributed to its lackluster performance and standing against after-market 980Ti cards. If we were to isolate HBM and compare it head-to-head against GDDR5X, it's a far superior alternative, if costs permit its use.

Considering how NV was able to perform very well this generation with 256-bit/384-bit bus cards and not even rely on HBM, but now we know that GP100 uses HBM2, I am going to agree with AMD and NV engineers that if costs are not a factor, then HBM2 big Pascal and Vega are far superior options. Since NV already has GP100 with HBM2 and AMD has Vega with HBM2 on the road-map, it's pointless to debate a hypothetical 384-bit big Pascal or Vega.
Nobody is debating what the big chips will have; some of us are saying it's not necessary to have more memory bandwidth on the mid-tier cards right now. There is too much hate on P104 equipping GDDR5 instead of GDDR5X or HBM, but the fact is it's not necessary to have the higher-bandwidth chips raising the costs of the midrange segment.
Posted on Reply
#35
rtwjunkie
PC Gaming Enthusiast
PP MguireNobody is debating what the big chips will have; some of us are saying it's not necessary to have more memory bandwidth on the mid-tier cards right now. There is too much hate on P104 equipping GDDR5 instead of GDDR5X or HBM, but the fact is it's not necessary to have the higher-bandwidth chips raising the costs of the midrange segment.
:respect: Preach it, Brother! :clap:
Posted on Reply
#36
efikkan
BlueFalconHBM provides 3 other major benefits...
Most of that is true, provided that you have a GPU which needs the bandwidth of a 512-bit memory bus or more.
BlueFalconThere is too much hate regarding HBM because people simply compare Fury X to 980Ti and discredit all the major benefits of HBM. Fury X's front-end bottlenecks, lack of sufficient ROPs, weak geometry engines and lower clock speeds exacerbated by lack of decent overclocking headroom, pump whine, etc. all contributed to its lackluster performance and standing against after-market 980Ti cards. If we were to isolate HBM and compare it head-to-head against GDDR5X, it's a far superior alternative, if costs permit its use.
No one is complaining about the benefits of HBM(1/2), but the point is that Fiji doesn't need HBM at all. AMD wasted a lot of resources on something they wouldn't need for a couple of generations. GTX 980 Ti (5632 GFLOP/s, 336 GB/s) is able to outperform Fury X (8602 GFLOP/s (+53%), 512 GB/s (+52%)), but in theory Fury X "should" have been 50% faster. There is no way it needs all that bandwidth when GTX 980 Ti can do without it.
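For what it's worth, the quoted percentages check out; a quick arithmetic sketch:

```python
# Fury X vs. GTX 980 Ti: theoretical compute and memory bandwidth deltas
ti_gflops, ti_bw_gbs = 5632, 336      # GTX 980 Ti: GFLOP/s, GB/s
fury_gflops, fury_bw_gbs = 8602, 512  # Fury X: GFLOP/s, GB/s
print(f"compute:   +{(fury_gflops / ti_gflops - 1) * 100:.0f}%")  # +53%
print(f"bandwidth: +{(fury_bw_gbs / ti_bw_gbs - 1) * 100:.0f}%")  # +52%
```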

And now that we have GDDR5X, which is currently much cheaper than HBM and gives 576 GB/s on a 384-bit bus, it will still be a while before gaming GPUs really need HBM.

BTW, Fiji is not struggling with ROP performance; that kind of problem would increase with resolution or AA. It does however have enormous inefficiencies in its scheduling, in the 30-50% range compared to Maxwell.
BlueFalconConsidering how NV was able to perform very well this generation with 256-bit/384-bit bus cards and not even rely on HBM, but now we know that GP100 uses HBM2, I am going to agree with AMD and NV engineers that if costs are not a factor, then HBM2 big Pascal and Vega are far superior options. Since NV already has GP100 with HBM2 and AMD has Vega with HBM2 on the road-map, it's pointless to debate a hypothetical 384-bit big Pascal or Vega.
HBM will replace GDDR5(X) over time. GP100 uses HBM2 because it needs the bandwidth for compute. HBM2 will still be limited in supply throughout 2016. We'll see in a few months what GP102 has in store for us; it wouldn't surprise me if it uses GDDR5X, which would be fast enough until HBM becomes cheaper.
Posted on Reply
#37
Nihilus
rtwjunkieSo.....the 1080 (or whatever it shall be) with 256-bit bus has 50% less bandwidth than the 256-bit bus of the 980? :confused: Am I understanding your complaint right?

It fills the same slot the 980 does now (upper mid-level), so I'm not sure why you would compare it to 980Ti. Is it very likely to equal or come very close to the 980Ti in performance? Yes, which is a win all around for consumers, as it will be cheaper than the current 980Ti flagship.
Usually the next generation of cards is closer to the tier above it from the previous generation, i.e. the AMD 480 is looking to match the 390. Look at the GTX 970 - it was closer to a 780 Ti! We will see if the price of the 1080 is as cheap as you can find a 980, or even a lightly used 980 Ti.
Posted on Reply
#38
Nihilus
Masoud1980Thanks for the answer
Google Translate Translate does not forgive good
I feel certain familiar forgive or Iranians or Persians you're right?
No problem. Struggled with your last sentence, but we know you are trying. Have fun on the Forums!
Posted on Reply
#39
bug
rtwjunkieSo.....the 1080 (or whatever it shall be) with 256-bit bus has 50% less bandwidth than the 256-bit bus of the 980? :confused: Am I understanding your complaint right?

It fills the same slot the 980 does now (upper mid-level), so I'm not sure why you would compare it to 980Ti. Is it very likely to equal or come very close to the 980Ti in performance? Yes, which is a win all around for consumers, as it will be cheaper than the current 980Ti flagship.
How about we ignore the specs and wait for the reviews, which should be available within a month or so?
My hunch is that, given both AMD and Nvidia now have access to 14/16 nm processes (a great leap from 28 nm), their new architectures will play a much greater role than a simple comparison of shader counts and bandwidth would suggest.
Posted on Reply
#40
Ferrum Master
All those saying that 256-bit is enough are quite mistaken...

Titan X actually cannot access all of its 12 GB in one cycle, which creates hurdles in particular usage scenarios like rendering and heavy data processing with a lot of calculated data. For high resolutions like 4-5K it will be crucial to have a really wide bus; the more power a card has, the more space and the less latency it needs.

HBM was actually developed for server and compute needs; the gaming market comes second.
Posted on Reply
#41
efikkan
Ferrum MasterAll those saying that 256-bit is enough are quite mistaken...

Titan X actually cannot access all of its 12 GB in one cycle
If you knew how rendering works, you'd know it will never need to access all of it in a single render. The largest part of the allocated memory is object and landscape data: meshes, textures, normal maps, displacement maps, UV maps and so on. All of this is pretty much static, and most of it is stored in multiple detail levels (typically 4-6 levels). This means that even if you are for some strange reason rendering every object in the game at the highest resolution, you will never need more than 50% of these resources. All modern games apply LoD and culling algorithms, so they usually use less than 15% of these resources in a single render. There is no game using over 25% of all allocated memory in a single render. Even with resource streaming no game will work the way you describe; it would result either in resource "popping" (ref. Rage) or 0.4 FPS performance.
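To illustrate the LoD mechanism described above, here is a hypothetical distance-based selection sketch (not any engine's actual code; the thresholds and level count are made up for illustration):

```python
# Hypothetical distance-based LoD pick: each level is coarser, and each LoD band
# covers twice the distance of the previous one. Real engines typically use
# screen-space error metrics rather than a fixed rule like this.
def select_lod(distance, base_range=10.0, num_levels=5):
    """Return 0 (full detail) near the camera, up to num_levels - 1 (coarsest)."""
    level, threshold = 0, base_range
    while distance > threshold and level < num_levels - 1:
        level += 1
        threshold *= 2.0
    return level

for d in (5, 15, 40, 100, 300):
    print(f"distance {d:>3}: LoD level {select_lod(d)}")
```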
Ferrum Masterwhich creates hurdles in particular usage scenarios like rendering and heavy data processing with a lot of calculated data.
The usage of "calculated data" (like perlin noise mixed with low resolution data) has nothing at all to do with the bandwidth between the GPU and it's memory. In fact there are two ways to solve it and render giant unique landscapes: having a giant GPU memory (like 200 GB in size) or use resource streaming. The speed between the GPU and it's memory is in no way the bottleneck here.
Ferrum MasterFor high resolutions like 4-5K it will be crucial to have a really wide bus; the more power a card has, the more space and the less latency it needs.
Why? Do you know how much memory the temporary frame for a 4K render actually needs? 64 MB without AA, 256 MB with 4x MSAA (before compression). You obviously don't need hundreds of GB/s to store this data.
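Those sizes work out if you assume roughly 8 bytes per pixel per sample, e.g. a 32-bit color target plus a 32-bit depth/stencil buffer (an assumption for illustration; actual formats vary):

```python
# 4K render-target footprint, assuming 4 bytes color + 4 bytes depth/stencil per sample
width, height = 3840, 2160
bytes_per_sample = 4 + 4
no_aa = width * height * bytes_per_sample  # 1 sample per pixel
msaa_4x = no_aa * 4                        # 4 samples per pixel
print(f"no AA:   {no_aa / 2**20:.0f} MB")    # ~63 MB
print(f"4x MSAA: {msaa_4x / 2**20:.0f} MB")  # ~253 MB
```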
Posted on Reply
#42
Ferrum Master
efikkanIf you knew how rendering works
Obviously you didn't understand the issue. Budget time and latency are the problem. Try enabling ray tracing in a CGI scene and let the numbers roll.
Posted on Reply
#43
efikkan
Ferrum MasterObviously you didn't understand the issue. Budget time and latency are the problem. Try enabling ray tracing in a CGI scene and let the numbers roll.
Budget time and latency have nothing to do with it. Everyone knows computation is the bottleneck of ray tracing.
Posted on Reply
#44
the54thvoid
Super Intoxicated Moderator
I'm backing @efikkan for a description I don't understand in my layman's terms. They seem to know exactly what they are talking about.
Posted on Reply
#45
deu
rruffEven if Polaris and Zen are good, AMD lacks the funds to keep refining and supporting their products like Nvidia and Intel. They've dug themselves a big hole and it's going to take a lot to get out of it.
Someone didn't check the internet since Friday!

wccftech.com/amd-stock-52-highest-percentage-gain-listing/
Posted on Reply
#46
PP Mguire
Ferrum MasterAll those saying that 256-bit is enough are quite mistaken...

Titan X actually cannot access all of its 12 GB in one cycle, which creates hurdles in particular usage scenarios like rendering and heavy data processing with a lot of calculated data. For high resolutions like 4-5K it will be crucial to have a really wide bus; the more power a card has, the more space and the less latency it needs.

HBM was actually developed for server and compute needs; the gaming market comes second.
That's taking a stance like the debate is on compute; these are gaming cards (especially P104, though we might see Quadro variants with HBM). We don't need memory bandwidth like that for gaming. As a Titan X owner on 4K, increasing VRAM clock speed doesn't increase FPS at all, negating the need for a wider bus or higher memory bandwidth. We lack the raw processing power. With P100, HBM2 will just be icing on the proverbial cake, but for P104 it's an unnecessary increase in cost that reaps no benefits.
Posted on Reply
#47
Ferrum Master
PP MguireThat's taking a stance like the debate is on compute; these are gaming cards (especially P104, though we might see Quadro variants with HBM). We don't need memory bandwidth like that for gaming. As a Titan X owner on 4K, increasing VRAM clock speed doesn't increase FPS at all, negating the need for a wider bus or higher memory bandwidth. We lack the raw processing power. With P100, HBM2 will just be icing on the proverbial cake, but for P104 it's an unnecessary increase in cost that reaps no benefits.
That's what I am saying, albeit we are on the edge... 384-bit is the bare minimum. I do have FPS gains upping my VRAM. Actually, a simple rendering benchmark like GPU-Z's render test also reacts really well to the memory speed increase. And that is actually bad... Okay, our beloved Valley... my card does react to vRAM OC... see for yourself.

Posted on Reply
#48
the54thvoid
Super Intoxicated Moderator
deuSomeone didn't check the internet since Friday!

wccftech.com/amd-stock-52-highest-percentage-gain-listing/
Jesus - reading all the comments that followed was like stepping in afterbirth. Some quite rabid fanboys on that site. Even our most avid AMD chaps here aren't that bad.

As for the stock jump - no biggie - AMD announces x, y and z and investors buy, because people buy cheap stocks to gamble that they will make money. It's speculative investment - most will sell just before the launch of Polaris and Zen. It also helps to announce a prospective mega-bucks deal with China - makes it all look better.

Not saying these things are not good or happening, but as a bit of an anti-capitalist (specifically regarding the proliferation of huge private wealth, gained at the expense of others, with zero social distribution) I fucking hate the stock markets - they're bogus.
Posted on Reply
#49
PP Mguire
Ferrum MasterThat's what I am saying, albeit we are on the edge... 384-bit is the bare minimum. I do have FPS gains upping my VRAM. Actually, a simple rendering benchmark like GPU-Z's render test also reacts really well to the memory speed increase. And that is actually bad... Okay, our beloved Valley... my card does react to vRAM OC... see for yourself.

That's a synthetic benchmark, which we all know doesn't really equate to real-world performance. In heavy-VRAM games, upping the VRAM clock on my old 980s didn't help much at 1440p, and neither does upping it on my Titans at 4K. Even in the synthetic bench you're showing a measly 4 FPS gain in the average, and in real games at 4K I see less than that with a heavy VRAM overclock. Yet upping my core boost from 1300 to 1500 shows a good 10+ FPS in games. Like I've said I think 3 times now, memory bandwidth isn't the issue. We need raw chip power.
Posted on Reply
#50
rruff
deuSomeone didn't check the internet since Friday!
Yes, I'm well aware that AMD's stock went up. Check back in 5 years.
Posted on Reply