Monday, July 25th 2016

NVIDIA Accelerates Volta to May 2017?

Following the surprise TITAN X Pascal launch slated for 2nd August, it looks like NVIDIA's product development cycle is running on steroids, with reports emerging of the company accelerating its next-generation "Volta" architecture debut to May 2017, on the sidelines of next year's GTC. The architecture was originally scheduled to make its debut in 2018.

Much like "Pascal," the "Volta" architecture could first debut with HPC products, before moving on to the consumer graphics segment. NVIDIA could also retain the 16 nm FinFET+ process at TSMC for Volta. Stacked on-package memory such as HBM2 could be more readily available by 2017, and could hit sizable volumes towards the end of the year, making it ripe for implementation in high-volume consumer products.
Source: WCCFTech

102 Comments on NVIDIA Accelerates Volta to May 2017?

#51
bug
the54thvoidOh dear, I made a terrible mistake, I see that.

The GTX 1080 and 1070 only have 7,200 million transistors compared to Fiji's 8,900 million. That's a nut-busting 24% more on Fiji's side. Fury X has 4096 shaders to 1920 on the GTX 1070 (a 113% increase in hardware). Same ROPs. Higher bandwidth.

So, you're happy that a card with the hardware prowess of the Fury X can only match the paltry hardware inside a GTX 1070? That's not impressive. Do you not see? That's worrying on AMD's side. How can the GTX 1070 even stand beside a Fury X in DX12?
I bet he owned a Prescott at some point.
Posted on Reply
#52
PP Mguire
iOSo they haven't even released their ultra-expensive, huge-die P100 to lower-priority customers, and someone comes up with the idea that it somehow would be economically viable to release the next gen within less than a year?!



Also gotta love how those threads always derail....
Sure, due to the fact that they're contracted to ship these chips by the end of next year. Doesn't necessarily mean we'll have Volta in our PCs by then.
Posted on Reply
#53
GhostRyder
Doubtful this will change many decisions, since we have no idea what cards will even be released by that point... Sure as heck hasn't changed my mind about purchases this year.

Either way, we will have to see what consumer versions come from this. I doubt there will be much soon...
Posted on Reply
#54
HD64G
the54thvoidOh dear, I made a terrible mistake, I see that.

The GTX 1080 and 1070 only have 7,200 million transistors compared to Fiji's 8,900 million. That's a nut-busting 24% more on Fiji's side. Fury X has 4096 shaders to 1920 on the GTX 1070 (a 113% increase in hardware). Same ROPs. Higher bandwidth.

So, you're happy that a card with the hardware prowess of the Fury X can only match the paltry hardware inside a GTX 1070? That's not impressive. Do you not see? That's worrying on AMD's side. How can the GTX 1070 even stand beside a Fury X in DX12?

If I were an AMD fanboy with a brain, I'd be worried that for all my posturing about DX12 and async, the mid-range Pascal chip (GTX 1080) with the poorer async optimisations humps my beloved Fury X. I'd be worried that AMD's next step has to be a clock increase, but to do that the hardware count has to suffer (relatively speaking). I'd be worried that Nvidia's top-end Pascal consumer chip (GP102, Titan X) has 1000 more shaders than the GTX 1080 (which already beats everything at everything).

Your optimism is very misplaced. But that's okay, we need optimism in today's world.
You like to be objective, eh? Why don't you put the difference in clocks into the comparison then? The 1070 gets to 1900MHz when gaming while the Fury X reaches 1050MHz. Compare again now...

As for Vulkan, it is built on the same principles as DX12, it is just not only for Windows 10, so it is more independent of the market and thus more objective.

And all people have brains (fanboys or not). Intelligence is another topic though...
Posted on Reply
#55
Assimilator
Bleh, willy-waving over die size vs number of transistors vs shader counts vs ROP counts vs core clocks vs memory clocks is just that, because both companies' architectures are so different. It's pretty obvious that, just as in its CPUs, AMD has chosen to scale out its graphics architecture (more shaders) while NVIDIA has chosen to scale up (more clock speed). Both approaches have their upsides and downsides, but I feel that NVIDIA's approach has given it the edge, because extracting maximal performance from parallelism is a really, really difficult problem, one that DirectX 12 isn't likely to solve in and of itself.
Posted on Reply
#56
the54thvoid
Super Intoxicated Moderator
HD64GYou like to be objective, eh? Why don't you put the difference in clocks into the comparison then? The 1070 gets to 1900MHz when gaming while the Fury X reaches 1050MHz. Compare again now...

As for Vulkan, it is built on the same principles as DX12, it is just not only for Windows 10, so it is more independent of the market and thus more objective.

And all people have brains (fanboys or not). Intelligence is another topic though...
The hardware in AMD chips limits clock speeds. The current i7 enthusiast chips demonstrate that very well. Moar cores = lower frequency. It's a fact that Nvidia dropped a lot of chip hardware (shaders, etc.) in favour of a leaner, more efficient and way faster chip. This is why these "just you wait, DX12 and AMD are going to win" discussions are futile. AMD will not stomp all over Nvidia - they will, at best, achieve parity - which is very good. The reason the GTX 1080 (GTX 9-freaking-80 replacement) is so expensive is because AMD have nothing to match it.

This is the 7970 versus 680 all over again, except this time Nvidia have hit the mother lode with the pricing of their chips. Don't get me wrong - I hate the pricing of the GTX 1080. I am not a fan of this wallet-shafting policy, but Nvidia know AMD has nothing to match it. Not even on DX12, not for their GP104 and certainly not for GP102.

And again, Vulkan is great for AMD - the hardware inside their architecture gets to shine, but it bloody well should. AMD's 'on paper' stats should have them all over Nvidia, but their lack of DX11 prowess in favour of unicorn-chasing DX12 has let them down for 2-3 years. And now that DX12 is sort of coming along (because let's face it - it's not really anywhere near replacing DX11), it means Nvidia can build its own unicorn stable and call it Volta (I'd prefer Roach).

Vega cannot have masses of shaders and ACE hardware AND be as fast as Pascal. Sacrifices will be made. Hell, look at GP Titan X - it's already about 200MHz slower than the GTX 1080 as it has 1000 more cores.

Unfortunately, the other glaring issue with AMD is that they absolutely need to get developers to adopt the DX12 features that suit their hardware. Nvidia can use DX12 just fine, but if Nvidia help develop a game, they're not going to 'allow' full utilisation of AMD hardware - like it or not. Is it shit? Yes. Is it business? Yes. The next big 'real' game that isn't a tech demo or small release is Deus Ex. I love those freaking games. It's AMD sponsored. It will be very good to see how that runs. Bearing in mind I'm still DX11-bound, it's meaningless to me anyway, but if Nvidia's Pascal runs that game fine, that will give you a good idea of the future of DX12.

The reason I get so ranty is I'm pissed off AMD haven't come to the table with a faster chip than Fiji. We now have the GTX 1070/1080/Titan X from Nvidia and very little back from AMD. A sneeze with Polaris - great for 1080p/1440p but not forward-looking for 1440p. I would buy a Fury X, but what's the point? My card is way faster than a stock 980 Ti, so my card is also way faster than a Fury X, especially in my native DX11. Even in DX12 my card's clocks make it a GTX 1070 match.

Meh, rant over.
Posted on Reply
#57
ViperXTR
Fury X vs 1070

Fury X
Shading Units: 4096
TMUs: 256
ROPs: 64
Compute Units: 64
Pixel Rate: 67.2 GPixel/s
Texture Rate: 268.8 GTexel/s
Floating-point performance: 8,602 GFLOPS
Memory Size: 4096 MB
Memory Type: HBM
Memory Bus: 4096 bit
Bandwidth: 512 GB/s

GTX 1070
Shading Units: 1920
TMUs: 120
ROPs: 64
SM Count: 15
Pixel Rate: 96.4 GPixel/s
Texture Rate: 180.7 GTexel/s
Floating-point performance: 5,783 GFLOPS (up to ~7,000+ GFLOPS at 1900MHz)
Memory Size: 8192 MB
Memory Type: GDDR5
Memory Bus: 256 bit
Bandwidth: 256.3 GB/s

The GTX 1070 is basically obliterated here by the Fury X (aside from pixel fill rate and memory amount).

Also, indeed, we're off topic already.
Posted on Reply
#58
xkm1948
Looks like my theory was right. NV is preparing a monster architecture which will render even the latest GCN obsolete in terms of async compute. The history of tessellation may repeat itself soon.
Posted on Reply
#59
xkm1948
By the time NV completely shifts to full-on async, both Pascal and GCN will be done. GCN-based cards may be relevant for a little bit longer, though. In the end, whoever commands developers controls the market, and Nvidia has never failed to tighten its grasp over developers.
Posted on Reply
#60
Slizzo
RejZoRThat's like saying the GTX 980 has async compute then. It actually does - with queues so small they are of no practical use. What good is saying "yes, we have async" when it then does basically nothing. Like, lol?
Maxwell has a weak implementation of async compute, and the fact that nVidia hasn't enabled it on the 9-series GPUs in Time Spy suggests its implementation is likely flawed.

Async compute IS enabled for Pascal, and it DOES show performance improvement. I think the grasping at straws over Maxwell having async compute should stop. It doesn't have it, in any way that nVidia cares to support it...


Pascal, for me, shows improvement with Vulkan enabled. I have a good 10+FPS boost when I have it enabled, so I will keep it enabled. Even if async compute isn't yet implemented on Pascal in Vulkan on Doom.
Posted on Reply
#61
the54thvoid
Super Intoxicated Moderator
xkm1948By the time NV completely shifts to full-on async, both Pascal and GCN will be done. GCN-based cards may be relevant for a little bit longer, though. In the end, whoever commands developers controls the market, and Nvidia has never failed to tighten its grasp over developers.
All very true. Though I imagine AMD's successor to GCN will maintain the ACE hardware (it's very good).
And yes, all arguments about DX 'anything' can take a back seat to one sided game development.
Posted on Reply
#62
FordGT90Concept
"I go fast!1!11!1!"
SlizzoPascal, for me, shows improvement with Vulkan enabled. I have a good 10+FPS boost when I have it enabled, so I will keep it enabled. Even if async compute isn't yet implemented on Pascal in Vulkan on Doom.
It is used for TSAA in DOOM. 8x TSAA, on a card like Fury X with lots of idle shaders, is practically zero-cost in terms of framerate.

Pascal improves async compute functionality compared to Maxwell (which facepalms when trying), but NVIDIA's implementation is still behind AMD's. Then again, async compute is only beneficial when the card has idle hardware.
Posted on Reply
#63
xkm1948
FordGT90ConceptIt is used for TSAA in DOOM. 8x TSAA, on a card like Fury X with lots of idle shaders, is practically zero-cost in terms of framerate.

Pascal improves async compute functionality compared to Maxwell (which facepalms when trying), but NVIDIA's implementation is still behind AMD's. Then again, async compute is only beneficial when the card has idle hardware.
I bet Volta will come with >8192 ALUs, which will make it perfect for async compute. Unfortunately, the older cards with fewer ALUs will then all be rendered useless in Volta-optimized games and applications. Which is why I say the tessellation history will soon repeat itself.
Posted on Reply
#64
HD64G
the54thvoidThe hardware in AMD chips limits clock speeds. The current i7 enthusiast chips demonstrate that very well. Moar cores = lower frequency. It's a fact that Nvidia dropped a lot of chip hardware (shaders, etc.) in favour of a leaner, more efficient and way faster chip. This is why these "just you wait, DX12 and AMD are going to win" discussions are futile. AMD will not stomp all over Nvidia - they will, at best, achieve parity - which is very good. The reason the GTX 1080 (GTX 9-freaking-80 replacement) is so expensive is because AMD have nothing to match it.

This is the 7970 versus 680 all over again, except this time Nvidia have hit the mother lode with the pricing of their chips. Don't get me wrong - I hate the pricing of the GTX 1080. I am not a fan of this wallet-shafting policy, but Nvidia know AMD has nothing to match it. Not even on DX12, not for their GP104 and certainly not for GP102.

And again, Vulkan is great for AMD - the hardware inside their architecture gets to shine, but it bloody well should. AMD's 'on paper' stats should have them all over Nvidia, but their lack of DX11 prowess in favour of unicorn-chasing DX12 has let them down for 2-3 years. And now that DX12 is sort of coming along (because let's face it - it's not really anywhere near replacing DX11), it means Nvidia can build its own unicorn stable and call it Volta (I'd prefer Roach).

Vega cannot have masses of shaders and ACE hardware AND be as fast as Pascal. Sacrifices will be made. Hell, look at GP Titan X - it's already about 200MHz slower than the GTX 1080 as it has 1000 more cores.

Unfortunately, the other glaring issue with AMD is that they absolutely need to get developers to adopt the DX12 features that suit their hardware. Nvidia can use DX12 just fine, but if Nvidia help develop a game, they're not going to 'allow' full utilisation of AMD hardware - like it or not. Is it shit? Yes. Is it business? Yes. The next big 'real' game that isn't a tech demo or small release is Deus Ex. I love those freaking games. It's AMD sponsored. It will be very good to see how that runs. Bearing in mind I'm still DX11-bound, it's meaningless to me anyway, but if Nvidia's Pascal runs that game fine, that will give you a good idea of the future of DX12.

The reason I get so ranty is I'm pissed off AMD haven't come to the table with a faster chip than Fiji. We now have the GTX 1070/1080/Titan X from Nvidia and very little back from AMD. A sneeze with Polaris - great for 1080p/1440p but not forward-looking for 1440p. I would buy a Fury X, but what's the point? My card is way faster than a stock 980 Ti, so my card is also way faster than a Fury X, especially in my native DX11. Even in DX12 my card's clocks make it a GTX 1070 match.

Meh, rant over.
Agreed with most of what you wrote here.

AMD's strategy needed to focus on market share, and the only way to achieve gains there was to get low-to-mid-priced GPUs out first. The 460, 470 and 480 will do just that. Is it shit? Yes, for guys like you who wait for the new flagships from both companies so a price war helps you get the best value for money. No, for the 80% who are searching for their next GPU to play at 1080p. And since we don't know Vega's core size (HBM2 omitted from that size), we cannot calculate whether it reaches or surpasses 1080 performance. So it might do the trick and help all those in search of better and cheaper high-end GPUs, as they have already achieved this for those who want a worthy GPU for less than $300. Volta is too far off to affect anything in the 2016-2017 market after all. Let's hope DX12 and Vulkan get adopted sooner rather than later for the benefit of all, as better games (more advanced in physics and graphical fidelity) will come out of that change. :toast:
Posted on Reply
#65
xkm1948
The essence of DX12/Vulkan is to remove as many hardware constraints from software developers as possible. This will be perfect for the console and mobile segments, since they don't get a new GPU every 6 months. DX12/Vulkan will make GPU lifespans a lot longer. A capable GPU with tons of compute units can last a lot longer than the flagship GPUs of the current gen. In the end it will be good for us consumers. While buying new cards is fun, nobody likes to see their hard-earned money go obsolete in a mere 6 months to a year.
Posted on Reply
#66
RejZoR
No. Async has nothing to do with "idle" shaders. That's like saying a V12 Ferrari will be faster with 8 idle cylinders, lol, makes no sense. It has everything to do with the ability to do several tasks in parallel at once, and with how good the scheduler, the caches and the multithreading engine are. Clearly, the one in the GTX 1080 is nowhere near as good as the one in Radeon graphics cards. Which is not a surprise, since AMD has been doing async since the HD 7000 series and NVIDIA has only just got a half-working async engine...

Synchronous rendering is when you have one graphics rendering thread and all graphics and compute tasks are executed in sequence. It's what we have with D3D11 and older. There, it was all about utilizing the available shaders as effectively as possible; that's why they were always chasing that sweet spot of not wasting GPU die size on hardware you can't possibly use efficiently with single-threaded rendering. Any hardware capability that wasn't used was essentially wasted.

Asynchronous is when you can split up your rendering workload into several rendering threads and compute them in parallel. You can split it up whichever way you like, as long as it benefits either the code or rendering performance. Of course, more shaders mean you can do more things in parallel before you stuff them all to 100% and they can't accept any more work.
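To make that split concrete, here is a minimal sketch using CUDA streams as an analogy for the graphics and compute queues in DX12/Vulkan. The kernel names, buffer sizes and the choice of CUDA rather than the actual multi-queue graphics APIs are illustrative assumptions, not anything from this thread.

```cpp
// Minimal sketch of "one queue vs. two queues", using CUDA streams as an
// analogy for the graphics + compute queues in DX12/Vulkan. The kernel names,
// buffer sizes and workloads are made up for illustration only.
#include <cstdio>
#include <cuda_runtime.h>

// Stand-in for a "graphics" workload.
__global__ void graphicsLikeKernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 1.5f + 1.0f;
}

// Stand-in for an independent "compute" workload (e.g. post-processing).
__global__ void computeLikeKernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = sqrtf(data[i] + 2.0f);
}

int main() {
    const int n = 1 << 22;
    float *a = nullptr, *b = nullptr;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));

    cudaStream_t gfxQueue, computeQueue;
    cudaStreamCreate(&gfxQueue);
    cudaStreamCreate(&computeQueue);

    dim3 block(256), grid((n + block.x - 1) / block.x);

    // "Synchronous" style: everything on one queue, executed back to back.
    graphicsLikeKernel<<<grid, block, 0, gfxQueue>>>(a, n);
    computeLikeKernel<<<grid, block, 0, gfxQueue>>>(b, n);
    cudaDeviceSynchronize();

    // "Asynchronous" style: independent workloads on separate queues, so the
    // hardware scheduler is free to overlap them wherever units are available.
    graphicsLikeKernel<<<grid, block, 0, gfxQueue>>>(a, n);
    computeLikeKernel<<<grid, block, 0, computeQueue>>>(b, n);
    cudaDeviceSynchronize();

    cudaStreamDestroy(gfxQueue);
    cudaStreamDestroy(computeQueue);
    cudaFree(a);
    cudaFree(b);
    printf("done\n");
    return 0;
}
```

The point of the second submission pattern is simply that the two workloads carry no dependency on each other, so the hardware is allowed to run them concurrently instead of serialising them in one queue.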
Posted on Reply
#67
swirl09
I don't get the fuss, the time between gens has typically been just over a year and under a year and a half, with the exception of this most recent gap. So unless the 1180 is on the shelf next May, it's just business as usual.
Posted on Reply
#68
FordGT90Concept
"I go fast!1!11!1!"
RejZoRNo. Async has nothing to do with "idle" shaders. That's like saying a V12 Ferrari will be faster with 8 idle cylinders lol, makes no sense.
The better analogy would be GM's V8 cylinder deactivation. Fury X, in a lot of games, runs like a V4 because the graphics pipeline isn't saturated enough to fill all of the shaders. When enabling TSAA or other async workloads, it puts most of the shaders to work like a full V8.

My understanding is that Pascal doesn't actually do async, but they fixed the scheduling problem so that Pascal can rapidly change tasks instead of waiting for the lengthy pipeline to clear. This change allows it to get a 5% performance boost where AMD sees 10%.

There are two things going on here: async shaders and scheduling. Scheduling involves interrupting the graphics queue to inject a compute task (Pascal does this). Async involves finding idle hardware and utilizing it (Pascal doesn't do this). Both are complex and both are important.


Edit: Don't believe me? Believe Anandtech:
Ryan SmithMeanwhile not shown in these simple graphical examples is that for async’s concurrent execution abilities to be beneficial at all, there needs to be idle time bubbles to begin with. Throwing compute into the mix doesn’t accomplish anything if the graphics queue can sufficiently saturate the entire GPU. As a result, making async concurrency work on Maxwell 2 is a tall order at best, as you first needed execution bubbles to fill, and even then you’d need to almost perfectly determine your partitions ahead of time.
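A rough sketch of that "idle time bubble" point, again using CUDA streams as a stand-in for a graphics queue plus an async compute queue. The kernels, sizes and iteration counts are made-up assumptions for illustration: a deliberately small launch leaves most of the GPU idle, and a second stream can fill that slack almost for free.

```cpp
// Rough sketch of the "idle time bubble" point: a tiny launch (like an
// unsaturated graphics queue) leaves most SMs idle, and work submitted on a
// second stream can fill that slack. CUDA streams stand in for a graphics
// queue + async compute queue; sizes and iteration counts are arbitrary
// assumptions - tune them for your own GPU.
#include <cstdio>
#include <cuda_runtime.h>

// Small grid: only a handful of blocks, so most SMs sit idle while it runs.
__global__ void smallGraphicsKernel(float* data, int n, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float v = data[i];
    for (int k = 0; k < iters; ++k) v = v * 0.999f + 0.001f;
    data[i] = v;
}

// Large grid: enough threads to occupy the SMs the small kernel leaves idle.
__global__ void fillerComputeKernel(float* data, int n, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float v = data[i];
    for (int k = 0; k < iters; ++k) v = sqrtf(v + 1.0f);
    data[i] = v;
}

// Time both kernels either back to back on one stream (serial) or on two
// streams (concurrent). Events recorded on the legacy default stream
// synchronize with the other streams, so they bracket all the work.
static float timeRun(bool overlap, float* a, float* b, int smallN, int bigN,
                     cudaStream_t s1, cudaStream_t s2) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);

    smallGraphicsKernel<<<(smallN + 255) / 256, 256, 0, s1>>>(a, smallN, 2000000);
    fillerComputeKernel<<<(bigN + 255) / 256, 256, 0, overlap ? s2 : s1>>>(b, bigN, 2000);

    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const int smallN = 1 << 12, bigN = 1 << 22;
    float *a = nullptr, *b = nullptr;
    cudaMalloc(&a, smallN * sizeof(float));
    cudaMalloc(&b, bigN * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    printf("serial:     %.2f ms\n", timeRun(false, a, b, smallN, bigN, s1, s2));
    printf("concurrent: %.2f ms\n", timeRun(true,  a, b, smallN, bigN, s1, s2));

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

If the first launch really does leave execution bubbles, the concurrent run should come in close to the longer of the two kernels rather than their sum, which is exactly the condition the Anandtech quote describes; if the first launch already saturates the GPU, the two numbers converge.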
Posted on Reply
#69
ppn
The GTX 1180 has to have at least 3200 CUDA cores, so unless they use a gimped GP102 (which would not be optimal), an HBM2 2x4GB Volta is imminent.
Posted on Reply
#71
danyearight
techy1ok... great news... but why? What are the financial reasons for this... right now they have midrange cards (1070, 1080) at incredibly high prices, sold out as soon as they hit the store... it would be financially reasonable to keep it that way for as long as possible... unless they know :O
Oh, and here I and everybody else thought the 1080 and 1070 outperformed the last generation. I guess since you say they are only mid-range cards and you know best, tell us: what is more powerful, then?
Posted on Reply
#72
rtwjunkie
PC Gaming Enthusiast
danyearightI guess since you say they are only mid-range cards and you know best, tell us: what is more powerful, then?
Nothing right now. They are still mid-range cards built on a mid-range chip, GP104. It has always been thus: new midrange outperforms previous-gen high-end, and the cycle goes on and on.
Posted on Reply
#73
jabbadap
HumanSmokePrecisely. Once Nvidia decided to not wait around for 10nm and commit Volta to 16nmFFC production there probably wasn't any good reason not to go ahead with the HPC orders since it ties in with IBM's POWER9 schedule.
Yeah, and don't forget: Volta has been on NVIDIA's official roadmaps longer than Pascal (since 2013). I kind of believe it will be quite a major architecture change; Pascal was just a slightly updated Maxwell on 16nm with higher clocks.
Posted on Reply
#74
deu
rtwjunkieHey smarta**, I looked it up too, just not this morning. So I did pretty good getting it right after reading a tiny little blurb about it several weeks ago. Next time, look your facts up first before you roll out the PR machine.

And, for the record, I despise PR machine posts from both sides.
I don't get your point; are you mad because so many people don't know those facts, or because I told them? I see no problem in telling people the facts. I could go on about all the good things about NVIDIA as well, but then I would be a fanboi from camp green. Just because you people have a red/green war going doesn't mean you can just put people on teams. I argue for and against, but lately this site has been swarmed with outright shi**y comments bashing AMD. That would be OK if the claims were facts. But A LOT of them are not! I saw a dude state the 480 has double the power draw of the 1060 and shitty performance. I saw people complain about AMD being in the shi**er when they are at their best in 3 years. I will discuss anything and take any argument for or against any company as long as it is substantiated by facts and knowledge.
Posted on Reply
#75
RejZoR
FordGT90ConceptThe better analogy would be GM's V8 cylinder deactivation. Fury X, in a lot of games, runs like a V4 because the graphics pipeline isn't saturated enough to fill all of the shaders. When enabling TSAA or other async workloads, it puts most of the shaders to work like a full V8.

My understanding is that Pascal doesn't actually do async, but they fixed the scheduling problem so that Pascal can rapidly change tasks instead of waiting for the lengthy pipeline to clear. This change allows it to get a 5% performance boost where AMD sees 10%.

There are two things going on here: async shaders and scheduling. Scheduling involves interrupting the graphics queue to inject a compute task (Pascal does this). Async involves finding idle hardware and utilizing it (Pascal doesn't do this). Both are complex and both are important.


Edit: Don't believe me? Believe Anandtech:
Fast switching is not async. Async is simultaneous processing of two or more workloads at the exact same time. Everything else is pretending to be something it is not. The "idle" shaders idea is nonsense. The reason the R9 Fury starts to fly under a heavy workload, especially with TSAA and other stuff, is that instead of forcing TSAA into the existing rendering thread, making it longer and choking performance, it actually runs in parallel with the usual rendering thread. The R9 Fury has the hardware grunt, which has been underutilized all this time. That's the whole point of async. And delivering sufficient compute performance via shaders works exactly the same as today: you simply need to have enough of them to process something, and when you run out of them, performance starts to suffer. It has nothing to do with "idle" shaders. It's just utilization as we know it now. Except it's rather inefficient in D3D11, where shaders do indeed idle; that's why they had to balance things perfectly depending on game trends and what kinds of workloads are expected from game engines. With async, you can basically throw more of everything into a chip and it will just perform that much better, for as long as the code is written to utilize it.
Posted on Reply