Monday, March 28th 2016
AMD "Greenland" Vega10 Silicon Features 4096 Stream Processors?
The LinkedIn profile of an R&D manager at AMD discloses key details of the company's upcoming "Greenland" graphics processor, which is also codenamed Vega10. Slated for an early-2017 launch according to AMD's GPU architecture roadmap, "Greenland" will be built on AMD's "Vega" GPU architecture, the successor to the "Polaris" architecture that is slated for later this year.
The LinkedIn profile of Yu Zheng, an R&D manager at AMD (now redacted), screen-captured by 3DCenter.org, reveals the "shader processor" (stream processor) count of Vega10 to be 4,096. This may look identical to the SP count of "Fiji," but one must take into account that "Greenland" is two generations of Graphics CoreNext tech ahead of "Fiji," and that the roadmap slide hints at HBM2 memory, which could be faster. AMD also claims a 2.5X leap in performance-per-Watt over the current architecture with Polaris, so Vega could be faster still.
In related news, AMD could be giving the final touches to its first chips based on the "Polaris" architecture: a performance-segment chip codenamed "Ellesmere" or Polaris10, and a mid-range chip codenamed "Baffin" or Polaris11. "Ellesmere" is rumored to feature 36 GCN 4.0 compute units, which works out to 2,304 stream processors, and a 256-bit wide GDDR5 (or GDDR5X?) memory interface with 8 GB as the standard memory amount. The specs of "Baffin" aren't as clear; the only specification doing the rounds is its 128-bit wide GDDR5 memory bus. Products based on both these chips could launch in Q3 2016.
Sources:
3DCenter, 1, 2
21 Comments on AMD "Greenland" Vega10 Silicon Features 4096 Stream Processors?
FUD I say.
Polaris (10 and 11) - May / July - Expect at least R9 390X-class performance cards for cheap (much like the HD 2900 XT to HD 3870 transition): R9 470 and R9 480 parts.
Vega - (September 2016 to January 2017) - I expect these to be the R9 490X cards.
Or you could tier these one up, with Polaris 11/10 being the 480 and 490 class of cards, and Vega being the Fiji successor.
So they will give a 15-25% performance improvement, mostly from higher frequencies and architectural changes, for 3/4 of the power consumption thanks to the 14nm/16nm process, compared to today's models, and that's it for this summer.
Add to that GDDR5/X and not HBM, the rumors that Polaris remains a feature level 12_0 card, and Pascal still not knowing what async compute is, and we already start to question whether this summer's models are the cards we are waiting for, and not those that will come later.
Things really don't seem to get interesting till Vega. The last time AMD hyped up "performance per watt" we got Fury and were underwhelmed. 2.5x means squat. It's just more of that hype train trying to make Polaris look real great, when it's Vega that folks should be more keen on.
AMD needs something to get them through till next year. Polaris will likely do. I just don't feel so bad about having to jump on a 390 sooner than I'd planned to.
But I think they want to switch to a more NVIDIA-like lineup too, with a tier of card between the gamer cards and FirePro.
As for Greenland next year, I doubt it unless Polaris fails.
Same on the NV side: if you have a Maxwell or a Pascal and game only at 1080p, possibly 1440p, you should be good for a while. I think GPUs are finally falling in line with CPUs; they're hitting that size wall and are more about cutting power draw and improving energy efficiency. Sure, a die shrink will help performance, but they are only going to get so fast.
Developers are lazy and don't really optimize their code. Hence the ports we have been getting on PC lately. They optimize for PS4 and XBone because the hardware is static. With PC they have all the different configurations and unless either side does something to get developers off their ass and actually code for the PC hardware, it is going to be the same dog and pony show it has been for the past 6-8 years.
We've had pretty awesome hardware for a while now and it just isn't utilized properly.
Some of us are still on first-gen i7s and FX-8350s. Unless you run synthetic benchmarks all day, you'd be hard pressed to notice a difference. Hardware is starting to hit its limits, and all this bloat and laziness from developers is finally going to bite them in the ass.
Now, I know some studios do much better than others with their optimization and getting their products to run on various platforms, but a fair majority don't.
I am not too worried though; my 2x Nanos should be more than plenty for a while.
It may be a smart change for AMD to target the upper mid-range rather than the high end, after the Fiji blunder where they spent all their resources on a high-end product that didn't sell well. The $300-550 market is, after all, the most profitable market, and this should let AMD cover most of the market share it realistically can with its limited resources.
But demand is actually increasing at a higher rate than in the last ten years, since gamers now want higher resolutions and higher frame rates at the same time. And we are still not at the point where GPUs are "powerful enough" for game developers to achieve everything they want, so we can expect performance requirements to continue to increase for new games.
The jump from 28nm to 14/16nm is actually "two steps", except for the interconnect, which is still at 20nm. So Pascal is probably going to be the largest performance gain we have seen for a long time, and we are probably not going to see a similar increase for Volta and post-Volta. Currently a single GTX 980 Ti is still not "powerful enough" for 4K gaming, and is not close to 60 FPS in all games at stock speed. And for those who want higher frame rates, even though the GTX 980 Ti is OK for 1440p, it's not enough to push 120-144 FPS in all games at stock speed. With Pascal probably increasing gaming performance by >60%, we will still not be at 4K 120 FPS. If Volta (2018) is not going to be another shrink, then we might get as little as 20% more performance, which is not going to keep up with the demand.
You are touching on a very important subject. Game developers have gotten used to performance leaps every two years or so, so by the time a game is released they expect people to buy more powerful hardware than it was developed on. We all know that performance gains in hardware are going to decrease over time, so writing good code is going to become increasingly important.
The gaming consoles, which use outdated low-end hardware, are a big problem. And as long as developers keep making games for these machines and porting them to PCs by cranking up the model details, the ports are going to continue to suck. The current API call mania (Direct3D 12/Vulkan/etc.) is not going to help the situation. Every developer knows that batching the draws is the only efficient way to render, and when doing efficient batching the API overhead is low anyway.
Game engines are using way too much abstraction to use GPUs efficiently. If the API overhead is a problem for a game, then the engine CPU overhead is going to be even larger. Doing all kinds of GPU manipulation through thousands of API calls is a step backwards. Scaling with API calls is not going to work well with the raw performance of Pascal, Volta, post-Volta and so on.
Perhaps I'm all wrong here, but what else does SOC mean?
Modern engines batch draw calls automatically if different surfaces share textures, materials or shaders. When designing optimal art for 3d games it's all about reusing stuff while making it look like you are not reusing stuff.
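As a rough illustration of that grouping (my own sketch, not from the comment; all the handle names and IDs are made up), batching boils down to bucketing the frame's draw list by shared shader/material/texture state, so each bucket can become one submission instead of one API call per object:

#include <cstdio>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical per-object draw record; the IDs stand in for engine handles.
struct DrawItem {
    unsigned shaderId;    // pipeline/shader handle
    unsigned materialId;  // material handle
    unsigned textureId;   // texture-set handle
    unsigned meshId;      // which mesh instance this item draws
};

int main() {
    // A made-up frame's worth of draw items; three of them share the same state.
    std::vector<DrawItem> frame = {
        {1, 10, 100, 0}, {1, 10, 100, 1}, {2, 20, 200, 2}, {1, 10, 100, 3},
    };

    // Bucket items that share the exact same shader/material/texture; each
    // bucket can then be submitted as one instanced or merged draw.
    std::map<std::tuple<unsigned, unsigned, unsigned>, std::vector<unsigned>> batches;
    for (const DrawItem& d : frame)
        batches[std::make_tuple(d.shaderId, d.materialId, d.textureId)].push_back(d.meshId);

    std::printf("%zu draw items collapsed into %zu batches\n",
                frame.size(), batches.size());
    return 0;
}

The same idea is why reusing materials and textures across art assets pays off: identical state keys collapse into the same bucket.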
Optimizing on a shader level is done one time for all eternity ... essentially there is one optimal physically based lighting shader all games use these days, with diffuse, gloss/specular, emission, occlusion, normal and displacement textures (plus additional detail diffuse+normal textures on top that are visible up close) that allows sky-based global illumination. Very little optimization room left there.
All optimization on the CPU side is basically about how to feed GPU command queues while minimizing the number of context switches on the GPU.
The hidden part that can make every game look unoptimized is the occlusion culling algorithm, whose importance is often underestimated. Too many engines are used in a way that unnecessarily draws occluded objects.
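To make the culling point concrete, here is a simplified sketch of my own (not from the comment): the cheapest visibility test, a bounding sphere against the six view-frustum planes. Real occlusion culling (Hi-Z, portals, software-rasterized occluders) goes further and also rejects objects hidden behind other geometry, but the payoff is the same: objects that fail the test are never submitted at all.

#include <array>
#include <cstdio>

struct Plane  { float nx, ny, nz, d; };    // plane: n . p + d = 0, normal points inward
struct Sphere { float x, y, z, radius; };  // object bounding volume

// Returns false if the sphere lies entirely outside any frustum plane.
bool visible(const Sphere& s, const std::array<Plane, 6>& frustum) {
    for (const Plane& p : frustum) {
        float dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
        if (dist < -s.radius)
            return false;  // fully outside this plane -> skip the draw entirely
    }
    return true;           // potentially visible; still submit it
}

int main() {
    // Hypothetical frustum: a unit cube around the origin, normals pointing inward.
    std::array<Plane, 6> frustum = {{
        { 1, 0, 0, 1}, {-1, 0, 0, 1}, {0,  1, 0, 1},
        { 0,-1, 0, 1}, { 0, 0, 1, 1}, {0,  0,-1, 1},
    }};
    std::printf("inside:  %d\n", visible({0, 0, 0, 0.5f}, frustum));  // prints 1
    std::printf("outside: %d\n", visible({5, 0, 0, 0.5f}, frustum));  // prints 0
    return 0;
}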
The real problem starts when devs push consoles to the limits while relying heavily on the low API overhead and low latencies that the heterogeneous memory design allows, only to reach a locked 30 fps ... then do a straight port onto a higher-overhead API and the latencies the PCI-e bus induces. It may be feasible for a Jaguar core in the PS4 to directly write something into video RAM every frame; doing that over PCI-e on a PC would introduce extra latency.
Once you start using the benefits of HSA on consoles, you get a less scalable port to PC, simply because of the modular nature of the PC.
The opposite way would be: develop optimally for PC, then leverage HSA on consoles to get acceptable performance... but I'm digressing and borderline rambling.
It's system-on-chip; every SoC is an ASIC, but not every ASIC is an SoC.
If you try to render a bunch of meshes in a single API call, the GPU works far more efficiently than if you do it through thousands of small API calls. Even an old GTX 680 is able to render millions of polygons on screen at a high frame rate, but no game is pushing through geometry at that level, due to inefficient usage of the GPU. Well, not quite. The GPU itself is way better at scheduling its threads/batches and evening out the load, better than an infinitely powerful CPU could ever be. GPUs are also to a large extent able to automatically cull a lot; that was actually one of the hardware improvements between GF100 and GF110.
Still, both vertex shaders and compute shaders can be utilized for efficient culling. It's actually way more efficient to do the fine-grained culling on the GPU in a shader than to calculate it on the CPU and pass each part of a mesh as a separate API call.
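Here is a rough sketch of the pattern described above, written as plain C++ purely for illustration (it is not shader code, and the data is made up): on the GPU, each loop iteration would be one compute-shader thread, the survivor count would be bumped with an atomic, and the compacted index list would feed an indirect draw so the CPU issues a single submission no matter how many instances survive.

#include <cstddef>
#include <cstdio>
#include <vector>

struct Instance { float x, y, z, radius; };

// Stand-in for whatever per-instance visibility test runs (frustum, Hi-Z, ...).
static bool passesCull(const Instance& inst) {
    return inst.x * inst.x + inst.y * inst.y + inst.z * inst.z < 100.0f;
}

int main() {
    // Made-up instance data: 10,000 objects strung out along the x axis.
    std::vector<Instance> instances(10000);
    for (std::size_t i = 0; i < instances.size(); ++i)
        instances[i] = {float(i % 40), 0.0f, 0.0f, 1.0f};

    // Compaction step: only surviving instance indices are written out.
    // On a GPU this would be the compute-shader body, one thread per instance,
    // with an atomic counter in place of push_back and a buffer as the output.
    std::vector<unsigned> visibleIndices;
    for (std::size_t i = 0; i < instances.size(); ++i)
        if (passesCull(instances[i]))
            visibleIndices.push_back(static_cast<unsigned>(i));

    // visibleIndices.size() would become the instance count of an indirect draw
    // command, so the CPU never has to look at per-object visibility at all.
    std::printf("%zu of %zu instances survive culling\n",
                visibleIndices.size(), instances.size());
    return 0;
}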
You are mistaking hardware efficiency and muscle for poor coding, when it's actually showing off the hardware and software improvements.