
AMD Vega Microarchitecture Technical Overview

VSG

At the dawn of its Radeon RX Vega family launch, we visit its underlying GPU architecture, which AMD touts as the most advanced and future-ready, in a bid to find out if the company has overcome its longstanding architectural weaknesses while building on its strengths.

 
...They demonstrate a use case of Rapid Packed Math using 3DMark Serra - possibly a yet-unannounced Futuremark benchmark...

"Serra" - italian or portuguese?
 
"Serra" - italian or portuguese?

Funny thing is, no one seems to know what 3DMark Serra is. I emailed Futuremark to see if they can share any details, but my best guess is that it's an upcoming benchmark or stability testing tool.
 
The best thing is that "serra" is Italian for greenhouse. Things might get warm
 
FYI, I contacted Futuremark about this and got the following: "Thanks for reaching out to ask about Serra. It is actually a custom demo we created for AMD to highlight the benefits of their technology. Serra might appear within 3DMark at a future date as a feature test. However we have no official plans or timing for it at the moment."
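For context, what that demo highlights is Rapid Packed Math: Vega can issue two FP16 operations in place of one FP32 operation per ALU. A quick back-of-the-envelope of what that means for throughput (shader count is the known 4096; the clock is an assumption for illustration, not an official spec):

```python
# Rough FP32 vs packed-FP16 throughput; clock is assumed for illustration.
shaders   = 4096   # Vega 10 stream processors
clock_ghz = 1.6    # assumed boost clock

fp32_tflops = shaders * 2 * clock_ghz / 1000   # 2 FLOPs per FMA, per clock
fp16_tflops = fp32_tflops * 2                  # Rapid Packed Math: 2x FP16 rate

print(f"FP32: {fp32_tflops:.1f} TFLOPS | packed FP16: {fp16_tflops:.1f} TFLOPS")
```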
 
it's more GCN with tacked-on 'features' that no developer will take the time/want to use

and coming with that is a whopping 300 to 400 watts of power consumption
 
I'm curious if anyone looks at this the same way I do.

I see Volta (likely 896/1792/3584sp @ higher clocks) as one option, but where is nVIDIA going to go after that? I think it's a serpentine line to more-or-less exactly what AMD is doing with more costly R&D in-between. It's great that they can afford that, and that consumers can reap those specialized benefits, but I don't see it being the end-all of the matter.

Clock scaling is going to end for nvidia's current methodology (think essentially half that of a typical CPU; an 1800-1850mhz GPU being similar efficiency as a 3600-3700mhz CPU...with clock scaling up to ~2.34ghz [similar to a ~4.6-4.7ghz CPU]), just as it has with countless other architectures. We have seen it, and will likely continue to see it (just as Intel has been stuck at ~4.3-4.7ghz for generations). We've seen Apple barely increase the clock from the A9X to the A10X...Heck, we've even seen ARM completely revamp the A53 into the A55...EXCEPT IN THAT CASE it uses pretty much the exact same clocks but more transistors for greater power efficiency. The difference for the last being that it's a more high-density arch that's optimized closer to a ~1/3 ratio (just like Polaris or Vega).
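To make that 'half a CPU clock' analogy concrete, a quick sanity check using the same figures (rule-of-thumb only, not measured data):

```python
# Ratio of comparable CPU clocks to GPU clocks, per the figures quoted above.
gpu_sweet_spot, gpu_ceiling = 1.825, 2.34   # GHz, Pascal-class
cpu_sweet_spot, cpu_ceiling = 3.65, 4.65    # GHz, Intel-class

print(f"sweet-spot ratio: {cpu_sweet_spot / gpu_sweet_spot:.2f}x")
print(f"ceiling ratio:    {cpu_ceiling / gpu_ceiling:.2f}x")
# Both land near 2.0x, which is the whole point of the analogy.
```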

When the point arrives that nvidia transitions to 7nm, odds IMHO are pretty safe that ~1792/3584sp designs will be shrunk down in an obvious manner, while what we've seen on V100 will evolve into something twice that of GP102/GV104. The problem therein is that, for them to make sense in each market segment (~75/150/225/300W), core architecture (more transistors) will have to play a larger role than clock speed, as the power consumption of higher clocks likely won't fit into those envelopes because it doesn't scale nearly as well.

Hence, we see something like the Snapdragon 820/821->835 (although the same things could be said of the Apple A series).

High power cores ran at 1800mhz for power efficiency (like Pascal). The tuned version ran at 2.15ghz (like Pascal overclocking or probably like Volta stock). The maximum performance version was 2.34ghz (which is probably where Volta will overclock).

On 10nm, that clock was only raised to 2.45ghz (2.38 for Apple), implying a fairly hard wall was hit in terms of scaling at efficient perf/w. It makes sense; it's not far off half clock of the exact same wall Intel hit.

OTOH, low power cores are a different story. While the power-efficient cores on 820 were 1363mhz (like Polaris), and tuned to 1593mhz on 821 (Vega), ON 10nm WE SEE THEM SCALED TO 1900MHZ. A much lower ratio between the two...and I imagine that will shrink even further on 7nm as higher-density cores may clock even slightly better while larger cores stay more-or-less stagnant.
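Putting numbers on that ratio argument (same clocks as above, just the arithmetic):

```python
# Big-core vs little-core clock gains on the 10nm shrink (MHz, from the post).
big_before, big_after       = 2340, 2450   # max-performance big cores
little_before, little_after = 1593, 1900   # tuned little cores

print(f"big cores gained    {(big_after / big_before - 1) * 100:.0f}%")
print(f"little cores gained {(little_after / little_before - 1) * 100:.0f}%")
# Roughly 5% vs 19%: the denser, lower-clocked cores scale far better.
```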

I don't know how else to demonstrate that this should be one of those 'a-ha' moments for why high density/more fixed units/parallelism is key, and why preparing for its inevitability isn't a terrible idea. AMD has done it with Zen, and is doing it again with Vega.

In the same way one can look at the efficient ~3.5-3.8ghz all-core aim for Zen (or 1/3 of that for Polaris/Vega) and clearly see a way forward to (at least boost clocks of) ~5ghz on 7nm for the CPUs, one could look at Vega and see a way forward to maximizing the use of 1ghz (say equal to a 1080/1170 and good for 4k30) or 1.2ghz (4k60) HBM2 in perhaps 150/225w power envelopes.

I'm not going to sit here and defend Vega as it sits; obviously nVIDIA made the best call for THIS node size (and will likely continue it to 12nm). The shader/rop/voltage/clock capability/power consumption/transistor library etc ratios just make more sense at this juncture. AMD went all-out on density and high compute and it straight-up didn't work because they were some combination of too poor to completely revamp the design or too early for it to make sense.

That said, and obviously this is simply my opinion, AMD has laid the groundwork for both CPU and GPU designs that should transition incredibly easily to 7nm while exploiting it to the fullest (without requiring a completely new arch right away)...from the fab that is cranking out high-performance parts on that process first...and that ain't for nothing. Sure, they could have made Zen with fewer cores and higher clocks (and perhaps IPC) on 14nm and straight up lost to Intel on Intel's ground. Instead, they did something different that capitalized on the current process' strengths with an eye toward the very near future, not only for application (of more cores) but scalability (7nm). Sure, they could have made Vega with fewer CUs and higher clocks and lost to nvidia because 14nm is a lower-performing (but inherently denser) process than TSMC's 16nm...but they didn't. Instead they buckled down, got their architecture advancements in place, and made the best of the current situation with a distinct way forward in the next 6 months to a year that will likely only be equaled by nVIDIA two generations from now.

I don't mean to sound like an apologist, but when I see Intel (with their higher IPC/clockspeed potential designs) scaling clocks back on Coffee Lake to EXACTLY where AMD aimed their core CPU design on a similar but lower-performing process, I laugh. When I read stories that Intel is having problems scaling clocks on 10nm past the exact point they've been stuck at (for max 'normal overclocking' clocks) forever, I laugh.

When I foresee nVIDIA essentially recycling their arch to save them money (can't we all see a ~200mm2 32 ROP 1792sp/~2100+mhz design replacing 1070 and a 400mm2 64 ROP 3584sp/2100mhz+ design being sold at WAYYY too much money to get those last extra 4k60 frames) and calling it ingenious circumvention of Moore's Law instead of planned obsolescence of their former architecture to perpetuate constant sales, I cry.

When I look at Zen and Vega, I see potential. I can understand where they're trying to go with this and how they currently had to make the best of a less-than-ideal situation (with regards to process and likely R&D budget), even if the first iteration isn't perfect. Maybe Zen 2, even if it clocks well, will only catch up to the overall performance of Skylake per core (after adjusted clocks)...but that's okay. Their R&D can take the extra time they need to pump up the arch over the long road of 7nm DUV and EUV for Zen 3, allowing strengths in different areas than their competition on the road to perhaps eventual parity, while in the meantime adding healthy competition or better pricing. Likewise, maybe a shrunken Vega only at most competes with GV104 (at half the die size and probably more than half the power), or even its cut-down version (2688sp?), but that too is okay. The potential is there for awesome 4k30/4k60 parts that are better/cheaper than the 1080, more performant than any potential 1792sp design that would likely be of similar size (and if nvidia has their way, cost), and could perhaps even scale to 2x within 300/375w for our eventual 4k60+/4k120 HDMI 2.1 future.

In short, AMD's designs make me hopeful.

And really....Is that so bad?
 
Volta V100 supposedly has 5120 CUDA cores. It's huge at 815 mm2 versus Vega's 484 mm2. Volta will only show up in Titan-level cards. It's too costly to produce to offer it in GeForce branding (it would have to be a down-sized model of the architecture). Volta appears to be specifically for high paying compute customers (deep learning, GPGPU workloads).
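For a rough sense of scale, the shaders-per-mm² those numbers imply (very rough, since GV100 also spends a lot of die area on FP64, Tensor cores, and cache):

```python
# Crude shader-density comparison from the die sizes quoted above.
v100_cores, v100_mm2 = 5120, 815
vega_cores, vega_mm2 = 4096, 484

print(f"GV100: {v100_cores / v100_mm2:.1f} shaders/mm^2")
print(f"Vega:  {vega_cores / vega_mm2:.1f} shaders/mm^2")
```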

On the other hand, I wouldn't be surprised if Volta is a one trick (compute) pony so smaller variants won't be made.


Max clock speeds are determined by pipeline length. Longer pipelines allow higher clock speeds. GPUs, up until Pascal and Vega, had short pipelines (lots of work done in few clocks).


Maxwell was new from the ground up. Vega is still GCN which has been around for many years now.
 
I suspect post-Volta we will see a new ground-up arch from team green
the reason we haven't seen one yet is that there hasn't been a need for one
nvidia has been able to scale performance and power consumption enough to stay competitive

I really don't understand AMD. GCN in any form just does not scale well. they need to stop
..
and make something better
 
Vega represents the largest change GCN has ever seen.

What we know now is that, as AMD said, Frontier Edition doesn't reflect the performance of the consumer card in gaming. I don't think we have any gaming benchmarks of the final product except the ones AMD had specifically made for Vega.
 
Vega represents the largest change GCN has ever seen.

What we know now is that, as AMD said, Frontier Edition doesn't reflect the performance of the consumer card in gaming. I don't think we have any gaming benchmarks of the final product except the ones AMD had specifically made for Vega.
but it does reflect the power consumption increase, and that's what I meant by 'doesn't scale'
you can add GCN compute units for days and get MOAR speed, but you get MOAAAAAAAAAR power consumption

AMD needs a do-more-with-less approach instead of throwing CUs at the problem
 
Actually, it does. Power saving circuitry that will be enabled in the consumer card is not enabled in Frontier Edition.

Vega has the exact same number of stream processors as Fiji did (4096). Vega is significantly smaller than Fiji because of the process difference and removal of two HBM stacks.

[conspiracy]Someone toyed with the idea that maybe AMD was mining cryptocurrencies with the Frontier Edition before boxing them up and shipping them out. It makes too much sense, especially if AMD compiled mining software utilizing the mining-specific instructions in Vega. Mining cards run full throttle (no point in power saving circuits) and the DSBR is also worthless for mining. Both of these things were confirmed to be disabled in Frontier Edition.[/conspiracy] In gaming, Frontier Edition would then basically behave like a die-shrunk, overclocked Fiji, and it really did.
 
#define power-saving-circuits, because there are only a handful of ways to reduce power consumption and nearly ALL of them involve some kind of performance hit

to get an improvement without a large performance hit you need to do more than just fiddle with clock speeds; you need to increase the efficiency of the core. less wasted operations = better power consumption
 
[Attached image: arch-37.jpg]
 
so it reduces speed and downclocks the IF bus...

and that explicitly says idle power consumption

which doesn't do anything for those 300 to 400W load GPU-only draws
even 250 would be borderline unacceptable for the performance you get

so you are telling me they are going to lose over 120W and still compete with the 1070?
BULL
 
I see 64 CUs and 50...circuits...highlighted. Mention of Infinity Fabric too. I don't think it's stretching to believe this power management microcontroller can cut power to idle cores (not unlike CPUs), responding directly to GPU load. Chips as wide as Vega can really benefit from that, especially gaming at low resolutions. Vega may be capable of running on just 14 CUs.

And think about it: Infinity Fabric, in GPU? This is the framework for Navi to work on. They link the bus together in APU, in multiple PCIE slots, and as far as the operating system is concerned, you got just one really wide GPU, just like how AMD manages to cram 32 CPU cores into 4 MCM'd chips.
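Purely as a hypothetical sketch of that power-management idea (not AMD's actual firmware or anything confirmed), a load-driven CU gating policy might look something like this:

```python
# Toy model of per-CU power gating driven by load; illustrative only.
TOTAL_CUS = 64
MIN_CUS   = 14   # the floor speculated above

def active_cus(load: float) -> int:
    """Map a 0.0-1.0 GPU load estimate to a number of powered-on CUs."""
    wanted = round(load * TOTAL_CUS)
    return max(MIN_CUS, min(TOTAL_CUS, wanted))

for load in (0.05, 0.25, 0.50, 1.00):
    print(f"load {load:.2f} -> {active_cus(load)} CUs powered")
```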
 
in its 56 CU config it's still 300W
so what does turning CUs off or clocking them down do for power consumption when you are gaming at 4K,
which is where this card is priced
 
so let's assume they release a 'gamer' variant with _only_ 48 CUs enabled

so now we are scraping 250W and it's edging out a 1060, beating it handily only at high res

the math doesn't add up. don't fall for the PR bullshit; math is math. performance is (mostly) a function of CUs x ROPs x clock speed

once you know the baseline numbers for a given config it's pretty easy to extrapolate a ballpark of where it will perform
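taking that CUs x clock rule of thumb at face value, here's a quick ballpark for the hypothetical 48 CU part (assumed clock; deliberately ignores ROP and bandwidth limits):

```python
# Ballpark throughput scaling per the CUs x clock rule of thumb; assumed numbers.
def tflops(cus: int, clock_mhz: int) -> float:
    return cus * 64 * 2 * clock_mhz * 1e6 / 1e12   # 64 SPs per CU, 2 FLOPs per FMA

full = tflops(64, 1600)   # full Vega
cut  = tflops(48, 1600)   # hypothetical 48 CU 'gamer' variant
print(f"64 CU: {full:.1f} TFLOPS | 48 CU: {cut:.1f} TFLOPS ({cut / full:.0%})")
```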
 
I don't know how fast they can trigger but keep in mind that rendering occurs in waves. At the beginning of a new frame, GPUs have a lot of work to do but it falls off until just a few CUs are putting the final touches on the frame before transitioning to the next. In that period, it's possible for the idle CUs to be cut off then power on again when the next wave front starts.

I wouldn't expect much power saving under high load though, especially async compute situations.
 
I don't know how fast they can trigger but keep in mind that rendering occurs in waves. At the beginning of a new frame, GPUs have a lot of work to do but it falls off until just a few CUs are putting the final touches on the frame before transitioning to the next. In that period, it's possible for the idle CUs to be cut off then power on again when the next wave front starts.

I wouldn't expect much power saving under high load though, especially async compute situations.

exactly, so you might pick up 25, MAYBE 40W, and that kind of power-saving 'trickery' generally incurs a frame latency penalty OR, worse, stuttering
 
so let's assume they release a 'gamer' variant with _only_ 48 CUs enabled

so now we are scraping 250W and it's edging out a 1060, beating it handily only at high res

the math doesn't add up. don't fall for the PR bullshit; math is math. performance is (mostly) a function of CUs x ROPs x clock speed

once you know the baseline numbers for a given config it's pretty easy to extrapolate a ballpark of where it will perform
You do realize that RX 580 can draw 200+ watts with 36 compute units, right?

exactly, so you might pick up 25, MAYBE 40W, and that kind of power-saving 'trickery' generally incurs a frame latency penalty OR, worse, stuttering
40W is greater than a 10% savings. That's pretty damn significant.


Remember, most gamers play with either frame lock, vsync, or adaptive sync on so the GPU doesn't run away. Those situations easily translate to Vega imitating a thinner GPU than it is when playing not-so-intensive games.
 
that's being EXTREMELY generous tho
 
Point is, it was nonfunctional in Frontier Edition so we effectively know nothing about it.

Frontier Edition exists for a reason.
 
Point is, it was nonfunctional in Frontier Edition so we effectively know nothing about it.

Frontier Edition exists for a reason.
it's not just the FE tho; leaked benchmarks, the buzz going around, and even AMD's own slides point to this

let's throw everything I said out the window and assume the top-tier vega-gamer card ships at 250W TDP
we already _know_ its bigger, badasser brother can't hang with a 1070, let alone a 1080
so unless they are gonna pour the NOS on
 