
AMD Vega Microarchitecture Technical Overview

VSG

At the dawn of its Radeon RX Vega family launch, we visit its underlying GPU architecture, which AMD touts as the most advanced and future-ready, in a bid to find out if the company has overcome its longstanding architectural weaknesses while building on its strengths.

 
...They demonstrate a use case of Rapid Packed Math using 3DMark Serra - possibly a yet-unannounced Futuremark benchmark...

"Serra" - italian or portuguese?
 
"Serra" - italian or portuguese?

Funny thing is, no one seems to know what 3DMark Serra is. I emailed Futuremark to see if they can share any details, but my best guess is that it's an upcoming benchmark or stability testing tool.
 
The best thing is that "serra" is Italian for greenhouse. Things might get warm
 
FYI, I contacted Futuremark about this and got the following: "Thanks for reaching out to ask about Serra. It is actually a custom demo we created for AMD to highlight the benefits of their technology. Serra might appear within 3DMark at a future date as a feature test. However we have no official plans or timing for it at the moment."
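For context, what that demo highlights is Rapid Packed Math: Vega can issue two FP16 operations in place of one FP32 operation per ALU. A quick back-of-the-envelope of what that means for throughput (shader count is the known 4096; the clock is an assumption for illustration, not an official spec):

```python
# Rough FP32 vs packed-FP16 throughput; clock is assumed for illustration.
shaders   = 4096   # Vega 10 stream processors
clock_ghz = 1.6    # assumed boost clock

fp32_tflops = shaders * 2 * clock_ghz / 1000   # 2 FLOPs per FMA, per clock
fp16_tflops = fp32_tflops * 2                  # Rapid Packed Math: 2x FP16 rate

print(f"FP32: {fp32_tflops:.1f} TFLOPS | packed FP16: {fp16_tflops:.1f} TFLOPS")
```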
 
it's more GCN with tacked-on 'features' that no developer will take the time/want to use

and coming with that is a whopping 300 to 400 watts of power consumption
 
I'm curious if anyone looks at this the same way I do.

I see Volta (likely 896/1792/3584sp @ higher clocks) as one option, but where is nVIDIA going to go after that? I think it's a serpentine line to more-or-less exactly what AMD is doing with more costly R&D in-between. It's great that they can afford that, and that consumers can reap those specialized benefits, but I don't see it being the end-all of the matter.

Clock scaling is going to end for nvidia's current methodology (think essentially half that of a typical CPU; an 1800-1850mhz GPU being similar efficiency as a 3600-3700mhz CPU...with clock scaling up to ~2.34ghz [similar to a ~4.6-4.7ghz CPU]), just as it has with countless other architectures. We have seen it, and will likely continue to see it (just as Intel has been stuck at ~4.3-4.7ghz for generations). We've seen Apple barely increase the clock from the A9X to the A10X...Heck, we've even seen ARM completely revamp the A53 into the A55...EXCEPT IN THAT CASE it uses pretty much the exact same clocks but more transistors for greater power efficiency. The difference for the last being that it's a more high-density arch that's optimized closer to a ~1/3 ratio (just like Polaris or Vega).
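To make that 'half a CPU clock' analogy concrete, a quick sanity check using the same figures (rule-of-thumb only, not measured data):

```python
# Ratio of comparable CPU clocks to GPU clocks, per the figures quoted above.
gpu_sweet_spot, gpu_ceiling = 1.825, 2.34   # GHz, Pascal-class
cpu_sweet_spot, cpu_ceiling = 3.65, 4.65    # GHz, Intel-class

print(f"sweet-spot ratio: {cpu_sweet_spot / gpu_sweet_spot:.2f}x")
print(f"ceiling ratio:    {cpu_ceiling / gpu_ceiling:.2f}x")
# Both land near 2.0x, which is the whole point of the analogy.
```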

When the point arrives that nvidia transitions to 7nm, odds IMHO are pretty safe that ~1792/3584sp designs will be shrunk down in an obvious manner, while what we've seen on V100 will evolve into something twice that of GP102/GV104. The problem therein is that, for them to make sense in each market segment (~75/150/225/300W), core architecture (more transistors) will have to play a larger role than clock speed, as the power consumption of higher clocks likely won't fit into those envelopes because it doesn't scale nearly as well.

Hence, we see something like the Snapdragon 820/821->835 (although the same things could be said of the Apple A series).

High power cores ran at 1800mhz for power efficiency (like Pascal). The tuned version ran at 2.15ghz (like Pascal overclocking or probably like Volta stock). The maximum performance version was 2.34ghz (which is probably where Volta will overclock).

On 10nm, that clock was only raised to 2.45ghz (2.38 for Apple), implying a fairly hard wall was hit in terms of scaling at efficient perf/w. It makes sense; it's not far off half clock of the exact same wall Intel hit.

OTOH, low power cores are a different story. While the power-efficient cores on 820 were 1363mhz (like Polaris), and tuned to 1593mhz on 821 (Vega), ON 10nm WE SEE THEM SCALED TO 1900MHZ. A much lower ratio between the two...and I imagine that will shrink even further on 7nm as higher-density cores may clock even slightly better while larger cores stay more-or-less stagnant.
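Putting numbers on that ratio argument (same clocks as above, just the arithmetic):

```python
# Big-core vs little-core clock gains on the 10nm shrink (MHz, from the post).
big_before, big_after       = 2340, 2450   # max-performance big cores
little_before, little_after = 1593, 1900   # tuned little cores

print(f"big cores gained    {(big_after / big_before - 1) * 100:.0f}%")
print(f"little cores gained {(little_after / little_before - 1) * 100:.0f}%")
# Roughly 5% vs 19%: the denser, lower-clocked cores scale far better.
```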

I don't know how else to demonstrate that this should be one of those 'a-ha' moments for why high density/more fixed units/parallelism is key, and why preparing for its inevitability isn't a terrible idea. AMD has done it with Zen, and is doing it again with Vega.

In the same way one can look at the efficient ~3.5-3.8ghz all-core aim for Zen (or 1/3 of that for Polaris/Vega) and clearly see a way forward to (at least boost clocks of) ~5ghz on 7nm for the CPUs, one could look at Vega and see a way forward to maximizing the use of 1ghz (say equal to a 1080/1170 and good for 4k30) or 1.2ghz (4k60) HBM2 in perhaps 150/225w power envelopes.

I'm not going to sit here and defend Vega as it sits; obviously nVIDIA made the best call for THIS node size (and will likely continue it to 12nm). The shader/rop/voltage/clock capability/power consumption/transistor library etc ratios just make more sense at this juncture. AMD went all-out on density and high compute and it straight-up didn't work because they were some combination of too poor to completely revamp the design or too early for it to make sense.

That said, and obviously this is simply my opinion, AMD has laid the groundwork for both CPU and GPU designs that should transition incredibly easily to 7nm while exploiting it to the fullest (without requiring a completely new arch right away)...from the fab that is cranking out high-performance parts on that process first...and that ain't for nothing. Sure, they could have made Zen with fewer cores and higher clocks (and perhaps IPC) on 14nm and straight up lost to Intel on Intel's ground. Instead, they did something different that capitalized on the current process' strengths with an eye toward the very near future, not only for application (of more cores) but scalability (7nm). Sure, they could have made Vega with fewer CUs and higher clocks and lost to nvidia because 14nm is a lower-performing (but inherently denser) process than TSMC's 16nm...but they didn't. Instead they buckled down, got their architecture advancements in place, and made the best of the current situation with a distinct way forward in the next 6 months to a year that will likely only be equaled by nVIDIA two generations from now.

I don't mean to sound like an apologist, but when I see Intel (with their higher IPC/clockspeed potential designs) scaling clocks back on Coffee Lake to EXACTLY where AMD aimed their core CPU design on a similar but lower-performing process, I laugh. When I read stories that Intel is having problems scaling clocks on 10nm past the exact point they've been stuck at (for max 'normal overclocking' clocks) forever, I laugh.

When I foresee nVIDIA essentially recycling their arch to save them money (can't we all see a ~200mm2 32 ROP 1792sp/~2100+mhz design replacing 1070 and a 400mm2 64 ROP 3584sp/2100mhz+ design being sold at WAYYY too much money to get those last extra 4k60 frames) and calling it ingenious circumvention of Moore's Law instead of planned obsolescence of their former architecture to perpetuate constant sales, I cry.

When I look at Zen and Vega, I see potential. I can understand where they're trying to go with this and how they currently had to make the best of a less-than-ideal situation (with regards to process and likely R&D budget), even if the first iteration isn't perfect. Maybe Zen 2, even if it clocks well, will only catch up to the overall performance of Skylake per core (after adjusted clocks)...but that's okay. Their R&D can take the extra time they need to pump up the arch over the long road of 7nm DUV and EUV for Zen 3, allowing strengths in different areas than their competition on the road to perhaps eventual parity, while in the meantime adding healthy competition or better pricing. Likewise, maybe a shrunken Vega only at most competes with GV104 (at half the die size and probably more than half the power), or even its cut-down version (2688sp?), but that too is okay. The potential is there for awesome 4k30/4k60 parts that are better/cheaper than the 1080, more performant than any potential 1792sp design that would likely be of similar size (and if nvidia has their way, cost), and could perhaps even scale to 2x within 300/375w for our eventual 4k60+/4k120 HDMI 2.1 future.

In short, AMD's designs make me hopeful.

And really....Is that so bad?
 
Volta V100 supposedly has 5120 CUDA cores. It's huge at 815 mm2 versus Vega's 484 mm2. Volta will only show up in Titan-level cards. It's too costly to produce to offer it in GeForce branding (it would have to be a down-sized model of the architecture). Volta appears to be specifically for high paying compute customers (deep learning, GPGPU workloads).
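For a rough sense of scale, the shaders-per-mm² those numbers imply (very rough, since GV100 also spends a lot of die area on FP64, Tensor cores, and cache):

```python
# Crude shader-density comparison from the die sizes quoted above.
v100_cores, v100_mm2 = 5120, 815
vega_cores, vega_mm2 = 4096, 484

print(f"GV100: {v100_cores / v100_mm2:.1f} shaders/mm^2")
print(f"Vega:  {vega_cores / vega_mm2:.1f} shaders/mm^2")
```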

On the other hand, I wouldn't be surprised if Volta is a one trick (compute) pony so smaller variants won't be made.


Max clock speeds are determined by pipeline length. Longer pipelines allow higher clock speeds. GPUs, up until Pascal and Vega, had short pipelines (lots of work done in few clocks).


Maxwell was new from the ground up. Vega is still GCN which has been around for many years now.
 
I suspect post-Volta we will see a new ground-up arch from team green
the reason we haven't seen one yet is that there hasn't been a need for one
nvidia has been able to scale performance and power consumption enough to stay competitive

I really don't understand AMD. GCN in any form just does not scale well. they need to stop
..
and make something better
 
Vega represents the largest change GCN has ever seen.

What we know now is that, as AMD said, Frontier Edition doesn't reflect the performance of the consumer card in gaming. I don't think we have any gaming benchmarks of the final product except the ones AMD had specifically made for Vega.
 
Vega represents the largest change GCN has ever seen.

What we know now is that, as AMD said, Frontier Edition doesn't reflect the performance of the consumer card in gaming. I don't think we have any gaming benchmarks of the final product except the ones AMD had specifically made for Vega.
but it does reflect the power consumption increase, and that's what I meant by 'doesn't scale'
you can add GCN compute units for days and get MOAR speed, but you get MOAAAAAAAAAR power consumption

AMD needs a do-more-with-less approach instead of throwing CUs at the problem
 
Actually, it does. Power saving circuitry that will be enabled in the consumer card is not enabled in Frontier Edition.

Vega has the exact same number of stream processors as Fiji did (4096). Vega is significantly smaller than Fiji because of the process difference and removal of two HBM stacks.

[conspiracy]Someone toyed with the idea that maybe AMD was mining cryptocurrencies with the Frontier Edition before boxing them up and shipping them out. It makes too much sense, especially if AMD compiled mining software utilizing the mining-specific instructions in Vega. Mining cards run full throttle (no point in power saving circuits) and the DSBR is also worthless for mining. Both of these things were confirmed to be disabled in Frontier Edition.[/conspiracy] In gaming, Frontier Edition would then basically behave like a die-shrunk, overclocked Fiji, and it really did.
 
#define power-saving-circuits, because there are only a handful of ways to reduce power consumption and nearly ALL of them involve some kind of performance hit

to get an improvement without a large performance hit you need to do more than just fiddle with clock speeds; you need to increase the efficiency of the core. less wasted operations = better power consumption
 
[Attached image: arch-37.jpg]
 
so it reduces speed and downclocks the IF bus...

and that explicitly says idle power consumption

which doesn't do anything for those 300 to 400W load GPU-only draws
even 250 would be borderline unacceptable for the performance you get

so you are telling me they are going to lose over 120W and still compete with the 1070?
BULL
 
I see 64 CUs and 50...circuits...highlighted. Mention of Infinity Fabric too. I don't think it's stretching to believe this power management microcontroller can cut power to idle cores (not unlike CPUs), responding directly to GPU load. Chips as wide as Vega can really benefit from that, especially gaming at low resolutions. Vega may be capable of running on just 14 CUs.

And think about it: Infinity Fabric, in GPU? This is the framework for Navi to work on. They link the bus together in APU, in multiple PCIE slots, and as far as the operating system is concerned, you got just one really wide GPU, just like how AMD manages to cram 32 CPU cores into 4 MCM'd chips.
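Purely as a hypothetical sketch of that power-management idea (not AMD's actual firmware or anything confirmed), a load-driven CU gating policy might look something like this:

```python
# Toy model of per-CU power gating driven by load; illustrative only.
TOTAL_CUS = 64
MIN_CUS   = 14   # the floor speculated above

def active_cus(load: float) -> int:
    """Map a 0.0-1.0 GPU load estimate to a number of powered-on CUs."""
    wanted = round(load * TOTAL_CUS)
    return max(MIN_CUS, min(TOTAL_CUS, wanted))

for load in (0.05, 0.25, 0.50, 1.00):
    print(f"load {load:.2f} -> {active_cus(load)} CUs powered")
```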
 
in its 56 CU config it's still 300W
so what does turning CUs off or clocking them down do for power consumption when you are gaming at 4K,
which is where this card is priced
 
so let's assume they release a 'gamer' variant with _only_ 48 CUs enabled

so now we are scraping 250W and it's edging out a 1060, beating it handily only at high res

the math doesn't add up. don't fall for the PR bullshit; math is math. performance is (mostly) a function of CUs x ROPs x clock speed

once you know the baseline numbers for a given config it's pretty easy to extrapolate a ballpark of where it will perform
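taking that CUs x clock rule of thumb at face value, here's a quick ballpark for the hypothetical 48 CU part (assumed clock; deliberately ignores ROP and bandwidth limits):

```python
# Ballpark throughput scaling per the CUs x clock rule of thumb; assumed numbers.
def tflops(cus: int, clock_mhz: int) -> float:
    return cus * 64 * 2 * clock_mhz * 1e6 / 1e12   # 64 SPs per CU, 2 FLOPs per FMA

full = tflops(64, 1600)   # full Vega
cut  = tflops(48, 1600)   # hypothetical 48 CU 'gamer' variant
print(f"64 CU: {full:.1f} TFLOPS | 48 CU: {cut:.1f} TFLOPS ({cut / full:.0%})")
```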
 
I don't know how fast they can trigger but keep in mind that rendering occurs in waves. At the beginning of a new frame, GPUs have a lot of work to do but it falls off until just a few CUs are putting the final touches on the frame before transitioning to the next. In that period, it's possible for the idle CUs to be cut off then power on again when the next wave front starts.

I wouldn't expect much power saving under high load though, especially async compute situations.
 
I don't know how fast they can trigger but keep in mind that rendering occurs in waves. At the beginning of a new frame, GPUs have a lot of work to do but it falls off until just a few CUs are putting the final touches on the frame before transitioning to the next. In that period, it's possible for the idle CUs to be cut off then power on again when the next wave front starts.

I wouldn't expect much power saving under high load though, especially async compute situations.

exactly, so you might pick up 25, MAYBE 40W, and that kind of power-saving 'trickery' generally incurs a frame latency penalty OR, worse, stuttering
 
so let's assume they release a 'gamer' variant with _only_ 48 CUs enabled

so now we are scraping 250W and it's edging out a 1060, beating it handily only at high res

the math doesn't add up. don't fall for the PR bullshit; math is math. performance is (mostly) a function of CUs x ROPs x clock speed

once you know the baseline numbers for a given config it's pretty easy to extrapolate a ballpark of where it will perform
You do realize that RX 580 can draw 200+ watts with 36 compute units, right?

exactly, so you might pick up 25, MAYBE 40W, and that kind of power-saving 'trickery' generally incurs a frame latency penalty OR, worse, stuttering
40W is greater than a 10% savings. That's pretty damn significant.


Remember, most gamers play with either frame lock, vsync, or adaptive sync on so the GPU doesn't run away. Those situations easily translate to Vega imitating a thinner GPU than it is when playing not-so-intensive games.
 
that's being EXTREMELY generous tho
 
Point is, it was nonfunctional in Frontier Edition so we effectively know nothing about it.

Frontier Edition exists for a reason.
 
Point is, it was nonfunctional in Frontier Edition so we effectively know nothing about it.

Frontier Edition exists for a reason.
it's not just the FE tho; leaked benchmarks, the buzz going around, and even AMD's own slides point to this

let's throw everything I said out the window and assume the top-tier vega-gamer card ships at 250W TDP
we already _know_ its bigger, badasser brother can't hang with a 1070, let alone a 1080
so unless they are gonna pour the NOS on
 