Monday, April 30th 2018

AMD "Vega 20" with 32 GB HBM2 3DMark 11 Score Surfaces

Apr 30th, 2018 07:05

With the latest Radeon Vega Instinct reveal, it's becoming increasingly clear that "Vega 20" is an optical shrink of the "Vega 10" GPU die to the new 7 nm silicon fabrication process, which could significantly lower power-draw, enabling AMD to increase clock-speeds. A prototype graphics card based on "Vega 20," armed with a whopping 32 GB of HBM2 memory, was put through 3DMark 11, on a machine powered by a Ryzen 7 1700 processor, and compared with a Radeon Vega Frontier Edition.

The prototype's GPU was clocked at 1.00 GHz, versus up to 1.60 GHz on the Vega Frontier Edition. Its memory, however, was clocked higher, at 1250 MHz (640 GB/s) vs. 945 MHz (483 GB/s). Despite its significantly lower GPU clock, the supposed "Vega 20" prototype appears to deliver higher performance clock-for-clock, though it loses out on overall performance in all tests. This could mean "Vega 20" is not just an optical shrink of "Vega 10," but also benefits from newer architecture features, in addition to faster memory.
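A rough way to sanity-check the bandwidth figures above, assuming HBM2's double-data-rate signalling (two transfers per memory clock): peak bandwidth = clock × 2 × bus width ÷ 8. The article does not state the prototype's bus width; the 2048-bit case is Vega 10's two-stack layout, and the 4096-bit case is a hypothetical four-stack layout that a 32 GB card might use, not a confirmed spec.

```python
def hbm2_bandwidth_gbs(memory_clock_mhz: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s for a double-data-rate memory interface."""
    transfers_per_second = memory_clock_mhz * 1e6 * 2  # DDR: two transfers per clock
    bytes_per_transfer = bus_width_bits / 8             # bus width in bytes
    return transfers_per_second * bytes_per_transfer / 1e9


# Vega Frontier Edition: 2048-bit bus at 945 MHz
print(f"Vega FE:             {hbm2_bandwidth_gbs(945, 2048):.1f} GB/s")
# Prototype clock from the leak, assuming the same 2048-bit bus
print(f"Prototype, 2048-bit: {hbm2_bandwidth_gbs(1250, 2048):.1f} GB/s")
# Hypothetical four-stack (4096-bit) layout, not confirmed by the source
print(f"Prototype, 4096-bit: {hbm2_bandwidth_gbs(1250, 4096):.1f} GB/s")
```

With a 2048-bit bus, 1250 MHz reproduces the article's 640 GB/s; a 4096-bit bus would double that to 1280 GB/s, roughly 2.6 times the Frontier Edition's bandwidth.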

Source: VideoCardz


77 Comments on AMD "Vega 20" with 32 GB HBM2 3DMark 11 Score Surfaces

#51

efikkan

FordGT90Concept said:
That's how starved Vega is for bandwidth. Remember, Vega 64 and Fury X have the same bandwidth. Vega 64's shaders can handle about 80% more throughput per shader than Fury X thanks to NCU and higher clock speeds, but it only gets a fraction of that because Vega 64's shaders are constantly waiting on the memory subsystem to feed them more data. Evidence of this is how miners underclock and undervolt the GPU while overclocking the VRAM. Vega 20 finally gives Vega the bandwidth it has always needed. Granted, if the task at hand doesn't require lots of higher-level memory accesses, it won't see much benefit from the extra stacks.

Please elaborate: why is Vega bandwidth-starved? Are you talking about specific workloads? It has 50% more memory bandwidth than the GTX 1080, and the same bandwidth as the GTX 1080 Ti.

#52

medi01

efikkan said:
It has 50% more memory bandwidth than...

Some other chip with a different architecture.
That's hardly a reasonable argument.

What matters, however, is that if Vega were bandwidth-starved, we would see it in actual tests, which show it isn't.

"At 1600MHz core the gains from a 16% Memory OC are roughly 4%. Memory starved gains look different."
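One way to read a figure like "16% more memory bandwidth gives roughly 4% more performance" is a simplified Amdahl-style model, an assumption rather than anything measured in the thread: only a fraction f of the runtime scales with bandwidth, so a bandwidth factor s gives a relative runtime of (1 - f) + f / s. Solving for f from the quoted numbers:

```python
def memory_bound_fraction(bandwidth_gain: float, perf_gain: float) -> float:
    """Estimate the share of runtime that scales with memory bandwidth.

    Simplified Amdahl-style model (an assumption): relative runtime after
    the change is (1 - f) + f / s, where s = 1 + bandwidth_gain.
    """
    s = 1.0 + bandwidth_gain
    new_runtime = 1.0 / (1.0 + perf_gain)  # a 4% speedup means runtime / 1.04
    # Rearranging (1 - f) + f / s = new_runtime gives f:
    return (1.0 - new_runtime) / (1.0 - 1.0 / s)


f = memory_bound_fraction(bandwidth_gain=0.16, perf_gain=0.04)
print(f"memory-bound share of runtime: {f:.0%}")
```

Under this model a fully bandwidth-bound workload (f = 1) would have gained the full 16%; a share of about 28% suggests, within these assumptions, that the test was mostly limited by something other than memory at that core clock.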

#53

Slizzo

Octopuss said:
What exactly does "starved for bandwidth" mean? Isn't HBM's bandwidth many times higher than GDDR5's?

Not really.

VEGA 64 has a memory bandwidth of 484GB/s
GTX 1080 Ti has a memory bandwidth of 484GB/s.

VEGA 64 is on a 2048-bit bus, with an effective memory speed of 1890 MHz.
1080 Ti is on a 352-bit bus, with an effective memory speed of 11,008 MHz.
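The numbers in this post, and the earlier "50% more than GTX 1080" comparison, follow from bus width times effective data rate. A quick check, assuming the GTX 1080's reference 10,000 MT/s GDDR5X on a 256-bit bus (a figure not given in the thread). Note that it is a 352-bit bus that yields 484 GB/s on the GTX 1080 Ti at 11,008 MT/s; 384 bits at that rate would give about 528 GB/s.

```python
def peak_bandwidth_gbs(bus_width_bits: int, effective_rate_mtps: float) -> float:
    """Peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    return (bus_width_bits / 8) * effective_rate_mtps * 1e6 / 1e9


vega64 = peak_bandwidth_gbs(2048, 1890)     # HBM2, 945 MHz double data rate
gtx1080ti = peak_bandwidth_gbs(352, 11008)  # GDDR5X
gtx1080 = peak_bandwidth_gbs(256, 10000)    # GDDR5X, assumed 10,000 MT/s

print(f"Vega 64:     {vega64:.0f} GB/s")
print(f"GTX 1080 Ti: {gtx1080ti:.0f} GB/s")
print(f"GTX 1080:    {gtx1080:.0f} GB/s")
print(f"Vega 64 / GTX 1080: {vega64 / gtx1080:.2f}x")
print(f"384-bit at 11,008 MT/s would be: {peak_bandwidth_gbs(384, 11008):.0f} GB/s")
```

The ratio of about 1.51 is where the "50% more than GTX 1080" figure comes from; peak numbers like these say nothing about how well each architecture uses them, which is the point the thread goes on to argue.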

#54

efikkan

medi01 said:
Some other chip with a different architecture.
That's hardly a reasonable argument.

My argument is that when the competition manages the same workload with significantly less of these specific hardware resources, then the problem is in managing the resources, not in the amount of resources. This is a solid argument.

#55

jabbadap

T4C Fantasy said:
You mentioned the R9 290X, though, as in the next one (R9 290X). Tahiti is the best (desktop) FP64 chip they ever made... happy? xD

Well, yeah, kind of (the workstation FirePro W9100 is a desktop graphics card too). Let's agree it's AMD's best consumer GPU for double-precision compute tasks, which, outside of HPC, is a fairly moot capability to have. But all this chitchat raises a question: if AMD releases a consumer product based on Vega 20, what are the odds that it keeps its full FP64 compute capability intact, rather than being crippled like the R9 290X was?
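The FP64 point comes down to the rate at which each chip runs double-precision math relative to FP32. A small sketch using commonly quoted reference specs (assumed here, not taken from the thread): peak FP32 = shaders × 2 FLOPs (one fused multiply-add) × clock, and FP64 is that times the chip's FP64 ratio, 1/4 on the HD 7970 (Tahiti), 1/8 on the consumer R9 290X and 1/2 on the FirePro W9100 (both Hawaii).

```python
def fp64_tflops(shaders: int, clock_mhz: float, fp64_ratio: float) -> float:
    """Peak FP64 TFLOPs: FP32 peak (2 FLOPs per shader per clock) times the FP64 rate."""
    fp32_tflops = shaders * 2 * clock_mhz * 1e6 / 1e12
    return fp32_tflops * fp64_ratio


# Assumed reference specs: (shaders, clock in MHz, FP64:FP32 ratio)
chips = {
    "HD 7970 (Tahiti)": (2048, 925, 1 / 4),
    "R9 290X (Hawaii, consumer)": (2816, 1000, 1 / 8),
    "FirePro W9100 (Hawaii)": (2816, 930, 1 / 2),
}
for name, (shaders, clock, ratio) in chips.items():
    print(f"{name}: {fp64_tflops(shaders, clock, ratio):.2f} TFLOPs FP64")
```

Despite Hawaii's larger shader array, the consumer 290X's 1/8 rate leaves it behind Tahiti in FP64, which is the kind of crippling the post asks about for a consumer Vega 20.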

#56

FordGT90Concept

"I go fast!1!11!1!"

efikkan said:
My argument is that when the competition manages the same workload with significantly less of these specific hardware resources, then the problem is in managing the resources, not in the amount of resources. This is a solid argument.

Vega10 (11.5+ TFLOPs) has more compute potential than 1080 Ti (10.6+ TFLOPs).
Vega10 has HBCC, which adds some latency that was intended to be compensated for by massive memory bandwidth, which Vega10 did not have. Vega20 finally delivers both.

#57

jabbadap

FordGT90Concept said:
Vega10 (11.5+ TFLOPs) has more compute potential than 1080 Ti (10.6+ TFLOPs).
Vega10 has HBCC, which adds some latency that was intended to be compensated for by massive memory bandwidth, which Vega10 did not have. Vega20 finally delivers both.

Um, the marketing differs a lot between the two. Vega 64 has up to 12.7 TFLOPs and the GTX 1080 Ti a very conservative 11.3 TFLOPs; in gaming loads, their FP32 TFLOPs are quite reversed. Vega does not hold its up-to clocks, while the GTX 1080 Ti runs higher than its marketed boost clock.
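Both sets of TFLOPs figures come from the same formula, shaders × 2 FLOPs per clock (one FMA) × clock speed, which is why the result depends so heavily on which clock you plug in. The shader counts and reference clocks below are commonly quoted specs; the "sustained" clocks are hypothetical values chosen only to illustrate how the ranking can flip under load.

```python
def fp32_tflops(shaders: int, clock_mhz: float) -> float:
    """Peak FP32 TFLOPs: each shader retires one FMA (2 FLOPs) per clock."""
    return shaders * 2 * clock_mhz * 1e6 / 1e12


VEGA64_SHADERS = 4096
GTX1080TI_SHADERS = 3584

print(f"Vega 64 @ 1546 MHz boost:     {fp32_tflops(VEGA64_SHADERS, 1546):.1f}")
print(f"GTX 1080 Ti @ 1480 MHz base:  {fp32_tflops(GTX1080TI_SHADERS, 1480):.1f}")
print(f"GTX 1080 Ti @ 1582 MHz boost: {fp32_tflops(GTX1080TI_SHADERS, 1582):.1f}")

# Hypothetical sustained clocks under a gaming load (illustrative only)
print(f"Vega 64 @ 1400 MHz:           {fp32_tflops(VEGA64_SHADERS, 1400):.1f}")
print(f"GTX 1080 Ti @ 1800 MHz:       {fp32_tflops(GTX1080TI_SHADERS, 1800):.1f}")
```

The reference clocks reproduce the 12.7, 10.6 and 11.3 TFLOPs figures; with the hypothetical sustained clocks, the 1080 Ti (about 12.9) ends up ahead of Vega 64 (about 11.5).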

#58

efikkan

FordGT90Concept said:
Vega10 (11.5+ TFLOPs) has more compute potential than 1080 Ti (10.6+ TFLOPs).

Vega10 has HBCC, which adds some latency that was intended to be compensated for by massive memory bandwidth, which Vega10 did not have. Vega20 finally delivers both.

All the bandwidth in the world will not hide latency, nor is latency the true problem for GCN. The problem is simply resource management. Nor is the problem new with Vega; it also existed in Polaris and Fiji.

If you have a GPU like GP104 with a 256-bit GDDR controller, it's really four separate 64-bit controllers, supplying a total of 320 GB/s of theoretical bandwidth, but only when the load is spread evenly across them. Each controller can only be used by one cluster at a time, meaning that if four clusters are scheduled to read/write from the same memory controller, they have to wait their turn, leaving you with an effective 80 GB/s instead of 320 GB/s.

The reason Nvidia scales better is that they manage their resources much better, while AMD takes a much simpler, more brute-force approach, throwing far more resources at the problem and ending up with much larger and more power-hungry designs. Simply adding more resources (memory bandwidth, higher clocks, more cores, etc.) is not necessarily going to solve any problems if they are going to manage the extra resources just as poorly. More bandwidth is not going to solve any bottlenecks caused by data dependencies. If AMD threw more memory bandwidth into their design, they might risk decreasing the energy efficiency.
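The memory-controller example in this post can be modeled very simply, following the post's own description of four independent 64-bit controllers at 80 GB/s each (320 GB/s ÷ 4). This is a deliberately crude sketch, not how GP104 actually schedules memory traffic: the channels run in parallel, so a batch of traffic finishes when the busiest channel finishes.

```python
from typing import Sequence


def effective_bandwidth_gbs(gb_per_channel: Sequence[float], channel_bw_gbs: float) -> float:
    """Effective bandwidth when each channel can only serve its own traffic.

    The batch completes when the most heavily loaded channel completes
    (busiest / channel_bw seconds); effective bandwidth = total data / that time.
    """
    total = sum(gb_per_channel)
    if total == 0:
        return 0.0
    busiest = max(gb_per_channel)
    return total * channel_bw_gbs / busiest  # == total / (busiest / channel_bw)


CHANNEL_BW = 320 / 4  # four 64-bit controllers, 80 GB/s each (per the post)

print(f"{effective_bandwidth_gbs([1, 1, 1, 1], CHANNEL_BW):.0f} GB/s")  # evenly spread
print(f"{effective_bandwidth_gbs([4, 0, 0, 0], CHANNEL_BW):.0f} GB/s")  # all on one controller
print(f"{effective_bandwidth_gbs([2, 1, 1, 0], CHANNEL_BW):.0f} GB/s")  # partially skewed
```

The even case gives the full 320 GB/s, the worst case the post's 80 GB/s, and a partially skewed load lands in between (160 GB/s here).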

#59

Captain_Tom

FordGT90Concept said:
Vega10 (11.5+ TFLOPs) has more compute potential than 1080 Ti (10.6+ TFLOPs).
Vega10 has HBCC, which adds some latency that was intended to be compensated for by massive memory bandwidth, which Vega10 did not have. Vega20 finally delivers both.

This is exactly correct. AMD was hoping to have 1000MHz or higher clocks on Vega 64, but unfortunately HBM2 was just not quite there yet.

But now it will be with 1250MHz and 1600MHz chips on the way, and Vega 20 will have double the bus too! This more than doubling of bandwidth would easily add 50%+ performance alone (And 7nm should bring at least 20% higher core clocks too).

P.S. To all those who say hilarious things like "INSERT_CARD doesn't need more bandwidth," my Vega 64 gets a flat 12% performance boost simply by only overclocking the HBM to 1125MHz (19% uplift in bandwidth).
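The uplift quoted here is easy to reproduce: on a fixed bus, bandwidth scales linearly with memory clock, so going from Vega 64's stock 945 MHz to 1125 MHz is about +19%. Dividing the performance gain by the bandwidth gain gives a crude sensitivity figure (1.0 would mean performance tracks bandwidth one-for-one). This ratio is an illustrative metric of our own, not one used in the thread, and it can also be applied to the 16%/4% data point quoted earlier.

```python
def percent_uplift(old: float, new: float) -> float:
    """Percentage increase from old to new."""
    return (new / old - 1.0) * 100.0


def bandwidth_sensitivity(perf_gain_pct: float, bandwidth_gain_pct: float) -> float:
    """Performance gain per unit of bandwidth gain (1.0 = fully bandwidth-bound)."""
    return perf_gain_pct / bandwidth_gain_pct


# On a fixed 2048-bit bus, bandwidth is proportional to the HBM clock.
hbm_uplift = percent_uplift(945, 1125)
print(f"HBM 945 -> 1125 MHz: +{hbm_uplift:.1f}% bandwidth")
print(f"12% perf from that:  sensitivity {bandwidth_sensitivity(12, hbm_uplift):.2f}")
print(f"4% perf from +16%:   sensitivity {bandwidth_sensitivity(4, 16):.2f}")
```

The two reports land far apart (about 0.63 vs. 0.25), which suggests that how much Vega gains from extra bandwidth depends heavily on the workload and the core clock it runs at.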

#60

T4C Fantasy

CPU & GPU DB Maintainer

Captain_Tom said:
This is exactly correct. AMD was hoping to have 1000MHz or higher clocks on Vega 64, but unfortunately HBM2 was just not quite there yet.

But now it will be with 1250MHz and 1600MHz chips on the way, and Vega 20 will have double the bus too! This more than doubling of bandwidth would easily add 50%+ performance alone (And 7nm should bring at least 20% higher core clocks too).

P.S. To all those who say hilarious things like "INSERT_CARD doesn't need more bandwidth," my Vega 64 gets a flat 12% performance boost simply by only overclocking the HBM to 1125MHz (19% uplift in bandwidth).

www.techpowerup.com/forums/threads/post-your-final-fantasy-xv-benchmark-results.242200/
Do the benchmark, we need more Vegas on the leaderboard.

#61

nemesis.ie

Is there a demo with the B/M built in? No way I am buying the game. I have a pair of Vegas under water that I could submit some scores for if it's free.

#62

InVasMani

Captain_Tom said:
This is exactly correct. AMD was hoping to have 1000MHz or higher clocks on Vega 64, but unfortunately HBM2 was just not quite there yet.

But now it will be with 1250MHz and 1600MHz chips on the way, and Vega 20 will have double the bus too! This more than doubling of bandwidth would easily add 50%+ performance alone (And 7nm should bring at least 20% higher core clocks too).

P.S. To all those who say hilarious things like "INSERT_CARD doesn't need more bandwidth," my Vega 64 gets a flat 12% performance boost simply by only overclocking the HBM to 1125MHz (19% uplift in bandwidth).

As for doubling the bandwidth, I'm pretty sure that only helps if the game is bandwidth-limited in the first place. That's actually a big part of the problem with Vega, though. Vega 20's effective memory clock will be a little higher, and that will help a lot: about 20%-37.5% more bandwidth than Fury X had, which Vega could only match with an HBM overclock. I think the latency reduction will help a lot and might improve HBCC a bit as well. The other big thing Vega 20 could stand to improve on is more ROPs, which would help its AA performance a lot.

#63

FordGT90Concept

"I go fast!1!11!1!"

jabbadap said:
Um, the marketing differs a lot between the two. Vega 64 has up to 12.7 TFLOPs and the GTX 1080 Ti a very conservative 11.3 TFLOPs; in gaming loads, their FP32 TFLOPs are quite reversed. Vega does not hold its up-to clocks, while the GTX 1080 Ti runs higher than its marketed boost clock.

I looked at the core clock, not boost clocks. If you run compute loads on a card, it's likely to thermal/power throttle, so the boost clock is unreliable at best.

efikkan said:
If AMD threw more memory bandwidth into their design, they might risk decreasing the energy efficiency.

That's kind of the idea. Vega is waiting for data as much as it is executing. Getting the data to the shaders faster translates to improved efficiency but also more power consumption due to fewer idle resources.

#64

Captain_Tom

InVasMani said:
The other big thing Vega 20 could stand to improve on is more ROPs, which would help its AA performance a lot.

The lack of additional ROPs since the 290X is baffling for everyone lol. That's one thing I find perplexing about all of the Vega 20 rumors I have seen:

-faster HBM
-double the bandwidth
-7nm node
-and NO increase in ROP, TMU, or SP count?!

I mean, Vega isn't so bandwidth-starved that increasing the bandwidth by a factor of 2.5 is the ONLY thing worth doing! GloFo's 7nm node allows the relative die size of chips to be nearly cut in half! Why not add at least 20% more Compute Units and 50% more ROPs?!

If you are going to increase the bandwidth by 150%, at least make it an RX Vega 80!

#65

T4C Fantasy

CPU & GPU DB Maintainer

Captain_Tom said:
The lack of additional ROPs since the 290X is baffling for everyone lol. That's one thing I find perplexing about all of the Vega 20 rumors I have seen:

-faster HBM
-double the bandwidth
-7nm node
-and NO increase in ROP, TMU, or SP count?!

I mean, Vega isn't so bandwidth-starved that increasing the bandwidth by a factor of 2.5 is the ONLY thing worth doing! GloFo's 7nm node allows the relative die size of chips to be nearly cut in half! Why not add at least 20% more Compute Units and 50% more ROPs?!

If you are going to increase the bandwidth by 150%, at least make it an RX Vega 80!

Believe it or not, Vega isn't ROP-starved at all; even W1zzard says this. If it were ROP-starved, performance wouldn't close the gap like it does at higher resolutions, so I believe the ROP argument is a myth.

#66

Captain_Tom

T4C Fantasy said:
Believe it or not, Vega isn't ROP-starved at all; even W1zzard says this. If it were ROP-starved, performance wouldn't close the gap like it does at higher resolutions, so I believe the ROP argument is a myth.

Then explain to me why the Vega 24 (in the Intel + Vega APU) performs as well as the 570/1060 with fewer CUs and much lower clocks. Note: it has the SAME number of ROPs as the 570!

There is absolutely no doubt that Vega 64/56 has fewer issues with its low ROP count than Polaris and Fiji had, but it clearly would at least aid performance if there were relatively more ROPs. All I suggested was adding 50% more ROPs if they added 20% more CUs (and keep in mind 7nm will allow 20% higher core clocks too).

#67

T4C Fantasy

CPU & GPU DB Maintainer

Captain_Tom said:
Then explain to me why the Vega 24 (in the Intel + Vega APU) performs as well as the 570/1060 with fewer CUs and much lower clocks. Note: it has the SAME number of ROPs as the 570!

There is absolutely no doubt that Vega 64/56 has fewer issues with its low ROP count than Polaris and Fiji had, but it clearly would at least aid performance if there were relatively more ROPs. All I suggested was adding 50% more ROPs if they added 20% more CUs (and keep in mind 7nm will allow 20% higher core clocks too).

Well, first of all, it's an Intel+Polaris APU xD, no Vega-specific features at all, just Polaris with HBM. And yeah, it could benefit, but it's just not starved, that's all.

AMD Radeon RX Vega M
Graphics/Compute: GFX8
Display Core Engine: 11.2
Unified Video Decoder: 6.3
Video Compression Engine: 3.4

REAL Vega
Graphics/Compute: GFX9
Display Core Engine: 12.0
Unified Video Decoder: 7.0
Video Compression Engine: 4.0

#68

medi01

So what we've got at the end of the day is an allegedly 7nm chip, running at who knows what clock speed, with more memory, that is slower (in the given benchmark) than the same manufacturer's older model.

Yay, excitement...

efikkan said:
My argument is that when the competition manages the same workload with significantly less of these specific hardware resources, then the problem is in managing the resources, not in the amount of resources. This is a solid argument.

You were responding to "is memory starved". "Competition isn't, so it isn't either" is by no means a solid argument.
Vega isn't mem starved because tests of Vega itself show so, not because of whatever some other companies out there somewhere do.

#69

nemesis.ie

Given that it's an ES (engineering sample) and the clocks etc. are not really known, no, at the end of the day we have not got a product that is slower than the older model.

We have a test run that may appear slower due to this ES being potentially clocked lower for testing, nothing more, nothing less. ;)

If the reported clock is to be believed, it should actually be faster than the current product at the same clock speed that product runs at, and it may clock even higher than that.

So yes, I think there is a reason for some excitement. :)

#70

InVasMani

T4C Fantasy said:
Believe it or not, Vega isn't ROP-starved at all; even W1zzard says this. If it were ROP-starved, performance wouldn't close the gap like it does at higher resolutions, so I believe the ROP argument is a myth.

A great ROPs test would be to benchmark Vega 56 with its HBM clocked up to 1 GHz against a 1080 Ti with its core downclocked to match Vega 56's pixel rate, since the 1080 Ti and Vega 56 have an identical number of TMUs and shaders. It's a very good apples-to-apples comparison, tested from 1080p up to 4K and including 99th percentiles as well. It would be revealing to see how much impact the ROP difference has, and at which point in the pipeline. I'd monitor VRAM usage, though, to ensure Vega 56 isn't starved in that area. The ROP deficit is larger between the Frontier Edition and the 1080 Ti: you're talking 24 more ROPs, versus 8 when comparing Vega 56 to a 1070/1080, and Vega FE gets killed badly; it's a total buzzkill of a card for gaming in comparison. Power consumption in the 1080 Ti vs. Vega 56 ROPs test would also be intriguing; it might give some insight into how that might decrease going forward as more ROPs are added to GPUs.
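The core clock needed for the proposed test follows from peak pixel fill rate = ROPs × core clock (one pixel per ROP per clock). A sketch assuming 64 ROPs on Vega 56 and 88 on the GTX 1080 Ti, and a hypothetical 1471 MHz Vega 56 clock (its reference boost; the clock it actually sustains in a real run will differ):

```python
def pixel_fill_rate_gpix(rops: int, core_clock_mhz: float) -> float:
    """Peak pixel fill rate in Gpixels/s, assuming one pixel per ROP per clock."""
    return rops * core_clock_mhz / 1000


def clock_to_match(rops: int, target_rops: int, target_clock_mhz: float) -> float:
    """Core clock (MHz) at which `rops` ROPs match the target card's fill rate."""
    return target_rops * target_clock_mhz / rops


VEGA56_ROPS, GTX1080TI_ROPS = 64, 88
vega56_clock = 1471  # assumed; substitute the clock Vega 56 sustains in the test

target = pixel_fill_rate_gpix(VEGA56_ROPS, vega56_clock)
matched = clock_to_match(GTX1080TI_ROPS, VEGA56_ROPS, vega56_clock)
print(f"Vega 56 fill rate: {target:.1f} Gpix/s")
print(f"GTX 1080 Ti core clock to match: {matched:.0f} MHz")
```

Because the shader and TMU counts match, downclocking the 1080 Ti this way also scales its shader and texture throughput by the same 64/88 factor (about 73%) relative to Vega 56.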

#71

jabbadap

MSAA uses ROPs too, so test that. If I remember correctly, Vega takes a bigger hit than Nvidia when using 2x, 4x, or 8x MSAA, though of course that depends on the game too.

#72

ValenOne

Slizzo said:
Not really.

VEGA 64 has a memory bandwidth of 484GB/s
GTX 1080 Ti has a memory bandwidth of 484GB/s.

VEGA 64 is on a 2048-bit bus, with an effective memory speed of 1890 MHz.
1080 Ti is on a 352-bit bus, with an effective memory speed of 11,008 MHz.

VEGA 64 has GP104 level classic GPU hardware with GP102 level compute hardware.

TFLOPS alone is useless without read/write units, which are the TMUs and ROPs.

#73

Vya Domus

rvalencia said:
VEGA 64 has GP104 level classic GPU hardware with GP102 level compute hardware.

TFLOPS alone is useless without read/write units, which are the TMUs and ROPs.

That's not how it works; ALUs are ALUs. Vega 10 is GP100-class.

#74

Captain_Tom

rvalencia said:
VEGA 64 has GP104 level classic GPU hardware with GP102 level compute hardware.

TFLOPS alone is useless without read/write units, which are the TMUs and ROPs.

That's actually a very good way to put it. Vega was built to compete with Volta at a lower price, and that's about it. It traded away assets gaming cards usually have in order to be as efficient/cheap as possible at compute.

Now, if you use async compute and FP16, and program specifically for Vega's rasterization engine, it can certainly compete with the Titan Xp very well, especially if you overclock the HBM. However, that's really a best-case scenario, and if you don't use those extra assets, it just treads water above the 1080.

#75

ValenOne

Vya Domus said:
That's not how it works; ALUs are ALUs. Vega 10 is GP100-class.

Wrong. The ALUs' main read/write operations go through the TMUs or ROPs.

Compute Shader Path uses TMUs read/write functions.
Pixel Shader Path uses ROPs read/write functions.

Vega 64 has GP104-class classic GPU hardware. GP102 has more rasterization and ROP hardware than Vega 64. Vega GPUs gained ROPs connected to the L2 cache, as in the Maxwell/Pascal designs.

Vega 64 has GP102 class FP32 compute hardware.

Both Vega 64 and GP104 have quad rasterization/quad GPC units and 64 ROPs.
