AMD "Vega 20" with 32 GB HBM2 3DMark 11 Score Surfaces

T4C Fantasy · May 2, 2018

jabbadap said:
Come on T4C, you should know better than that. AMD crippled FP64 performance on the R9 290X. Have a look at the workstation and server Hawaii cards:
https://www.amd.com/en-us/products/graphics/workstation/firepro-3d/9100
https://www.amd.com/en-us/products/graphics/server/s9100
https://www.amd.com/en-us/products/graphics/server/s9170

you mentioned the R9 290X though as in the next (R9 290X) Tahiti is the best (Desktop) FP64 chip they ever made... happy? xD

efikkan · May 2, 2018

FordGT90Concept said:
That's how starved Vega is for bandwidth. Remember, Vega 64 and Fury X have the same bandwidth. Vega 64's shaders can handle about 80% more throughput per shader than Fury X thanks to NCU and higher clockspeeds but it only gets a fraction of that because Vega 64's shaders are constantly waiting on the memory subsystem to feed it more data. Evidencing this is how miners underclock and undervolt the GPU while overclocking the VRAM. Vega 20 finally gives Vega the bandwidth it has always needed. Granted, if the task at hand doesn't require lots of higher level memory accesses, it won't see much benefit from the extra stacks.

Please elaborate, why is the Vega bandwidth starved? Are you talking about specific workloads? It has 50% more memory bandwidth than GTX 1080, and the same bandwidth as GTX 1080 Ti.

medi01 · May 2, 2018

efikkan said:
It has 50% more memory bandwidth than...

Some other chip with a different architecture.
That's hardly a reasonable argument.

What matters, however, is that if Vega were bandwidth starved, we would see it in actual tests, and they show it isn't.

"At 1600MHz core the gains from a 16% Memory OC are roughly 4%. Memory starved gains look different."

Slizzo · May 2, 2018

Octopuss said:
What exactly does "starved for bandwidth" mean? Isn't HBM's bandwidth numerous times higher than GDDR5?

Not really.

VEGA 64 has a memory bandwidth of 484GB/s
GTX 1080 Ti has a memory bandwidth of 484GB/s.

VEGA 64 is on a 2048bit bus, with a memory speed of 1890MHz effective.
1080Ti is on a 352bit bus, with an 11,008MHz effective memory speed.
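
The arithmetic behind those two figures can be sketched as below: peak bandwidth is bus width in bytes times the effective transfer rate. This is only an illustration; the function name is made up for the example, and the 352-bit bus width for the GTX 1080 Ti is an assumption taken from NVIDIA's public specification rather than from this thread.

```python
def bandwidth_gb_s(bus_width_bits: int, effective_mhz: float) -> float:
    """Peak theoretical memory bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_width_bits / 8           # bus width in bytes
    transfers_per_second = effective_mhz * 1e6        # effective rate in transfers/s
    return bytes_per_transfer * transfers_per_second / 1e9


# Vega 64: 2048-bit HBM2 at 1890 MHz effective (figures from the post above)
vega64 = bandwidth_gb_s(2048, 1890)

# GTX 1080 Ti: assumed 352-bit GDDR5X at 11,008 MHz effective
gtx1080ti = bandwidth_gb_s(352, 11008)

print(f"Vega 64:     {vega64:.1f} GB/s")
print(f"GTX 1080 Ti: {gtx1080ti:.1f} GB/s")
```

Both land at roughly 484 GB/s: a very wide, slow bus and a narrow, fast one end up in the same place.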

efikkan · May 2, 2018

medi01 said:
Some other chip with a different architecture.
That's hardly a reasonable argument.

My argument is that when the competition manages the same workload with significantly less of these specific hardware resources, then the problem is in managing the resources, not in the amount of resources. This is a solid argument.

jabbadap · May 2, 2018

T4C Fantasy said:
you mentioned the R9 290X though as in the next (R9 290X) Tahiti is the best (Desktop) FP64 chip they ever made... happy? xD

Well yeah, kind of (the workstation FirePro W9100 is a desktop graphics card too). Let's agree it's AMD's best consumer GPU for double-precision compute tasks, which outside of HPC is a fairly moot point. But all this chitchat raises a question: if AMD releases a consumer product based on Vega 20, what are the odds that it will keep its full FP64 compute capability intact, rather than being crippled like the R9 290X was?

FordGT90Concept · May 2, 2018

efikkan said:
My argument is that when the competition manages the same workload with significantly less of these specific hardware resources, then the problem is in managing the resources, not in the amount of resources. This is a solid argument.

Vega10 (11.5+ TFLOPs) has more compute potential than 1080 Ti (10.6+ TFLOPs).
Vega10 has HBCC which adds some latency that was intended to be compensated with massive memory bandwidth which Vega10 did not have. Vega20 finally delivers both.

jabbadap · May 2, 2018

FordGT90Concept said:
Vega10 (11.5+ TFLOPs) has more compute potential than 1080 Ti (10.6+ TFLOPs).
Vega10 has HBCC which adds some latency that was intended to be compensated with massive memory bandwidth which Vega10 did not have. Vega20 finally delivers both.

Uhm, marketing differs a lot between the two. Vega 64 has up to 12.7 TFLOPs and the GTX 1080 Ti has a very conservative 11.3 TFLOPs; under gaming load the FP32 TFLOPs are pretty much reversed. Vega does not hold its "up to" clocks, while the GTX 1080 Ti runs higher than its marketed boost clocks.
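
For reference, those marketing numbers fall out of a simple formula: shader count times two operations per clock (one fused multiply-add) times clock speed. The sketch below is only illustrative; the shader counts and boost clocks are assumptions taken from the vendors' public spec sheets, not from this thread, and real sustained clocks differ as described above.

```python
def peak_fp32_tflops(shaders: int, clock_mhz: float) -> float:
    """Peak FP32 throughput in TFLOPs, counting one FMA as two operations."""
    ops_per_clock = shaders * 2
    return ops_per_clock * clock_mhz * 1e6 / 1e12


# Assumed spec-sheet values (not stated in the thread):
# Vega 64: 4096 stream processors, 1546 MHz boost
# GTX 1080 Ti: 3584 CUDA cores, 1582 MHz boost
vega64_tflops = peak_fp32_tflops(4096, 1546)
gtx1080ti_tflops = peak_fp32_tflops(3584, 1582)

print(f"Vega 64:     {vega64_tflops:.2f} TFLOPs")
print(f"GTX 1080 Ti: {gtx1080ti_tflops:.2f} TFLOPs")
```

Plugging in whatever clock a card actually sustains under load, instead of the advertised boost, gives the "reversed" picture the post describes.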

efikkan · May 2, 2018

FordGT90Concept said:
Vega10 (11.5+ TFLOPs) has more compute potential than 1080 Ti (10.6+ TFLOPs).

Vega10 has HBCC which adds some latency that was intended to be compensated with massive memory bandwidth which Vega10 did not have. Vega20 finally delivers both.

All the bandwidth in the world will not hide latency, nor is latency the true problem for GCN. The problem is simply resource management. The problem is not new with Vega either, it also existed in Polaris and Fiji.

If you have a GPU like GP104 with a 256-bit GDDR controller, it's really four separate 64-bit controllers, supplying a total of 320 GB/s of theoretical bandwidth, but only when load is spread evenly across them. Each controller can only be used by one cluster at a time, meaning if four clusters are scheduled to read/write from the same memory controller, they have to wait in turn, leaving you with an effective 80 GB/s instead of 320 GB/s.
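
A toy model of that contention scenario, assuming (as the post does) four equal controllers that each serve one cluster at a time. The function name and the list-of-targets representation are hypothetical, and real GPU memory scheduling is far more involved than this.

```python
def effective_bandwidth(targets, num_controllers=4, total_gb_s=320.0):
    """Bandwidth actually in use when each cluster hits one controller.

    targets: one controller index per cluster issuing a request.
    Only distinct controllers contribute; clusters sharing a controller
    have to wait their turn and add no extra throughput.
    """
    for t in targets:
        if not 0 <= t < num_controllers:
            raise ValueError(f"controller index {t} out of range")
    per_controller = total_gb_s / num_controllers
    busy_controllers = len(set(targets))
    return per_controller * busy_controllers


spread = effective_bandwidth([0, 1, 2, 3])   # load spread evenly
piled = effective_bandwidth([0, 0, 0, 0])    # all four clusters on one controller
mixed = effective_bandwidth([0, 0, 1, 1])    # two controllers in use

print(spread, piled, mixed)
```

In this simplified picture the evenly spread case reaches the full 320 GB/s, while four clusters queued on one controller see only 80 GB/s.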

The reason why Nvidia scales better is they manage their resources much better, while AMD have a much simpler and more brute-force approach throwing much more resources at the problem, resulting in much larger and more power-hungry designs. Simply adding more resources (memory bandwidth, higher clocks, more cores, etc.) is not necessarily going to solve any problems if they are going to manage the extra resources just as poorly. More bandwidth is not going to solve any bottlenecks caused by data dependencies. If AMD threw more memory bandwidth into their design, they might risk decreasing the energy efficiency.

Captain_Tom · May 2, 2018

FordGT90Concept said:
Vega10 (11.5+ TFLOPs) has more compute potential than 1080 Ti (10.6+ TFLOPs).
Vega10 has HBCC which adds some latency that was intended to be compensated with massive memory bandwidth which Vega10 did not have. Vega20 finally delivers both.

This is exactly correct. AMD was hoping to have 1000MHz or higher clocks on Vega 64, but unfortunately HBM2 was just not quite there yet.

But now it will be with 1250MHz and 1600MHz chips on the way, and Vega 20 will have double the bus too! This more than doubling of bandwidth would easily add 50%+ performance alone (And 7nm should bring at least 20% higher core clocks too).

P.S. To all those who say hilarious things like "INSERT_CARD doesn't need more bandwidth," my Vega 64 gets a flat 12% performance boost simply by only overclocking the HBM to 1125MHz (19% uplift in bandwidth).
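
One way to compare the overclocking data points quoted in this thread is to divide the performance gain by the bandwidth gain: a ratio near 1 would mean performance tracks bandwidth almost linearly, a ratio near 0 means bandwidth barely matters. This is a rough sketch only; the helper name is made up, and the 945 MHz stock HBM2 clock is an assumption (half of the 1890 MHz effective figure given earlier).

```python
def scaling_ratio(perf_gain_pct: float, bandwidth_gain_pct: float) -> float:
    """Fraction of a bandwidth increase that shows up as performance."""
    return perf_gain_pct / bandwidth_gain_pct


# Bandwidth uplift from an HBM overclock to 1125 MHz, assuming 945 MHz stock
hbm_uplift_pct = (1125 / 945 - 1) * 100

# "16% memory OC gives roughly 4%" (quoted earlier in the thread)
ratio_quoted_test = scaling_ratio(4, 16)

# "19% bandwidth uplift gives a flat 12%" (the post above)
ratio_this_post = scaling_ratio(12, 19)

print(f"HBM uplift: {hbm_uplift_pct:.1f}%")
print(f"Scaling ratios: {ratio_quoted_test:.2f} vs {ratio_this_post:.2f}")
```

The two reports disagree by a wide margin, which fits the point made elsewhere in the thread that the benefit depends on the workload and the core clock.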

T4C Fantasy · May 2, 2018

Captain_Tom said:
This is exactly correct. AMD was hoping to have 1000MHz or higher clocks on Vega 64, but unfortunately HBM2 was just not quite there yet.

But now it will be with 1250MHz and 1600MHz chips on the way, and Vega 20 will have double the bus too! This more than doubling of bandwidth would easily add 50%+ performance alone (And 7nm should bring at least 20% higher core clocks too).

P.S. To all those who say hilarious things like "INSERT_CARD doesn't need more bandwidth," my Vega 64 gets a flat 12% performance boost simply by only overclocking the HBM to 1125MHz (19% uplift in bandwidth).

https://www.techpowerup.com/forums/threads/post-your-final-fantasy-xv-benchmark-results.242200/
Do the benchmark, we need more Vegas on the leaderboard.

nemesis.ie · May 2, 2018

Is there a demo with the B/M built in? No way I am buying the game. I have a pair of Vegas under water that I could submit some scores for if it's free.

InVasMani · May 2, 2018

Captain_Tom said:
This is exactly correct. AMD was hoping to have 1000MHz or higher clocks on Vega 64, but unfortunately HBM2 was just not quite there yet.

But now it will be with 1250MHz and 1600MHz chips on the way, and Vega 20 will have double the bus too! This more than doubling of bandwidth would easily add 50%+ performance alone (And 7nm should bring at least 20% higher core clocks too).

P.S. To all those who say hilarious things like "INSERT_CARD doesn't need more bandwidth," my Vega 64 gets a flat 12% performance boost simply by only overclocking the HBM to 1125MHz (19% uplift in bandwidth).

On the doubling of the bandwidth: I'm pretty sure that only helps if the game is bandwidth limited in the first place. That's actually a big part of the problem with Vega, though. Its effective memory clock speed will be a little higher, and that will help a lot: about 20%-37.5% more than Fury X had, which Vega could only match with an HBM overclock. I think the latency reduction will help a lot and might improve HBCC a bit as well. The other big thing Vega 20 could stand to improve is the ROP count; more ROPs would help its AA performance a lot.

FordGT90Concept · May 2, 2018

jabbadap said:
Uhm, marketing differs a lot between the two. Vega 64 has up to 12.7 TFLOPs and the GTX 1080 Ti has a very conservative 11.3 TFLOPs; under gaming load the FP32 TFLOPs are pretty much reversed. Vega does not hold its "up to" clocks, while the GTX 1080 Ti runs higher than its marketed boost clocks.

I looked at core clock, not boost clocks. You run compute loads on a card, it's likely to thermal/power throttle so boost clock is unreliable at best.

efikkan said:
If AMD threw more memory bandwidth into their design, they might risk decreasing the energy efficiency.

That's kind of the idea. Vega is waiting for data as much as it is executing. Getting the data to the shaders faster translates to improved efficiency but also more power consumption due to fewer idle resources.

Captain_Tom · May 3, 2018

InVasMani said:
The other big thing Vega 20 could stand to improve is the ROP count; more ROPs would help its AA performance a lot.

The lack of additional ROP's since the 290X is baffling for everyone lol. That's one thing I find perplexing about all of the Vega 20 Rumors I have seen:

-faster HBM
-double the bandwidth
-7nm node
-and NO increase in ROP, TMU, or SP count?!

I mean Vega isn't that bandwidth starved to the point that it's ONLY worth increasing the bandwidth by a factor of 2.5! Glofo's 7nm node allows the relative die size of chips to be nearly cut in half! Why not add at least 20% more Compute Units and 50% more ROP's?!

If you are going to increase the bandwidth by 150%, at least make it an RX Vega 80!

T4C Fantasy · May 3, 2018

Captain_Tom said:
The lack of additional ROP's since the 290X is baffling for everyone lol. That's one thing I find perplexing about all of the Vega 20 Rumors I have seen:

-faster HBM
-double the bandwidth
-7nm node
-and NO increase in ROP, TMU, or SP count?!

I mean Vega isn't that bandwidth starved to the point that it's ONLY worth increasing the bandwidth by a factor of 2.5! Glofo's 7nm node allows the relative die size of chips to be nearly cut in half! Why not add at least 20% more Compute Units and 50% more ROP's?!

If you are going to increase the bandwidth by 150%, at least make it an RX Vega 80!

Believe it or not, Vega isn't ROP starved at all; even W1zzard says this. If it were ROP starved, performance wouldn't close the gap like it does at higher resolutions, so I believe the ROP argument is a myth.

Captain_Tom · May 3, 2018

T4C Fantasy said:
Believe it or not, Vega isn't ROP starved at all; even W1zzard says this. If it were ROP starved, performance wouldn't close the gap like it does at higher resolutions, so I believe the ROP argument is a myth.

Then explain to me why the Vega 24 (in the Intel + Vega APU) performs as well as the 570/1060 with less CU's and much lower clocks. Note: It has the SAME number of ROP's the 570 has!

There is absolutely no doubt that Vega 64/56 has less issues with its low ROP count than Polaris and Fiji had, but it clearly would at least aid in performance if there were relatively more ROP's. All I suggested was adding 50% more ROP's if they added 20% more CU's (and keep in mind 7nm will allow 20% higher core clocks too).

T4C Fantasy · May 3, 2018

Captain_Tom said:
Then explain to me why the Vega 24 (in the Intel + Vega APU) performs as well as the 570/1060 with less CU's and much lower clocks. Note: It has the SAME number of ROP's the 570 has!

There is absolutely no doubt that Vega 64/56 has less issues with its low ROP count than Polaris and Fiji had, but it clearly would at least aid in performance if there were relatively more ROP's. All I suggested was adding 50% more ROP's if they added 20% more CU's (and keep in mind 7nm will allow 20% higher core clocks too).

Well, first of all it's an Intel + Polaris APU xD, with no Vega-specific features at all, just Polaris with HBM. And yeah, it could benefit, but it's just not starved, that's all.

AMD Radeon RX Vega M
Graphics/Compute: GFX8
Display Core Engine: 11.2
Unified Video Decoder: 6.3
Video Compression Engine: 3.4

REAL Vega
Graphics/Compute: GFX9
Display Core Engine: 12.0
Unified Video Decoder: 7.0
Video Compression Engine: 4.0

medi01 · May 3, 2018

So, what we got at the end of the day is an allegedly 7nm chip, running at who knows what clock speed, with more memory, that is slower (in the given benchmark) than the older model from the same manufacturer.

Yey, excitement...

efikkan said:
My argument is that when the competition manages the same workload with significantly less of these specific hardware resources, then the problem is in managing the resources, not in the amount of resources. This is a solid argument.

You were responding to "is memory starved". "Competition isn't, so it isn't either" is by no means a solid argument.
Vega isn't mem starved because tests of Vega itself show so, not because of whatever some other companies out there somewhere do.

nemesis.ie · May 3, 2018

Given it's an ES and clocks etc. are not really known, no, we have not ended up with a product that is slower than the older model.

We have a test run that may appear slower due to this ES being potentially clocked lower for testing, nothing more, nothing less.

If the clock reported is to be believed, it actually should be faster than the current product at the same clock speed said current product runs at - and it may clock even faster than that.

So yes, I think there is a reason for some excitement.

InVasMani · May 3, 2018

T4C Fantasy said:
Believe it or not, Vega isn't ROP starved at all; even W1zzard says this. If it were ROP starved, performance wouldn't close the gap like it does at higher resolutions, so I believe the ROP argument is a myth.

A great ROPs test would be benchmarking Vega 56 with the HBM clocked up to 1GHz against a 1080 Ti with its core downclocked to match Vega 56's pixel rate, since the 1080 Ti and Vega 56 share an identical number of TMUs and shaders. It's a very good apples-to-apples comparison to run from 1080p up to 4K, including 99th percentiles as well. It would be revealing to see how much impact the ROPs difference has and at which point in the pipeline. I'd monitor the VRAM usage though, to ensure the Vega 56 isn't starved in that area. The ROPs deficiency is larger between the Frontier Edition and the 1080 Ti: you are talking 24 more ROPs, versus 8 if comparing Vega 56 to a 1070/1080. Vega FE gets killed badly; it's a total buzzkill card for gaming purposes in comparison. The power consumption would also be intriguing in the 1080 Ti vs Vega 56 ROPs test; it might give some insight into how that might decrease moving forward as more ROPs are added to GPUs.

jabbadap · May 3, 2018

MSAA uses ROPs too, so test that. If I remember correctly, Vega takes a bigger hit than NVIDIA when using 2x, 4x, or 8x MSAA; of course that depends on the game too.

ValenOne · Aug 23, 2018

Slizzo said:
Not really.

VEGA 64 has a memory bandwidth of 484GB/s
GTX 1080 Ti has a memory bandwidth of 484GB/s.

VEGA 64 is on a 2048bit bus, with a memory speed of 1890MHz effective.
1080Ti is on a 352bit bus, with an 11,008MHz effective memory speed.

VEGA 64 has GP104 level classic GPU hardware with GP102 level compute hardware.

TFLOPS alone are useless without read/write units, which are the TMUs and ROPs.

Vya Domus · Aug 23, 2018

rvalencia said:
VEGA 64 has GP104 level classic GPU hardware with GP102 level compute hardware.

TFLOPS alone are useless without read/write units, which are the TMUs and ROPs.

That's not how it works, ALUs are ALUs. Vega 10 is GP100 class.

Captain_Tom · Aug 25, 2018

rvalencia said:
VEGA 64 has GP104 level classic GPU hardware with GP102 level compute hardware.

TFLOPS alone are useless without read/write units, which are the TMUs and ROPs.

That's actually a very good way to put it. Vega was built to compete with Volta at a lower price, and that's about it. It traded assets gaming cards usually have to be the most efficient/cheap at compute.

Now if you utilize Async compute, FP16, and specifically program for Vega's rasterization engine - It can certainly compete with the Titan Xp very well, especially if you overclock the HBM. However that's really a best case scenario, and if you don't use those extra assets it just treads water above the 1080.
