
AMD's Next-Generation Radeon Instinct "Arcturus" Test Board Features 120 CUs

AleksandarK

News Editor
Staff member
Joined
Aug 19, 2017
Messages
2,576 (0.97/day)
AMD is preparing to launch its next generation of Radeon Instinct GPUs based on the new CDNA architecture designed for enterprise deployments. Thanks to the popular hardware leaker _rogame (@_rogame), we have some information about the configuration of the upcoming Radeon Instinct MI100 "Arcturus" server GPU. Previously, we obtained a BIOS of the Arcturus GPU that showed a configuration of 128 Compute Units (CUs), which works out to 8,192 CDNA cores. That configuration listed a 1334 MHz GPU clock, a 1091 MHz SoC frequency, and a 1000 MHz memory speed. However, another GPU test board has now been spotted with a slightly different specification.

The reported configuration is an Arcturus GPU with 120 CUs, resulting in a CDNA core count of 7,680. These cores run at 878 MHz for the core clock and 750 MHz for the SoC clock, with a surprising 1200 MHz memory clock. While the core clock, SoC clock, and CU count are all lower than in the previous report, the memory clock is up by 200 MHz. It is important to note that this is just a test board/variation of the MI100, and final frequencies may differ.
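For a rough sense of scale, here is a back-of-the-envelope comparison of the two leaked configurations (a minimal sketch, assuming the usual 64 stream processors per CU and 2 FLOPs per clock via FMA; these are engineering-sample clocks, so retail numbers will almost certainly differ):

```python
# Rough FP32 throughput estimate for the two leaked Arcturus configurations.
# Assumes 64 stream processors per CU and 2 FLOPs per core per clock (FMA);
# the clocks come from leaked test boards and may not reflect the final card.

def fp32_tflops(cus, core_clock_mhz, sp_per_cu=64, flops_per_clock=2):
    cores = cus * sp_per_cu
    return cores * core_clock_mhz * 1e6 * flops_per_clock / 1e12

configs = {
    "BIOS leak (128 CU @ 1334 MHz)": (128, 1334),
    "Test board (120 CU @ 878 MHz)": (120, 878),
}

for name, (cus, clock) in configs.items():
    print(f"{name}: {cus * 64} cores, ~{fp32_tflops(cus, clock):.1f} TFLOPS FP32")
```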


 
Joined
Feb 19, 2009
Messages
1,162 (0.20/day)
Location
I live in Norway
Processor R9 5800x3d | R7 3900X | 4800H | 2x Xeon gold 6142
Motherboard Asrock X570M | AB350M Pro 4 | Asus Tuf A15
Cooling Air | Air | duh laptop
Memory 64gb G.skill SniperX @3600 CL16 | 128gb | 32GB | 192gb
Video Card(s) RTX 4080 |Quadro P5000 | RTX2060M
Storage Many drives
Display(s) AW3423dwf.
Case Jonsbo D41
Power Supply Corsair RM850x
Mouse g502 Lightspeed
Keyboard G913 tkl
Software win11, proxmox
I just hope the drivers get more polished... I'm still leaning toward Nvidia for the GPU until YouTubers like GamersNexus say otherwise and confirm that the drivers have improved tenfold and are fully stable / equivalent to Nvidia's.

This won't be in your hands, no problems :p
This is a datacenter-only card.
 

ARF

Joined
Jan 28, 2020
Messages
4,670 (2.66/day)
Location
Ex-usa | slava the trolls
This won't be in your hands, no problems :p
This is a datacenter-only card.

Yes, Navi 2X will be the gaming-centric lineup, while Arcturus is for High Performance Computing (HPC) only.

I just hope the drivers get more polished... I'm still leaning toward Nvidia for the GPU until YouTubers like GamersNexus say otherwise and confirm that the drivers have improved tenfold and are fully stable / equivalent to Nvidia's.

The drivers are fine.
 

Cheeseball

Not a Potato
Supporter
Joined
Jan 2, 2009
Messages
1,995 (0.34/day)
Location
Pittsburgh, PA
System Name Titan
Processor AMD Ryzen™ 7 7950X3D
Motherboard ASRock X870 Taichi Lite
Cooling Thermalright Phantom Spirit 120 EVO CPU
Memory TEAMGROUP T-Force Delta RGB 2x16GB DDR5-6000 CL30
Video Card(s) ASRock Radeon RX 7900 XTX 24 GB GDDR6 (MBA) / NVIDIA RTX 4090 Founder's Edition
Storage Crucial T500 2TB x 3
Display(s) LG 32GS95UE-B, ASUS ROG Swift OLED (PG27AQDP), LG C4 42" (OLED42C4PUA)
Case HYTE Hakos Baelz Y60
Audio Device(s) Kanto Audio YU2 and SUB8 Desktop Speakers and Subwoofer, Cloud Alpha Wireless
Power Supply Corsair SF1000L
Mouse Logitech Pro Superlight 2 (White), G303 Shroud Edition
Keyboard Wooting 60HE+ / 8BitDo Retro Mechanical Keyboard (N Edition) / NuPhy Air75 v2
VR HMD Occulus Quest 2 128GB
Software Windows 11 Pro 64-bit 23H2 Build 22631.4317
Joined
Nov 6, 2016
Messages
1,750 (0.60/day)
Location
NH, USA
System Name Lightbringer
Processor Ryzen 7 2700X
Motherboard Asus ROG Strix X470-F Gaming
Cooling Enermax Liqmax Iii 360mm AIO
Memory G.Skill Trident Z RGB 32GB (8GBx4) 3200Mhz CL 14
Video Card(s) Sapphire RX 5700XT Nitro+
Storage Hp EX950 2TB NVMe M.2, HP EX950 1TB NVMe M.2, Samsung 860 EVO 2TB
Display(s) LG 34BK95U-W 34" 5120 x 2160
Case Lian Li PC-O11 Dynamic (White)
Power Supply BeQuiet Straight Power 11 850w Gold Rated PSU
Mouse Glorious Model O (Matte White)
Keyboard Royal Kludge RK71
Software Windows 10
This possibly means that RDNA2 will be able to have more than 64/60 CUs, right? I just hope AMD releases some GPUs with CU counts of 80, 96, or even higher... I want them to release the beast
 

ARF

Joined
Jan 28, 2020
Messages
4,670 (2.66/day)
Location
Ex-usa | slava the trolls
Adrenalin 2020 is not fine yet. RTG still has a lot to improve on their end.

What do you mean? What problems do you have and have you reported them via the support centre?
 
Joined
Dec 16, 2017
Messages
2,910 (1.15/day)
System Name System V
Processor AMD Ryzen 5 3600
Motherboard Asus Prime X570-P
Cooling Cooler Master Hyper 212 // a bunch of 120 mm Xigmatek 1500 RPM fans (2 ins, 3 outs)
Memory 2x8GB Ballistix Sport LT 3200 MHz (BLS8G4D32AESCK.M8FE) (CL16-18-18-36)
Video Card(s) Gigabyte AORUS Radeon RX 580 8 GB
Storage SHFS37A240G / DT01ACA200 / ST10000VN0008 / ST8000VN004 / SA400S37960G / SNV21000G / NM620 2TB
Display(s) LG 22MP55 IPS Display
Case NZXT Source 210
Audio Device(s) Logitech G430 Headset
Power Supply Corsair CX650M
Software Whatever build of Windows 11 is being served in Canary channel at the time.
Benchmark Scores Corona 1.3: 3120620 r/s Cinebench R20: 3355 FireStrike: 12490 TimeSpy: 4624
What do you mean? What problems do you have and have you reported them via the support centre?

Probably the somewhat large list of known issues:
[screenshot: Adrenalin 2020 known-issues list]
 
Joined
Jul 19, 2016
Messages
482 (0.16/day)
This possibly means that RDNA2 will be able to have more than 64/60 CUs, right? I just hope AMD releases some GPUs with CU counts of 80, 96, or even higher... I want them to release the beast

This has so many CUs because they are more important for this type of card, and that is where CDNA will differ from RDNA, I suspect.

Higher clock speeds with fewer CUs (up to 64) is the route they will take, as they will also add accelerator engines inside RDNA2 GPUs at the expense of CUs, like Nvidia did with Tensor Cores. Fixed-function accelerators matter more for performance than simply adding more CUs beyond a certain point.
 
Joined
Mar 6, 2011
Messages
155 (0.03/day)
Probably the somewhat large list of known issues:
[screenshot: Adrenalin 2020 known-issues list]

Ever visited the NVIDIA forums? There are vast numbers of issues.

The 2020 drivers were initially terrible. They're very stable now, though the Radeon Software control panel still isn't 100% - why they feel the need to remove or rejig half its content with each Adrenalin release, then re-add it, I will never understand.

This won't be in your hands, no problems :p
This is a datacenter-only card.

You could buy one, but it probably won't be much good to you. It's not like the Radeon VII or the Titan V... there's no raster engine. These are dedicated HPC / ML cards, not rendering / graphics accelerators.

It will be interesting to see what form future FirePro / Radeon Pro cards from AMD take.

This possibly means that RDNA2 will be able to have more than 64/60 CUs, right? I just hope AMD releases some GPUs with CU counts of 80, 96, or even higher... I want them to release the beast


RDNA1 could have had more. This was confirmed almost a year ago at Computex. Changes in RDNA (& now CDNA) meant that configurations above 64 CUs would no longer suffer severe bottlenecks.
 
Joined
Mar 18, 2008
Messages
5,717 (0.94/day)
System Name Virtual Reality / Bioinformatics
Processor Undead CPU
Motherboard Undead TUF X99
Cooling Noctua NH-D15
Memory GSkill 128GB DDR4-3000
Video Card(s) EVGA RTX 3090 FTW3 Ultra
Storage Samsung 960 Pro 1TB + 860 EVO 2TB + WD Black 5TB
Display(s) 32'' 4K Dell
Case Fractal Design R5
Audio Device(s) BOSE 2.0
Power Supply Seasonic 850watt
Mouse Logitech Master MX
Keyboard Corsair K70 Cherry MX Blue
VR HMD HTC Vive + Oculus Quest 2
Software Windows 10 P
For GPU compute, RTG has a tough battle to fight, as Nvidia has already entrenched itself deep in the ML/DL/AI market. RTG needs better software support, way better than the shit they have been shipping for the past several years. OpenCL for RTG GPUs is a zombie at best. Vulkan compute has yet to see any real momentum.

A good ecosystem is HW+SW. Get your shit together on the software, RTG.
 
Joined
Mar 6, 2011
Messages
155 (0.03/day)
For GPU compute, RTG has a tough battle to fight, as Nvidia has already entrenched itself deep in the ML/DL/AI market. RTG needs better software support, way better than the shit they have been shipping for the past several years. OpenCL for RTG GPUs is a zombie at best. Vulkan compute has yet to see any real momentum.

A good ecosystem is HW+SW. Get your shit together on the software, RTG.

If it were as bad as you make out, I don't think they'd be winning the huge contracts that they are...
 

ARF

Joined
Jan 28, 2020
Messages
4,670 (2.66/day)
Location
Ex-usa | slava the trolls
Probably the somewhat large list of known issues:
[screenshot: Adrenalin 2020 known-issues list]


Almost all of them contain the word "may", which, translated, means that an issue either will happen or most likely won't happen.
I also asked another person about his own experience. I am not even sure he has a running Radeon card to report on.
 
Joined
Mar 9, 2020
Messages
80 (0.05/day)
Probably the somewhat large list of known issues:
[screenshot: Adrenalin 2020 known-issues list]

I've been running an RX 5700 for a couple of months now.
Zero issues. Best bang-for-buck at the moment.

My previous card was a GTX 960, which was a bit flaky at first but went on to give faultless service for 9 years.
I went on the AMD forums to check for any potential problems before I bought the RX 5700, and most of the problems were down to Windows 10 silently updating the drivers, or plain stupidity and user error.
 
Joined
Dec 16, 2017
Messages
2,910 (1.15/day)
System Name System V
Processor AMD Ryzen 5 3600
Motherboard Asus Prime X570-P
Cooling Cooler Master Hyper 212 // a bunch of 120 mm Xigmatek 1500 RPM fans (2 ins, 3 outs)
Memory 2x8GB Ballistix Sport LT 3200 MHz (BLS8G4D32AESCK.M8FE) (CL16-18-18-36)
Video Card(s) Gigabyte AORUS Radeon RX 580 8 GB
Storage SHFS37A240G / DT01ACA200 / ST10000VN0008 / ST8000VN004 / SA400S37960G / SNV21000G / NM620 2TB
Display(s) LG 22MP55 IPS Display
Case NZXT Source 210
Audio Device(s) Logitech G430 Headset
Power Supply Corsair CX650M
Software Whatever build of Windows 11 is being served in Canary channel at the time.
Benchmark Scores Corona 1.3: 3120620 r/s Cinebench R20: 3355 FireStrike: 12490 TimeSpy: 4624
Almost all of them contain the word "may", which, translated, means that an issue either will happen or most likely won't happen.
I also asked another person about his own experience. I am not even sure he has a running Radeon card to report on.
I've been running an RX 5700 for a couple of months now.
Zero issues. Best bang-for-buck at the moment.

My previous card was a GTX 960, which was a bit flaky at first but went on to give faultless service for 9 years.
I went on the AMD forums to check for any potential problems before I bought the RX 5700, and most of the problems were down to Windows 10 silently updating the drivers, or plain stupidity and user error.

I was mostly referencing Cheeseball's post. Honestly, in spite of running a preview build of Windows 10 and using the Radeon beta drivers, I haven't run into problems. Granted, it's an RX 580, so I guess support is mostly polished by now, but so far it's been rather solid (or I simply don't fit the scenarios where problems happen).
 
Joined
May 2, 2017
Messages
7,762 (2.81/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150Mhz, CO -7,-7,-20(x6),
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
Adrenalin 2020 is not fine yet. RTG still has a lot to improve on their end.
Never had any issues with these drivers on my RX 570 or Fury X. YMMV, but the issues are way overblown. All GPU drivers at all times have significant lists of known bugs that might occur.
This possibly means that RDNA2 will be able to have more than 64/60 CUs, right? I just hope AMD releases some GPUs with CU counts of 80, 96, or even higher... I want them to release the beast
The 64 CU hard architectural limit of GCN disappeared with the launch of RDNA; they simply haven't made a large enough GPU to demonstrate that yet. The top-end RDNA 2 GPU will undoubtedly have more than 64 CUs.
This has so many CUs because they are more important for this type of card, and that is where CDNA will differ from RDNA, I suspect.

Higher clock speeds with fewer CUs (up to 64) is the route they will take, as they will also add accelerator engines inside RDNA2 GPUs at the expense of CUs, like Nvidia did with Tensor Cores. Fixed-function accelerators matter more for performance than simply adding more CUs beyond a certain point.
That is pure nonsense. The 64 CU limit was the main reason why Vega lagged so far behind Nvidia in both absolute performance and efficiency - while Nvidia was consistently increasing core counts per generation, AMD couldn't and thus had to push clocks ever higher to eke out every last piece of performance they could no matter the power cost. If AMD could have made an 80 CU Vega card, it would have been much more competitive than the Vega 64 and 56 as they could have run it at much more efficient clocks while still performing better. While RDNA of course has increased per-CU performance significantly, there's no way 64 CUs with a clock bump will allow them to compete in the high end in the future. Going wider and keeping clocks reasonable is by far the superior way to create the most powerful GPU you can within a reasonable thermal envelope.

It sounds like you've bought into Mark Cerny's stupid "faster clocks outperforms more CUs!" marketing nonsense - a statement disproved by any OC GPU benchmark (performance doesn't even increase linearly with clocks, while Cerny's statement would need performance to increase by more than clock speeds to be true). Look at how an 80W 2080 Max-Q performs compared to a 2060 (not max-Q) mobile - the 2080 is way faster at much lower clocks.
 
Joined
Apr 10, 2010
Messages
1,858 (0.35/day)
Location
London
System Name Jaspe
Processor Ryzen 1500X
Motherboard Asus ROG Strix X370-F Gaming
Cooling Stock
Memory 16Gb Corsair 3000mhz
Video Card(s) EVGA GTS 450
Storage Crucial M500
Display(s) Philips 1080 24'
Case NZXT
Audio Device(s) Onboard
Power Supply Enermax 425W
Software Windows 10 Pro
My previous card was a GTX960 which was a bit flaky at first - but went on to give faultless service for 9 years.

Wasn't the GTX 960 released in 2015?
 

M2B

Joined
Jun 2, 2017
Messages
284 (0.10/day)
Location
Iran
Processor Intel Core i5-8600K @4.9GHz
Motherboard MSI Z370 Gaming Pro Carbon
Cooling Cooler Master MasterLiquid ML240L RGB
Memory XPG 8GBx2 - 3200MHz CL16
Video Card(s) Asus Strix GTX 1080 OC Edition 8G 11Gbps
Storage 2x Samsung 850 EVO 1TB
Display(s) BenQ PD3200U
Case Thermaltake View 71 Tempered Glass RGB Edition
Power Supply EVGA 650 P2
Never had any issues with these drivers on my RX 570 or Fury X. YMMV, but the issues are way overblown. All GPU drivers at all times have significant lists of known bugs that might occur.

The 64 CU hard architectural limit of GCN disappeared with the launch of RDNA; they simply haven't made a large enough GPU to demonstrate that yet. The top-end RDNA 2 GPU will undoubtedly have more than 64 CUs.

That is pure nonsense. The 64 CU limit was the main reason why Vega lagged so far behind Nvidia in both absolute performance and efficiency - while Nvidia was consistently increasing core counts per generation, AMD couldn't and thus had to push clocks ever higher to eke out every last piece of performance they could no matter the power cost. If AMD could have made an 80 CU Vega card, it would have been much more competitive than the Vega 64 and 56 as they could have run it at much more efficient clocks while still performing better. While RDNA of course has increased per-CU performance significantly, there's no way 64 CUs with a clock bump will allow them to compete in the high end in the future. Going wider and keeping clocks reasonable is by far the superior way to create the most powerful GPU you can within a reasonable thermal envelope.

It sounds like you've bought into Mark Cerny's stupid "faster clocks outperforms more CUs!" marketing nonsense - a statement disproved by any OC GPU benchmark (performance doesn't even increase linearly with clocks, while Cerny's statement would need performance to increase by more than clock speeds to be true). Look at how an 80W 2080 Max-Q performs compared to a 2060 (not max-Q) mobile - the 2080 is way faster at much lower clocks.

That's not entirely true.
Vega scales really badly with higher shader counts after a certain point.
To run an 80 CU Vega (14 nm) GPU at reasonable power levels you would probably have to clock it in the 1.1 to 1.15 GHz range, which is just stupid and unbalanced for a gaming GPU, and the worse part is that instead of a ~480 mm2 die you would have a much more expensive to make ~600 mm2 die (and at the end of the day all you would get is maybe 5-10% better gaming performance).
CU count was not the reason why Vega lagged so far behind Nvidia; it was purely down to the efficiency deficit.
Just look at how well the Radeon VII performs with fewer shaders in comparison to the Vega 64; the extra memory bandwidth helps of course, but the ~300 MHz higher clock is the primary reason.
 

Cheeseball

Not a Potato
Supporter
Joined
Jan 2, 2009
Messages
1,995 (0.34/day)
Location
Pittsburgh, PA
System Name Titan
Processor AMD Ryzen™ 7 7950X3D
Motherboard ASRock X870 Taichi Lite
Cooling Thermalright Phantom Spirit 120 EVO CPU
Memory TEAMGROUP T-Force Delta RGB 2x16GB DDR5-6000 CL30
Video Card(s) ASRock Radeon RX 7900 XTX 24 GB GDDR6 (MBA) / NVIDIA RTX 4090 Founder's Edition
Storage Crucial T500 2TB x 3
Display(s) LG 32GS95UE-B, ASUS ROG Swift OLED (PG27AQDP), LG C4 42" (OLED42C4PUA)
Case HYTE Hakos Baelz Y60
Audio Device(s) Kanto Audio YU2 and SUB8 Desktop Speakers and Subwoofer, Cloud Alpha Wireless
Power Supply Corsair SF1000L
Mouse Logitech Pro Superlight 2 (White), G303 Shroud Edition
Keyboard Wooting 60HE+ / 8BitDo Retro Mechanical Keyboard (N Edition) / NuPhy Air75 v2
VR HMD Occulus Quest 2 128GB
Software Windows 11 Pro 64-bit 23H2 Build 22631.4317
Never had any issues with these drivers on my RX 570 or Fury X. YMMV, but the issues are way overblown. All GPU drivers at all times have significant lists of known bugs that might occur.

This is in regard to the newer Navi cards. I can attest that the drivers work for my RX 5700 XT now, but as you can see in various forums, there are still reports of TDR/blackscreen issues that AMD is doing their best to address. The older Polaris and Vega cards should be doing fine now, but I have also seen some recent reports that say otherwise.

I experienced the same issues when I had an HD 7870 "XT" (1536 shaders) Tahiti LE back in 2013. They eventually got better over time (addressing bugs) and brought further improvements in some games ("Fine Wine").
 
Joined
Dec 16, 2017
Messages
2,910 (1.15/day)
System Name System V
Processor AMD Ryzen 5 3600
Motherboard Asus Prime X570-P
Cooling Cooler Master Hyper 212 // a bunch of 120 mm Xigmatek 1500 RPM fans (2 ins, 3 outs)
Memory 2x8GB Ballistix Sport LT 3200 MHz (BLS8G4D32AESCK.M8FE) (CL16-18-18-36)
Video Card(s) Gigabyte AORUS Radeon RX 580 8 GB
Storage SHFS37A240G / DT01ACA200 / ST10000VN0008 / ST8000VN004 / SA400S37960G / SNV21000G / NM620 2TB
Display(s) LG 22MP55 IPS Display
Case NZXT Source 210
Audio Device(s) Logitech G430 Headset
Power Supply Corsair CX650M
Software Whatever build of Windows 11 is being served in Canary channel at the time.
Benchmark Scores Corona 1.3: 3120620 r/s Cinebench R20: 3355 FireStrike: 12490 TimeSpy: 4624

Cheeseball

Not a Potato
Supporter
Joined
Jan 2, 2009
Messages
1,995 (0.34/day)
Location
Pittsburgh, PA
System Name Titan
Processor AMD Ryzen™ 7 7950X3D
Motherboard ASRock X870 Taichi Lite
Cooling Thermalright Phantom Spirit 120 EVO CPU
Memory TEAMGROUP T-Force Delta RGB 2x16GB DDR5-6000 CL30
Video Card(s) ASRock Radeon RX 7900 XTX 24 GB GDDR6 (MBA) / NVIDIA RTX 4090 Founder's Edition
Storage Crucial T500 2TB x 3
Display(s) LG 32GS95UE-B, ASUS ROG Swift OLED (PG27AQDP), LG C4 42" (OLED42C4PUA)
Case HYTE Hakos Baelz Y60
Audio Device(s) Kanto Audio YU2 and SUB8 Desktop Speakers and Subwoofer, Cloud Alpha Wireless
Power Supply Corsair SF1000L
Mouse Logitech Pro Superlight 2 (White), G303 Shroud Edition
Keyboard Wooting 60HE+ / 8BitDo Retro Mechanical Keyboard (N Edition) / NuPhy Air75 v2
VR HMD Occulus Quest 2 128GB
Software Windows 11 Pro 64-bit 23H2 Build 22631.4317
What do you mean? What problems do you have and have you reported them via the support centre?

I had problems with my RX 5700 XT last year, and with the help of @INSTG8R and other Vanguard members I was able to provide information to AMD's development team directly, which may have contributed to the TDR/blackscreen fixes from 19.10.x onwards.

I have no problems with the earlier 20.x releases, except for Enhanced Sync, which AMD keeps messing up for some reason. However, I've forgone the mainline release for the more stable Radeon Pro drivers (20.Q1.2), since I don't have to deal with any of the extra bloat that Adrenalin installs (which I've been submitting feedback asking them to keep separate during install).

Like I said, Adrenalin 2020 is not fine yet. Since you're running on GCN5 hardware, you shouldn't be experiencing any major issues compared to some of the Navi owners.

Almost all of them contain the word "may", which, translated, means that an issue either will happen or most likely won't happen.
I also asked another person about his own experience. I am not even sure he has a running Radeon card to report on.

Are you talking about me? Please review my System Specs if you're in doubt. I even have a "unique" configuration.
 

ARF

Joined
Jan 28, 2020
Messages
4,670 (2.66/day)
Location
Ex-usa | slava the trolls
That's not entirely true.
Vega scales really badly with higher shader counts after a certain point.
To run an 80 CU Vega (14 nm) GPU at reasonable power levels you would probably have to clock it in the 1.1 to 1.15 GHz range, which is just stupid and unbalanced for a gaming GPU, and the worse part is that instead of a ~480 mm2 die you would have a much more expensive to make ~600 mm2 die (and at the end of the day all you would get is maybe 5-10% better gaming performance).
CU count was not the reason why Vega lagged so far behind Nvidia; it was purely down to the efficiency deficit.
Just look at how well the Radeon VII performs with fewer shaders in comparison to the Vega 64; the extra memory bandwidth helps of course, but the ~300 MHz higher clock is the primary reason.


Vega 64 is bad because its shaders are not fed properly. You have 40-50% of its shaders not receiving any work and thus sitting idle.
This is partly because of its design, with its many compromises - it was designed for high throughput, which is good for pure number crunching in high-performance computing loads, but games unfortunately don't care too much about that.
 
Joined
May 2, 2017
Messages
7,762 (2.81/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150Mhz, CO -7,-7,-20(x6),
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
That's not entirely true.
Vega scales really badly with higher shader counts after a certain point.
To run an 80 CU Vega (14 nm) GPU at reasonable power levels you would probably have to clock it in the 1.1 to 1.15 GHz range, which is just stupid and unbalanced for a gaming GPU, and the worse part is that instead of a ~480 mm2 die you would have a much more expensive to make ~600 mm2 die (and at the end of the day all you would get is maybe 5-10% better gaming performance).
Just look at how well the Radeon VII performs with fewer shaders in comparison to the Vega 64; the extra memory bandwidth helps of course, but the ~300 MHz higher clock is the primary reason.
Please explain how lower clocks make a GPU "unbalanced" for gaming, because physics and real performance data significantly disagree with you. If that were indeed the case, how does an RTX 2080 Max-Q (with very low clocks) at 80 W perform ~50% faster than an RTX 2060 (mobile, non-Max-Q, with higher clocks) at the same power draw? Again, you seem to be echoing that baseless and false Mark Cerny argument. Wide and slow is nearly always more performant than narrow and fast in the GPU space.
CU count was not the reason why Vega lagged so far behind Nvidia; it was purely down to the efficiency deficit.
Please explain how the "efficiency deficit" is somehow unrelated to clock being pushed far past the ideal operating range of this architecture on the node in question - because if you're disagreeing with me (which it seems you are), that must somehow be the case. After all, I did say
AMD couldn't [increase the number of CUs] and thus had to push clocks ever higher to eke out every last piece of performance they could no matter the power cost.
This is directly due to the hard CU limit in GCN, and because voltage and power draw increase nonlinearly with frequency, even small clock speed drops can lead to dramatic improvements in efficiency. A cursory look at people experimenting with downclocking and undervolting their Vega cards will show you that these cards can be much, much more efficient even with minor clock speed drops. Of course one could argue that undervolting results are unrealistic when speaking of a theoretical mass-produced card, but the counterargument is that the existing cards are effectively factory overvolted, as the chips are pushed so high up their clock/voltage curve that a lot of extra voltage is needed to ensure sufficient yields at those clocks, since a significant portion of GPUs would otherwise fail to reach the necessary speeds. Dropping clocks even by 200 MHz would allow for quite dramatic voltage drops. 200 MHz would be a drop of about 13%, but would likely lead to a power drop of ~25-30%, which would allow for ~25-30% more CUs within the same power budget (if that were architecturally possible), which would then increase performance far beyond what is lost from the clock speed drop. That is how you make the best-performing GPU within a given power budget: by making it as wide as possible (while ensuring it has the memory to keep it fed) while keeping clock speeds around peak efficiency levels.
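To put rough numbers on that trade-off, here is a toy model (a sketch with illustrative figures of my own, not measured Vega data) using the common dynamic-power approximation P ∝ CUs × f × V², with performance crudely assumed to scale with CUs × clock:

```python
# Toy model of the "wide and slow vs. narrow and fast" trade-off.
# Dynamic power is approximated as P ~ CUs * f * V^2; performance is crudely
# assumed to scale with CUs * f (ignoring memory-bandwidth and front-end
# limits). All figures below are illustrative, not measured Vega data.

def relative_power(cus, freq_ghz, volts, base=(64, 1.55, 1.05)):
    b_cus, b_f, b_v = base
    return (cus / b_cus) * (freq_ghz / b_f) * (volts / b_v) ** 2

def relative_perf(cus, freq_ghz, base=(64, 1.55)):
    b_cus, b_f = base
    return (cus / b_cus) * (freq_ghz / b_f)

# Hypothetical 64 CU part pushed to 1.55 GHz at 1.05 V vs. a hypothetical
# 80 CU part relaxed to 1.35 GHz at 0.95 V.
for label, cus, f, v in [("64 CU @ 1.55 GHz, 1.05 V", 64, 1.55, 1.05),
                         ("80 CU @ 1.35 GHz, 0.95 V", 80, 1.35, 0.95)]:
    print(f"{label}: power x{relative_power(cus, f, v):.2f}, "
          f"throughput x{relative_perf(cus, f):.2f}")
```

Under those assumptions the wider, slower part lands at roughly 0.89x the power for about 1.09x the theoretical throughput; real-world scaling with extra CUs is worse than linear, but the direction of the trade-off is the point.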

Now this is of course not to say that Vega doesn't also have an architectural efficiency disadvantage to Pascal and Turing - it definitely does - but pushing it way past its efficiency sweet spot just compounded this issue, making it far worse than it might have been.

And of course this would also lead to increased die sizes - but then they would have some actual performance gains when moving to a new node, rather than the outright flop that was the Radeon VII. Now that GPU was never meant for gaming at all, but it nonetheless performs terribly for what it is - a full node shrink and then some over its predecessor. Why? Again, because the architecture didn't allow them to build a wider die, meaning that the only way of increasing performance was pushing clocks as high as they could. Now, the VII has 60 CUs and not 64, that is true, but that is solely down to it being a short-term niche card made for salvaging faulty chips, with all fully enabled dice going to the datacenter/HPC market where this GPU actually had some qualities.

If AMD could have moved past 64 CUs with Vega, that lineup would have been much more competitive in terms of absolute performance. It wouldn't have been cheap, but it would have been better than what we got - and we could have gotten a proper full lineup rather than the two GPUs AMD ended up making. Luckily it looks like this is happening with RDNA 2 now that the 64 CU limit is gone.

So, tl;dr: the 64 CU architectural limit of GCN has been a major problem for AMD GPUs ever since they maxed it out back in 2015 - it left them with no way forward outside of sacrificing efficiency at every turn.
 

M2B

Joined
Jun 2, 2017
Messages
284 (0.10/day)
Location
Iran
Processor Intel Core i5-8600K @4.9GHz
Motherboard MSI Z370 Gaming Pro Carbon
Cooling Cooler Master MasterLiquid ML240L RGB
Memory XPG 8GBx2 - 3200MHz CL16
Video Card(s) Asus Strix GTX 1080 OC Edition 8G 11Gbps
Storage 2x Samsung 850 EVO 1TB
Display(s) BenQ PD3200U
Case Thermaltake View 71 Tempered Glass RGB Edition
Power Supply EVGA 650 P2
Please explain how lower clocks make a GPU "unbalanced" for gaming, because physics and real performance data significantly disagree with you. If that were indeed the case, how does an RTX 2080 Max-Q (with very low clocks) at 80 W perform ~50% faster than an RTX 2060 (mobile, non-Max-Q, with higher clocks) at the same power draw? Again, you seem to be echoing that baseless and false Mark Cerny argument. Wide and slow is nearly always more performant than narrow and fast in the GPU space.

Please explain how the "efficiency deficit" is somehow unrelated to clock being pushed far past the ideal operating range of this architecture on the node in question - because if you're disagreeing with me (which it seems you are), that must somehow be the case. After all, I did say

This is directly due to the hard CU limit in GCN, and because voltage and power draw increase nonlinearly with frequency, even small clock speed drops can lead to dramatic improvements in efficiency. A cursory look at people experimenting with downclocking and undervolting their Vega cards will show you that these cards can be much, much more efficient even with minor clock speed drops. Of course one could argue that undervolting results are unrealistic when speaking of a theoretical mass-produced card, but the counterargument is that the existing cards are effectively factory overvolted, as the chips are pushed so high up their clock/voltage curve that a lot of extra voltage is needed to ensure sufficient yields at those clocks, since a significant portion of GPUs would otherwise fail to reach the necessary speeds. Dropping clocks even by 200 MHz would allow for quite dramatic voltage drops. 200 MHz would be a drop of about 13%, but would likely lead to a power drop of ~25-30%, which would allow for ~25-30% more CUs within the same power budget (if that were architecturally possible), which would then increase performance far beyond what is lost from the clock speed drop. That is how you make the best-performing GPU within a given power budget: by making it as wide as possible (while ensuring it has the memory to keep it fed) while keeping clock speeds around peak efficiency levels.

Now this is of course not to say that Vega doesn't also have an architectural efficiency disadvantage to Pascal and Turing - it definitely does - but pushing it way past its efficiency sweet spot just compounded this issue, making it far worse than it might have been.

And of course this would also lead to increased die sizes - but then they would have some actual performance gains when moving to a new node, rather than the outright flop that was the Radeon VII. Now that GPU was never meant for gaming at all, but it nonetheless performs terribly for what it is - a full node shrink and then some over its predecessor. Why? Again, because the architecture didn't allow them to build a wider die, meaning that the only way of increasing performance was pushing clocks as high as they could. Now, the VII has 60 CUs and not 64, that is true, but that is solely down to it being a short-term niche card made for salvaging faulty chips, with all fully enabled dice going to the datacenter/HPC market where this GPU actually had some qualities.

If AMD could have moved past 64 CUs with Vega, that lineup would have been much more competitive in terms of absolute performance. It wouldn't have been cheap, but it would have been better than what we got - and we could have gotten a proper full lineup rather than the two GPUs AMD ended up making. Luckily it looks like this is happening with RDNA 2 now that the 64 CU limit is gone.

So, tl;dr: the 64 CU architectural limit of GCN has been a major problem for AMD GPUs ever since they maxed it out back in 2015 - it left them with no way forward outside of sacrificing efficiency at every turn.

First of all, AMD themselves stated that there was no such thing as a 64 CU architectural limit on GCN; secondly, if there were indeed such a limitation, AMD could have solved it at some point over all these years.
My comment on performance scaling with higher clocks doesn't have anything to do with what Cerny says.

You clearly don't properly understand how GPUs work.
And no, the 80 W 2080 Max-Q is nowhere near 50% faster than a non-Max-Q mobile 2060; where are you getting that from? Even if it were true, a big part of that efficiency difference could be due to binning and using higher-quality chips for the higher-end GPU.

Just take a look at how an RX 5700 performs in comparison to the 5700 XT at similar clocks: the 5700 XT ends up being 6% faster while having 11% more shaders.
Another good example is how the 2080 Ti, with 41% more shaders, performs compared to the 2080 Super: only 20% faster at 4K.
Performance scaling with higher clocks has always been, and will be, more linear than the performance increase from more shaders in gaming-like workloads. If not, the GTX 1070, with a massive 14 CU deficit, couldn't match or beat the 980 Ti on a similar architecture.
Just do the math:
1070 => 1920 shaders × 1800 MHz × 2 = 6.9 TFLOPS
980 Ti => 2816 shaders × 1250 MHz × 2 = 7.0 TFLOPS
Yet the 1070 performs around 12% better (comparing reference vs. reference; obviously the 980 Ti has more OC headroom).
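The same arithmetic as a quick script (shader counts are the reference figures; the clocks are the typical real-world boost clocks used above, not the official spec-sheet values):

```python
# Back-of-the-envelope FP32 throughput, 2 FLOPs per shader per clock (FMA).
# Clocks are typical observed boost clocks, not official spec-sheet boost.

def fp32_tflops(shaders, clock_mhz):
    return shaders * clock_mhz * 1e6 * 2 / 1e12

cards = {
    "GTX 1070 (1920 shaders @ ~1800 MHz)": (1920, 1800),
    "GTX 980 Ti (2816 shaders @ ~1250 MHz)": (2816, 1250),
}

for name, (shaders, clock) in cards.items():
    print(f"{name}: ~{fp32_tflops(shaders, clock):.1f} TFLOPS")
```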


[chart: TechPowerUp relative performance, 2560x1440]
 
Joined
May 2, 2017
Messages
7,762 (2.81/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150Mhz, CO -7,-7,-20(x6),
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
First of all, AMD themselves stated that there was no such thing as a 64 CU architectural limit on GCN; secondly, if there were indeed such a limitation, AMD could have solved it at some point over all these years.
My comment on performance scaling with higher clocks doesn't have anything to do with what Cerny says.

You clearly don't properly understand how GPUs work.
And no, the 80 W 2080 Max-Q is nowhere near 50% faster than a non-Max-Q mobile 2060; where are you getting that from? Even if it were true, a big part of that efficiency difference could be due to binning and using higher-quality chips for the higher-end GPU.

Just take a look at how an RX 5700 performs in comparison to the 5700 XT at similar clocks: the 5700 XT ends up being 6% faster while having 11% more shaders.
Another good example is how the 2080 Ti, with 41% more shaders, performs compared to the 2080 Super: only 20% faster at 4K.
Performance scaling with higher clocks has always been, and will be, more linear than the performance increase from more shaders in gaming-like workloads. If not, the GTX 1070, with a massive 14 CU deficit, couldn't match or beat the 980 Ti on a similar architecture.
Just do the math:
1070 => 1920 shaders × 1800 MHz × 2 = 6.9 TFLOPS
980 Ti => 2816 shaders × 1250 MHz × 2 = 7.0 TFLOPS
Yet the 1070 performs around 12% better (comparing reference vs. reference; obviously the 980 Ti has more OC headroom).


[chart: TechPowerUp relative performance, 2560x1440]
You're right about that 50% number - I mixed up the numbers for the 2060 Max-Q and the normal mobile 2060 - too many open tabs at once, I guess. The 2060 Max-Q also seems to run abnormally slowly compared to other Max-Q models. The normal 2060 is not that far behind an 80 W 2080 Max-Q - about 10%. Still, if higher clocks improved performance more than more CUDA cores, then at 80 W for both, the 2060 ought to outperform the 2080 Max-Q, which it doesn't - it's noticeably behind. After all, in that comparison you have two GPUs with the same architecture, with the smaller GPU having more memory bandwidth per shader and higher clocks, so if clock speeds improved performance more than a wider GPU layout, the smaller GPU would be faster. Binning of course has some effect on this, but not to the tune of explaining away a performance difference of that size.

And while I never said that scaling with more shaders is even close to linear, increased shader counts are responsible for the majority of GPU performance uplift over the past decade - far more than clock speeds, which have increased by less than 3x while shader counts have increased by ~8x and total GPU performance by >5x. Any GPU OC exercise will show that performance scaling with clock speed increases is far below linear - often to the tune of half or less than half in terms of perf % increase vs. clock % increase, even when also OC'ing memory. The best balance for increasing performance generation over generation is obviously a combination of both, but in the case of Vega AMD couldn't do that, and instead only pushed clocks higher. The lack of shader count increases forced them to push clocks far past the efficiency sweet spot of that arch+node combo, tanking efficiency in an effort to maximize absolute performance - they had nowhere else to go and a competitor with a significant lead in absolute performance, after all.

The same happened again with the VII; clocks were pushed far beyond the efficiency sweet spot as they couldn't increase the CU count (at this point we were looking at a 331 mm2 die, so they could easily have added 10-20 CUs if they had the ability and stayed within a reasonable die size). Now, the VII has 60 CUs active and not 64, but that is down to nothing more than this GPU being a PR move with likely zero margins utilizing salvaged dice, with all fully enabled Vega 20 dice going to compute accelerators where there was actual money to be made. On the other hand, comparing it with the Vega 56, you have 9.3% more shaders, ~20% higher clock speeds (base - the boost speed difference is larger) and >2x the memory bandwidth, yet it only delivers ~33% more performance. For a severely memory limited arch like Vega, that is a rather poor showing. And again, at the same wattage they could undoubtedly have increased performance more with some more shaders running at a lower speed.

As for AMD saying there was no hard architectural limit of 64 CUs: source, please? Not increasing shader counts at all across three generations and three production nodes while the competition increased theirs by 55% is proof enough that AMD couldn't increase theirs without moving away from GCN.
 