I have been thinking about this for a long time, so I just wanted to vent it here and see what other people think. I am no GPU designer by a long shot; I'm merely looking at the trend across GCN iterations and its limitations as seen in various reviews.
If you look at the move from the RX 560 to the RX 570, performance roughly doubles in pretty much any situation. Why is that? On the left is Polaris 11 (RX 560) and on the right is Polaris 10 (the RX 570/470 has 4 CUs disabled).
As you can see, the number of Shader Engines is doubled, so the geometry processors are doubled, and everything else is doubled too, like the L2 cache and the memory controllers (IMC). This is what I think affects performance the most.
So now let's compare it to older GCN to strengthen my theory, specifically AMD's last flagship that gave Nvidia some trouble: the R9 290X, codenamed Hawaii. I know I shouldn't compare two different generations of cards, but bear with me on this:
Both look very similar at a glance, but there are three major differences between the two chips: Hawaii has a massive 512-bit memory controller, more CUs per Shader Engine, and double the ROP count per Shader Engine. Yet going by reviews, for example our own TPU RX 470 review, the performance difference between the RX 470/480 and the 390/390X (essentially overclocked 290/290X) is basically only a few frames. Setting aside the generational difference and the small tweaks AMD made for GCN4, I have three hypotheses: (1) the optimal number of ROPs per Shader Engine is around 2 render back-ends (8 ROPs), (2) the optimal CU count per Shader Engine works out to around 1024 SP, and (3) a 256-bit bus is enough to feed 4 Shader Engines, even with double the ROP count. I used to own an R9 290X and did some testing: lowering the memory clock to 4 GHz effective (1 GHz actual), which works out to 256 GB/s of bandwidth, doesn't really affect gaming performance much (at 1080p), but it does lower temperatures.
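For anyone who wants to check the bandwidth figure, here is a minimal Python sketch of the usual peak-bandwidth arithmetic (bus width times effective data rate, divided by 8 bits per byte). The function name is just something I made up for illustration, and the stock 290X (5 Gbps) and RX 480 8GB (8 Gbps) data rates are the commonly published specs, not numbers from my own testing.

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s.

    bus_width_bits: total memory bus width in bits (e.g. 512 for Hawaii).
    data_rate_gbps: effective per-pin data rate in Gbps (GDDR5 moves
                    4 bits per pin per memory clock, so 1 GHz -> 4 Gbps).
    """
    return bus_width_bits * data_rate_gbps / 8  # 8 bits per byte


# Hawaii at its published stock memory speed (assumed spec: 5 Gbps)
print(memory_bandwidth_gbs(512, 5.0))  # 320.0

# Hawaii with the memory underclocked to 4 Gbps, as in my test
print(memory_bandwidth_gbs(512, 4.0))  # 256.0

# A 256-bit bus at 8 Gbps (assumed spec: RX 480 8GB) lands on the same figure
print(memory_bandwidth_gbs(256, 8.0))  # 256.0
```

So the underclocked 512-bit Hawaii and a 256-bit card at 8 Gbps end up with the same peak bandwidth on paper, which is the comparison the third hypothesis leans on.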
So here comes my version of what AMD should have done with Vega, and here's why. I have read that with the NCU they removed GCN's dreaded limit of 4 Shader Engines, thus potentially removing the geometry bottleneck that has plagued AMD cards. So here it is, what my version of a 'small Vega' block diagram is supposed to look like (it was made in Paint, so don't mock it):
It has 3072 SP with 8 CUs per Shader Engine, 6 geometry processors (2 more than the current Vega has), 48 ROPs, a 384-bit memory bus (or 2048-bit HBM2) and 3MB of L2 cache. I think this is a much more balanced design than spamming as many CUs as possible into each Shader Engine and cramming in more ROPs, since the units get fed properly and utilization is better.
From there AMD could make Big Vega by expanding it further to 8 Shader Engines, which makes it 4096 SP with 8 geometry processors, 64 ROPs, a 512-bit memory bus (or 4096-bit HBM2) and 4MB of L2 cache.
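The unit counts for both designs fall out of simple multiplication, which can be sketched in Python as below. The helper name is hypothetical, and the sketch assumes 64 SP per CU (as on every GCN part), one geometry processor per Shader Engine, and 8 ROPs per Shader Engine as in the Polaris 10 layout.

```python
SP_PER_CU = 64  # stream processors per GCN compute unit


def gpu_config(shader_engines: int, cus_per_engine: int, rops_per_engine: int = 8) -> dict:
    """Derive headline unit counts from a per-Shader-Engine layout.

    Assumes one geometry processor per Shader Engine (illustrative only).
    """
    return {
        "sp": shader_engines * cus_per_engine * SP_PER_CU,
        "geometry_processors": shader_engines,
        "rops": shader_engines * rops_per_engine,
    }


small_vega = gpu_config(shader_engines=6, cus_per_engine=8)
big_vega = gpu_config(shader_engines=8, cus_per_engine=8)

print(small_vega)  # {'sp': 3072, 'geometry_processors': 6, 'rops': 48}
print(big_vega)    # {'sp': 4096, 'geometry_processors': 8, 'rops': 64}

# Existing chips for comparison: Polaris 10 (4 SE x 9 CU) and Polaris 11 (2 SE x 8 CU)
print(gpu_config(4, 9)["sp"])  # 2304
print(gpu_config(2, 8)["sp"])  # 1024
```

The point of the layout is that every Shader Engine stays the same size, so going from small to big Vega only changes the number of engines.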
I don't know what is stopping AMD from making this design. The jump from the 1024 SP Polaris 11 to the 2304 SP Polaris 10 comes close to doubling the transistor count (3 billion vs 5.7 billion), so by simple math the transistor count of my design should be around 9 billion for small Vega, and around the same figure as AMD's actual big Vega for the 4096 SP version.
P.S.: Is it just me, or does the Polaris 10 block diagram show a 512-bit bus, the same as Hawaii? Something is wrong somewhere.