
AMD RDNA3 Navi 31 GPU Block Diagram Leaked, Confirmed to be PCIe Gen 4

Joined
Oct 26, 2022
Messages
57 (0.09/day)
Meh. As long as the bottom cards have a good media engine…….unlike the 6400…….
The 6400's and 6500 XT's true potential was nerfed considerably by PCIe 4.0 x4, 4 GB of VRAM and the 64-bit bus.
Personally, I wouldn't trade their predecessor, the 5500 XT 8GB, for a tiny number of RT cores, a newer encoder/decoder and HDMI 2.1 at 40 Gbps. Even the 470 4GB from 2016 brutalises the 6400 in all scenarios.
 

ARF

Joined
Jan 28, 2020
Messages
4,337 (2.66/day)
Location
Ex-usa | slava the trolls
I definitely hope they will care again, even though it seems unlikely.

Next year I am planning on building a separate strictly gaming PC and I want to convert my current build to an HTPC that will handle everything else, including recording with a capture card. I would definitely want a cheap graphics card with AV1 encoding. If NVIDIA or AMD do not offer such a product, I might go with Intel (A380 or whatever).

If AMD wants to save its market share, it must restructure its strategy and start offering small GPUs with full media capabilities - otherwise it will never save face with the wider audience and will always be seen as the underdog by the vast majority of users.

One can hope that Navi 33 (June 2023?) will fix these problems, but I think they need a Navi 34 with PCIe 4.0 x8, AV1, DisplayPort 2.1 and a price tag of $129.
 
Joined
Feb 20, 2019
Messages
7,651 (3.88/day)
System Name Bragging Rights
Processor Atom Z3735F 1.33GHz
Motherboard It has no markings but it's green
Cooling No, it's a 2.2W processor
Memory 2GB DDR3L-1333
Video Card(s) Gen7 Intel HD (4EU @ 311MHz)
Storage 32GB eMMC and 128GB Sandisk Extreme U3
Display(s) 10" IPS 1280x800 60Hz
Case Veddha T2
Audio Device(s) Apparently, yes
Power Supply Samsung 18W 5V fast-charger
Mouse MX Anywhere 2
Keyboard Logitech MX Keys (not Cherry MX at all)
VR HMD Samsung Oddyssey, not that I'd plug it into this though....
Software W10 21H1, barely
Benchmark Scores I once clocked a Celeron-300A to 564MHz on an Abit BE6 and it scored over 9000.
Meh. As long as the bottom cards have a good media engine…….unlike the 6400…….
I'm pretty certain that the 6400 and 6500 only exist as desktop cards at all because of the disastrous coincidence of the ETH mining boom and the pandemic screwing with production/supply.

They weren't designed to be standalone products - they're literally in AMD's roadmaps from almost 4 years ago as laptop-only auxiliary GPU cores to complement an AMD IGP.
 

ARF

Joined
Jan 28, 2020
Messages
4,337 (2.66/day)
Location
Ex-usa | slava the trolls
I'm pretty certain that the 6400 and 6500 only exist as desktop cards at all because of the disastrous coincidence of the ETH mining boom and the pandemic screwing with production/supply.

They weren't designed to be standalone products - they're literally in AMD's roadmaps from almost 4 years ago as laptop-only auxiliary GPU cores to complement an AMD IGP.

I think AMD's general management capabilities are quite screwed to begin with.
No pandemic can erase plans that never existed - and when there were never plans for decent offerings, you can't blame external factors.

Its roadmap is lackluster.
 
Joined
Nov 26, 2021
Messages
1,415 (1.47/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
Availability is irrelevant when 80% of people still choose their competition.

Their top-tier product competes in raster with Nvidia's 3rd/4th product down the stack (4090 Ti, 4090, 4080 Ti, 4080), so the fact that it's cheaper is borderline irrelevant. That's without getting started on the non-raster advantages NVIDIA has.

In other words, the situation hasn't changed since the 6950 XT vs the 3090 Ti, and the 6900 vs the 3090.
I think you're wrong about the competitive position; the 7900 XTX should blow the 4080 away when it comes to rasterization, and since the 4080 Ti doesn't exist, we shouldn't be considering it. All this has been achieved with a minuscule die compared to the 4090, and despite the power penalty of off-chip interconnects.

If AMD wants to save its market share, it must restructure its strategy and start offering small GPUs with full media capabilities - otherwise it will never save face with the wider audience and will always be seen as the underdog by the vast majority of users.

One can hope that Navi 33 (June 2023?) will fix these problems, but I think they need a Navi 34 with PCIe 4.0 x8, AV1, DisplayPort 2.1 and a price tag of $129.
This market has almost entirely been taken over by increasingly capable IGPs. Wasting expensive 5 nm wafers on a 7400 or 7500 would be unwise. They might only offer those SKUs if Intel decides to release an Arc B380.
 

ARF

Joined
Jan 28, 2020
Messages
4,337 (2.66/day)
Location
Ex-usa | slava the trolls
This market has almost entirely been taken over by increasingly capable IGPs. Wasting expensive 5 nm wafers on a 7400 or 7500 would be unwise. They might only offer those SKUs if Intel decides to release an Arc B380.

Are there any iGPUs with gaming performance equal to the RX 6400's?

There are market segments that need improved gaming performance from entry-level discrete cards (the RX 6500 XT is stupid to begin with because it offers 0% improvement over the older RX 5500 XT), cards that can go into any system, Intel or AMD...
 
Joined
Nov 26, 2021
Messages
1,415 (1.47/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
Are there any iGPUs with gaming performance equal to the RX 6400's?

There are market segments that need improved gaming performance from entry-level discrete cards (the RX 6500 XT is stupid to begin with because it offers 0% improvement over the older RX 5500 XT), cards that can go into any system, Intel or AMD...
No, there aren't any in the x86 world, but the IGP in the mobile Zen 3 refresh is equivalent to the 1050 Ti. Keep in mind that it's hobbled by ho-hum DDR5 4800 and lower clock speeds due to the low TDP of the platform. A similar desktop APU would be faster, though it probably won't be a 6400 competitor.
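To put rough numbers on that memory handicap, here's a minimal sketch (assuming dual-channel DDR5-4800 for the APU and the 1050 Ti's 128-bit, 7 Gbps GDDR5):

Code:
# Peak memory bandwidth in GB/s = channels * bus width (bits) / 8 * data rate (GT/s)
ddr5_4800_dual = 2 * 64 / 8 * 4.8   # dual-channel DDR5-4800: 76.8 GB/s, shared with the CPU
gtx_1050_ti    = 128 / 8 * 7.0      # 128-bit GDDR5 at 7 Gbps: 112 GB/s, dedicated
print(ddr5_4800_dual, gtx_1050_ti)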
 
Joined
Dec 12, 2012
Messages
728 (0.17/day)
Location
Poland
System Name THU
Processor Intel Core i5-13600KF
Motherboard ASUS PRIME Z790-P D4
Cooling SilentiumPC Fortis 3 v2 + Arctic Cooling MX-2
Memory Crucial Ballistix 2x16 GB DDR4-3600 CL16 (dual rank)
Video Card(s) MSI GeForce RTX 4070 Ventus 3X OC 12 GB GDDR6X (2610/21000 @ 0.91 V)
Storage Lexar NM790 2 TB + Corsair MP510 960 GB + PNY XLR8 CS3030 500 GB + Toshiba E300 3 TB
Display(s) LG OLED C8 55" + ASUS VP229Q
Case Fractal Design Define R6
Audio Device(s) Yamaha RX-V381 + Monitor Audio Bronze 6 + Bronze FX | FiiO E10K-TC + Sony MDR-7506
Power Supply Corsair RM650
Mouse Logitech M705 Marathon
Keyboard Corsair K55 RGB PRO
Software Windows 10 Home
Benchmark Scores Benchmarks in 2024?
This market has almost entirely been taken over by increasingly capable IGPs. Wasting expensive 5 nm wafers on a 7400 or 7500 would be unwise. They might only offer those SKUs if Intel decides to release an Arc B380.

They do not have to be made on 5 nm, though. They can make them on 7 nm or anything else. It is about the feature set, not the manufacturing process. A low-end GPU does not have to be made on the newest process; power consumption is not really an issue.

iGPUs are cool, but they require you to buy the entire platform. You can have an older PC and you might want to upgrade just the graphics card to get some new features and a bit more performance. They do not have to make a lot of those cards.
 
  • Like
Reactions: ARF

ARF

Joined
Jan 28, 2020
Messages
4,337 (2.66/day)
Location
Ex-usa | slava the trolls
They do not have to be made on 5 nm, though. They can make them on 7 nm or anything else. It is about the feature set, not the manufacturing process. A low-end GPU does not have to be made on the newest process; power consumption is not really an issue.

iGPUs are cool, but they require you to buy the entire platform. You can have an older PC and you might want to upgrade just the graphics card to get some new features and a bit more performance. They do not have to make a lot of those cards.

100%.

The performance gap between the RX 6400 and the RTX 4090 is too wide.
AMD can be back in business if it succeeds in bringing this down to only 400%.

AMD also needs to start making pipe-cleaners. How about a 50 mm² GPU on TSMC N3 with chiplets on N7, made now, NOW?

 
Joined
Nov 26, 2021
Messages
1,415 (1.47/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
They do not have to be made on 5 nm, though. They can make them on 7 nm or anything else. It is about the feature set, not the manufacturing process. A low-end GPU does not have to be made on the newest process; power consumption is not really an issue.

iGPUs are cool, but they require you to buy the entire platform. You can have an older PC and you might want to upgrade just the graphics card to get some new features and a bit more performance. They do not have to make a lot of those cards.
Those are all valid points. Unfortunately, it doesn't seem that Nvidia and AMD are listening. Nvidia hasn't bothered releasing anything newer than the 1650 for this segment, and the 6400 and 6500 wouldn't have existed without the crypto boom.
 
  • Angry
Reactions: ARF
Joined
Dec 12, 2012
Messages
728 (0.17/day)
Location
Poland
System Name THU
Processor Intel Core i5-13600KF
Motherboard ASUS PRIME Z790-P D4
Cooling SilentiumPC Fortis 3 v2 + Arctic Cooling MX-2
Memory Crucial Ballistix 2x16 GB DDR4-3600 CL16 (dual rank)
Video Card(s) MSI GeForce RTX 4070 Ventus 3X OC 12 GB GDDR6X (2610/21000 @ 0.91 V)
Storage Lexar NM790 2 TB + Corsair MP510 960 GB + PNY XLR8 CS3030 500 GB + Toshiba E300 3 TB
Display(s) LG OLED C8 55" + ASUS VP229Q
Case Fractal Design Define R6
Audio Device(s) Yamaha RX-V381 + Monitor Audio Bronze 6 + Bronze FX | FiiO E10K-TC + Sony MDR-7506
Power Supply Corsair RM650
Mouse Logitech M705 Marathon
Keyboard Corsair K55 RGB PRO
Software Windows 10 Home
Benchmark Scores Benchmarks in 2024?
Nvidia hasn't bothered releasing anything newer than the 1650 for this segment, and the 6400 and 6500 wouldn't have existed without the crypto boom.

That is true, although the RTX 3050 would be a fine card in my opinion if it actually cost $250. But today it already has an outdated feature set. If they made a 4050 for an actual $250, I would probably pay the extra money. I could even do some light gaming on it and keep it as a backup in case my main gaming PC has a problem.
 
Joined
Oct 30, 2020
Messages
102 (0.08/day)
Availability is irrelevant when 80% of people still choose their competition.

Their top-tier product competes in raster with Nvidia's 3rd/4th product down the stack (4090 Ti, 4090, 4080 Ti, 4080), so the fact that it's cheaper is borderline irrelevant. That's without getting started on the non-raster advantages NVIDIA has.

In other words, the situation hasn't changed since the 6950 XT vs the 3090 Ti, and the 6900 vs the 3090.
There's no way the 7900 XTX merely competes with the 4080 in raster. It's predicted to be within 10-15% of the 4090, and looking at the specs, the 4080 can't possibly be that close to the 4090. RT - yeah, fair point, it's a lot slower, and they don't have CUDA. I just don't see too many other advantages for Nvidia - DLSS 2 is pretty similar to FSR 2, and their NVENC is matched by Navi 31 on paper. On the other hand, AMD has a much better control panel, doesn't use the terrible 12VHPWR connector, has a much better form factor, and is 20-60% cheaper. At best, I can see the 4080 being somewhat comparable to the 7900 XT in raster.

I don't care much for RT anyway, and I prefer 240 Hz on a 3K screen. Time to retire the 3090; 7900 XTX, here I come.
 
Joined
Apr 30, 2020
Messages
886 (0.58/day)
System Name S.L.I + RTX research rig
Processor Ryzen 7 5800X 3D.
Motherboard MSI MEG ACE X570
Cooling Corsair H150i Cappellx
Memory Corsair Vengeance pro RGB 3200mhz 16Gbs
Video Card(s) 2x Dell RTX 2080 Ti in S.L.I
Storage Western digital Sata 6.0 SDD 500gb + fanxiang S660 4TB PCIe 4.0 NVMe M.2
Display(s) HP X24i
Case Corsair 7000D Airflow
Power Supply EVGA G+1600watts
Mouse Corsair Scimitar
Keyboard Cosair K55 Pro RGB
From what I can read, the lack of TMUs is the main problem with this design.
The RX 6950 XT has 320 texture units along with 128 ROPs, which gives it 739.2 GTexel/s at 2100-2300 MHz.
The RX 7900 XTX has 384 texture units along with 192 ROPs, so at similar clocks its texture fill rate should be 887.04 GTexel/s, not 961.9 GTexel/s. There is no way this GPU actually hits that figure.
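For reference, the fill-rate numbers being argued over are just TMU count multiplied by clock speed. A quick sketch (the clocks are back-calculated assumptions to make the figures line up, not official specs):

Code:
# Texture fill rate (GTexel/s) = TMU count * core clock (GHz)
def tex_fill_rate(tmus, clock_ghz):
    return tmus * clock_ghz

print(tex_fill_rate(320, 2.31))    # RX 6950 XT: 739.2 GTexel/s
print(tex_fill_rate(384, 2.31))    # 7900 XTX at the same 2.31 GHz: 887.04 GTexel/s
print(tex_fill_rate(384, 2.505))   # clock implied by the 961.9 GTexel/s figure: ~2.5 GHz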

[attached image]
^ This part should be bigger. They doubled the instructions per clock.
I feel like they should have just doubled the RT core count in the compute part instead of increasing the shader count by 2.7 times.
[attached image: arch4.jpg]


This setup was fine in RDNA2; it just needed refinement. RDNA3 doesn't refine this setup at all - it slows it down, especially in ray tracing. If AMD had added a ray tracing instruction cache and a ray tracing data cache to RDNA2, separate for the RT cores on the back end, the RT cores would be unbottlenecked by the texture units they're linked to. You could unlink the shader clocks from the RT cores and then size them appropriately, instead of a single double-size front end being unlinked from the shaders like it is in RDNA3. The shaders are underclocked because the instruction cache can't feed them fast enough at the same clock, since there are more of them now too. Or they could have just doubled the instruction cache, like they did, to improve ray tracing on the back end without adding shaders to each unit.

If you want proof, all you have to do is look at Navi vs Navi 2: they doubled texture units, ROPs, shaders and compute units, and added Infinity Cache, all on the same 7 nm node, plus the power went from 225 W to 330 W, roughly a 50% increase. They couldn't do that with Navi 3 at all, because its front end can't feed the increase in shaders and the doubled RT cores; it wouldn't be able to support double the texture units inside the WGP/compute unit. That is why there is only a 20% increase in texture units this time.

P.S. Anyone feel like modifying this in Photoshop to show the new way RDNA3 is set up?
 
Last edited:
Joined
Feb 22, 2022
Messages
101 (0.12/day)
System Name Lexx
Processor Threadripper 2950X
Motherboard Asus ROG Zenith Extreme
Cooling Custom Water
Memory 32/64GB Corsair 3200MHz
Video Card(s) Liquid Devil 6900XT
Storage 4TB Solid State PCI/NVME/M.2
Display(s) LG 34" Curved Ultrawide 160Hz
Case Thermaltake View T71
Audio Device(s) Onboard
Power Supply Corsair 1000W
Mouse Logitech G502
Keyboard Asus
VR HMD NA
Software Windows 10 Pro
Availability is irrelevant when 80% of people still choose their competition.

Their top-tier product competes in raster with Nvidia's 3rd/4th product down the stack (4090 Ti, 4090, 4080 Ti, 4080), so the fact that it's cheaper is borderline irrelevant. That's without getting started on the non-raster advantages NVIDIA has.

In other words, the situation hasn't changed since the 6950 XT vs the 3090 Ti, and the 6900 vs the 3090.
You're talking about an alleged top-tier AMD product that hasn't actually been released vs. three out of four Nvidia products that haven't been released or benchmarked by tech websites. This doesn't make much sense to me...
 
Joined
Feb 20, 2019
Messages
7,651 (3.88/day)
System Name Bragging Rights
Processor Atom Z3735F 1.33GHz
Motherboard It has no markings but it's green
Cooling No, it's a 2.2W processor
Memory 2GB DDR3L-1333
Video Card(s) Gen7 Intel HD (4EU @ 311MHz)
Storage 32GB eMMC and 128GB Sandisk Extreme U3
Display(s) 10" IPS 1280x800 60Hz
Case Veddha T2
Audio Device(s) Apparently, yes
Power Supply Samsung 18W 5V fast-charger
Mouse MX Anywhere 2
Keyboard Logitech MX Keys (not Cherry MX at all)
VR HMD Samsung Oddyssey, not that I'd plug it into this though....
Software W10 21H1, barely
Benchmark Scores I once clocked a Celeron-300A to 564MHz on an Abit BE6 and it scored over 9000.
Those are all valid points. Unfortunately, it doesn't seem that Nvidia and AMD are listening. Nvidia hasn't bothered releasing anything newer than the 1650 for this segment, and the 6400 and 6500 wouldn't have existed without the crypto boom.
RTX 3050's raytracing performance is already too bad to be relevant in many games.

Nvidia will never replace the 1650 because nothing at that size has a snowball's chance in hell of raytracing or running DLSS acceptably. RTX and DLSS are Nvidia's competitive edge and they won't undermine that no matter what.

I'd love to see a GA108 die, maybe 1920 CUDA cores, clocked at ~1.8GHz, 6GB of cheap, slow 12Gbps GDDR6, and all running at <75W on Samsung N8 or TSMC6. There'd be no need for Tensor cores or RT cores and the aim would be to make the die as small and cheap to produce as possible.
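A rough back-of-the-envelope for what such a die would offer on paper (purely a sketch: the 96-bit bus is assumed from the 6 GB capacity, and 2 FP32 ops per CUDA core per clock is the usual Ampere figure; nothing here is an announced product):

Code:
# Hypothetical GA108 - all figures are assumptions, not an announced part
cuda_cores     = 1920
clock_ghz      = 1.8
bus_width_bits = 96    # assumed: 3 x 32-bit channels for 6 GB
gddr6_gbps     = 12

fp32_tflops   = cuda_cores * 2 * clock_ghz / 1000    # ~6.9 TFLOPS
bandwidth_gbs = bus_width_bits / 8 * gddr6_gbps      # 144 GB/s
print(fp32_tflops, bandwidth_gbs)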

It'll never happen, though. Nvidia has abandoned the low end to AMD and Intel IGPs; just above that, they can still make a profit for zero effort by churning out the 1050 Ti and 1650. Those cards are neither modern nor efficient, but Nvidia doesn't care about that; they only care about profit.
 
Joined
Mar 7, 2007
Messages
1,421 (0.22/day)
Processor E5-1680 V2
Motherboard Rampage IV black
Video Card(s) Asrock 7900 xtx
Storage 500 gb sd
Software windows 10 64 bit
Benchmark Scores 29,433 3dmark06 score
The "advanced chiplets design" would only be "disruptive" vs Monolithic if AMD either
1. Passed on the cost savings to consumers (prices are same for xtx, and 7900xt is worse cu count % wise to top sku than 6800xt was to 6900xt)
2. Actually used the chiplets to have more cu, instead of just putting memory controllers on there. Thereby maybe actually competing in RTRT performance or vs the 4090.

AMD is neither taking advantage of their cost savings and going the value route, or competing vs their competition's best. They're releasing a product months later, with zero performance advantages vs the competition? Potential 4080ti will still be faster and they've just given nvidia free reign (once again) to charge whatever they want for flagships since there's zero competition.

7950xtx could have two 7900xtx dies, but I doubt it
Interesting… so you know this? Even if it hits 3 GHz? Also, does anyone know when the embargo lifts and we get actual reviews of these cards?
 
Joined
May 30, 2015
Messages
1,900 (0.57/day)
Location
Seattle, WA
Actually used the chiplets to add more CUs, instead of just putting memory controllers on them, thereby maybe actually competing in RTRT performance or against the 4090.

Memory bandwidth IS performance, though. Those extra CUs do you no good at all if they're sitting idle with no work distribution. (Look back at GCN's headaches regarding idle logic via bandwidth starvation.) Shoving hardware in with no improvement to your memory and fabric bandwidth is the actual waste of die space.
 
Joined
Apr 30, 2020
Messages
886 (0.58/day)
System Name S.L.I + RTX research rig
Processor Ryzen 7 5800X 3D.
Motherboard MSI MEG ACE X570
Cooling Corsair H150i Cappellx
Memory Corsair Vengeance pro RGB 3200mhz 16Gbs
Video Card(s) 2x Dell RTX 2080 Ti in S.L.I
Storage Western digital Sata 6.0 SDD 500gb + fanxiang S660 4TB PCIe 4.0 NVMe M.2
Display(s) HP X24i
Case Corsair 7000D Airflow
Power Supply EVGA G+1600watts
Mouse Corsair Scimitar
Keyboard Cosair K55 Pro RGB
Memory bandwidth IS performance, though. Those extra CUs do you no good at all if they're sitting idle with no work distribution. (Look back at GCN's headaches regarding idle logic via bandwidth starvation.) Shoving hardware in with no improvement to your memory and fabric bandwidth is the actual waste of die space.
What? :confused: Vega was based on GCN and couldn't even use the bandwidth from HBM2.
 
Joined
May 30, 2015
Messages
1,900 (0.57/day)
Location
Seattle, WA
What? :confused: Vega was based on GCN and couldn't even use the bandwidth from HBM2.

You've correctly isolated the one generation of GCN where memory bandwidth exceeded what the front end and shaders could use (though Vega's ROPs absolutely use that bandwidth: compare Vega 64 at stock HBM2 clocks vs 1200 MHz, a ~30% improvement to pixel fill in isolated tests). Now look back at the other five generations of GCN that exist.

An incredibly important distinction to make is fabric bandwidth versus theoretical VRAM bandwidth. Getting 960 GB/s at the PHYs does NOT equate to pushing terabytes per second of data across the die. One of the major steps forward in RDNA3's design is that increased fabric bandwidth. This is something other companies are absolutely killing right now, not to name any fruit-related names, leaving people confused about how they manage near-linear scaling with increased shader grouping.
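For context, the 960 GB/s figure is just the standard VRAM bandwidth formula - bus width times per-pin data rate - and says nothing about what the on-die fabric can actually move. A quick sketch using the public memory configs (Navi 31 at 384-bit/20 Gbps GDDR6, Vega 64 at 2048-bit HBM2):

Code:
# Theoretical VRAM bandwidth (GB/s) = bus width (bits) / 8 * per-pin data rate (Gbps)
def vram_bw_gbs(bus_bits, gbps_per_pin):
    return bus_bits / 8 * gbps_per_pin

print(vram_bw_gbs(384, 20.0))    # Navi 31, GDDR6 at 20 Gbps: 960 GB/s
print(vram_bw_gbs(2048, 1.89))   # Vega 64, HBM2 at stock 945 MHz: ~484 GB/s
print(vram_bw_gbs(2048, 2.40))   # Vega 64, HBM2 at 1200 MHz: ~614 GB/s (~27% more)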
 
Last edited:
Joined
Apr 30, 2020
Messages
886 (0.58/day)
System Name S.L.I + RTX research rig
Processor Ryzen 7 5800X 3D.
Motherboard MSI MEG ACE X570
Cooling Corsair H150i Cappellx
Memory Corsair Vengeance pro RGB 3200mhz 16Gbs
Video Card(s) 2x Dell RTX 2080 Ti in S.L.I
Storage Western digital Sata 6.0 SDD 500gb + fanxiang S660 4TB PCIe 4.0 NVMe M.2
Display(s) HP X24i
Case Corsair 7000D Airflow
Power Supply EVGA G+1600watts
Mouse Corsair Scimitar
Keyboard Cosair K55 Pro RGB
You've correctly isolated the one generation of GCN where memory bandwidth exceeded what the front end and shaders could use (though Vega's ROPs absolutely use that bandwidth: compare Vega 64 at stock HBM2 clocks vs 1200 MHz, a ~30% improvement to pixel fill in isolated tests). Now look back at the other five generations of GCN that exist.
It does not. Nor does any older GCN.
Why don't you try posting facts and proving it?
"Shader utilization" on RDNA3 will be less than what it is on RDNA2. I already showed why, with facts.
 
Joined
May 30, 2015
Messages
1,900 (0.57/day)
Location
Seattle, WA
"Shader utilization" on RDNA 3 will be less than what it is on RDNA2. I already shown why, with facts.

The only 'facts' I could find were your opinions on how they could have doubled the core count? Let's dissect:

It just needed refinement. RDNA3 doesn't refine this setup at all - it slows it down, especially in ray tracing. If AMD had added a ray tracing instruction cache and a ray tracing data cache to RDNA2, separate for the RT cores on the back end, the RT cores would be unbottlenecked by the texture units they're linked to.

So you want them to remove the ray accelerators from the CUs and put them on the backend as a dedicated block? That's expensive for die area since you now need to physically map and wire out all their supporting cache twice instead of giving them the shared data cache. Also, BVH is a shader-parallel operation, so why would you remove the ray accelerators from the CU and have to store ray data in a secondary location to fetch half way through? Just to increase latency for funsies? It makes no sense.

You could unlink the shader clocks from the RT cores and then size them appropriately, instead of a single double-size front end being unlinked from the shaders like it is in RDNA3.

You want to create a new clock domain ONLY for the ray accelerators within the shader engine, and keep the shader engine clock domain linked to the front end. Sounds great, but those ray accelerators are with utmost certainty NOT what drives up core power draw. So pushing them off onto their own domain isn't going to save you any power. You'll go back to having the problem RDNA2 has where the front end is outrun by the CUs, and entire units stall for multiple cycles waiting for wavefronts.

The shaders are underclocked because the instruction cache can't feed them fast enough at the same clock, since there are more of them now too. Or they could have just doubled the instruction cache, like they did, to improve ray tracing on the back end without adding shaders to each unit.

The shaders are underclocked to save power. I guarantee you when reviews come out, people will attempt to overclock the shader engines and they will see performance scale with clocks, as well as significant increases in load power draw. The front end has been left alone because it needs the extra bandwidth to feed the CUs, this so far is true. Double the I$ to improve ray tracing on the backend (ray tracing happens in the CUs, not on the backend) without adding shaders to each unit? What? The ray accelerators reside within the CU, how would they add more ray accelerators without adding more shaders? Empty CUs? Oh, wait I think you're back on that idea of removing the ray accelerators from the CUs and grouping them to the backend again. We've been over this.

If you want proof, all you have to do is look at Navi vs Navi 2: they doubled texture units, ROPs, shaders and compute units, and added Infinity Cache, all on the same 7 nm node, plus the power went from 225 W to 330 W, roughly a 50% increase.

Correct, and they more than doubled the die size in turn. The beauty of this comparison is that the 5700 XT was pushed to the absolute limit of what it could endure, and a properly tuned (980 mV-1050 mV) 5700 XT runs more like 170 W. Unsurprisingly, if you look at the 6900 XT, it's tuned for that exact voltage range and uses almost exactly 2x the power of the 5700 XT at the same voltage.

[attached images]

5700 XT Reference on the left, 6900 XT reference on the right.
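As a minimal sketch of why the ~2x power result tracks: dynamic power scales roughly with active unit count, voltage squared and clock, so doubling the CUs at the same voltage and similar clocks roughly doubles power (the voltage and clock below are ballpark assumptions from the tuning discussion above, not measured values):

Code:
# Relative dynamic power ~ unit count * V^2 * f
def rel_power(units, volts, clock_ghz):
    return units * volts ** 2 * clock_ghz

navi10 = rel_power(40, 1.025, 1.9)   # tuned 5700 XT ballpark (assumed)
navi21 = rel_power(80, 1.025, 1.9)   # 6900 XT at the same voltage/clock ballpark
print(navi21 / navi10)               # 2.0 -> ~2x the power at the same voltage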

They couldn't do that with Navi 3 at all, because its front end can't feed the increase in shaders and the doubled RT cores; it wouldn't be able to support double the texture units inside the WGP/compute unit. That is why there is only a 20% increase in texture units this time.

The front end CAN feed the shader engines because it runs at higher clocks. Also, doubled RT cores? Where did you read that? Navi 31's ray accelerators were only increased by 50% according to the top post detailing the block diagram, and AMD themselves don't even mention a full 2x performance increase of the cores. Where did you get "double" from?

The reason there has only been an increase of 20% to TMU count is because there has only been an increase of 20% to CU count. From 80 to 96. The same reason why TMU count from 5700 XT to 6900 XT doubled, because they went from 40 to 80 CUs. That's how the architecture is laid out, I'm sorry you don't like it?
 
Joined
Sep 3, 2019
Messages
3,045 (1.71/day)
Location
Thessaloniki, Greece
System Name PC on since Aug 2019, 1st CPU R5 3600 + ASUS ROG RX580 8GB >> MSI Gaming X RX5700XT (Jan 2020)
Processor Ryzen 9 5900X (July 2022), 160W PPT limit, 75C temp limit, CO -9~14
Motherboard Gigabyte X570 Aorus Pro (Rev1.0), BIOS F37h, AGESA V2 1.2.0.B
Cooling Arctic Liquid Freezer II 420mm Rev7 (Jan 2024) with off center mount for Ryzen, TIM: Kryonaut
Memory 2x16GB G.Skill Trident Z Neo GTZN (July 2022) 3600MHz 1.42V CL16-16-16-16-32-48 1T, tRFC:280, B-die
Video Card(s) Sapphire Nitro+ RX 7900XTX (Dec 2023) 314~465W (390W current) PowerLimit, 1060mV, Adrenalin v24.5.1
Storage Samsung NVMe: 980Pro 1TB(OS 2022), 970Pro 512GB(2019) / SATA-III: 850Pro 1TB(2015) 860Evo 1TB(2020)
Display(s) Dell Alienware AW3423DW 34" QD-OLED curved (1800R), 3440x1440 144Hz (max 175Hz) HDR1000, VRR on
Case None... naked on desk
Audio Device(s) Astro A50 headset
Power Supply Corsair HX750i, 80+ Platinum, 93% (250~700W), modular, single/dual rail (switch)
Mouse Logitech MX Master (Gen1)
Keyboard Logitech G15 (Gen2) w/ LCDSirReal applet
Software Windows 11 Home 64bit (v23H2, OSB 22631.3737)
The only 'facts' I could find were your opinions on how they could have doubled the core count? Let's dissect:



So you want them to remove the ray accelerators from the CUs and put them on the backend as a dedicated block? That's expensive for die area since you now need to physically map and wire out all their supporting cache twice instead of giving them the shared data cache. Also, BVH is a shader-parallel operation, so why would you remove the ray accelerators from the CU and have to store ray data in a secondary location to fetch half way through? Just to increase latency for funsies? It makes no sense.



You want to create a new clock domain ONLY for the ray accelerators within the shader engine, and keep the shader engine clock domain linked to the front end. Sounds great, but those ray accelerators are with utmost certainty NOT what drives up core power draw. So pushing them off onto their own domain isn't going to save you any power. You'll go back to having the problem RDNA2 has where the front end is outrun by the CUs, and entire units stall for multiple cycles waiting for wavefronts.



The shaders are underclocked to save power. I guarantee you when reviews come out, people will attempt to overclock the shader engines and they will see performance scale with clocks, as well as significant increases in load power draw. The front end has been left alone because it needs the extra bandwidth to feed the CUs, this so far is true. Double the I$ to improve ray tracing on the backend (ray tracing happens in the CUs, not on the backend) without adding shaders to each unit? What? The ray accelerators reside within the CU, how would they add more ray accelerators without adding more shaders? Empty CUs? Oh, wait I think you're back on that idea of removing the ray accelerators from the CUs and grouping them to the backend again. We've been over this.



Correct, and they more than doubled the die size in turn. The beauty of this comparison is that the 5700 XT was pushed to the absolute limit of what it could endure, and a properly tuned (980 mV-1050 mV) 5700 XT runs more like 170 W. Unsurprisingly, if you look at the 6900 XT, it's tuned for that exact voltage range and uses almost exactly 2x the power of the 5700 XT at the same voltage.

[attached images]

5700 XT Reference on the left, 6900 XT reference on the right.



The front end CAN feed the shader engines because it runs at higher clocks. Also, doubled RT cores? Where did you read that? Navi 31's ray accelerators were only increased by 50% according to the top post detailing the block diagram, and AMD themselves don't even mention a full 2x performance increase of the cores. Where did you get "double" from?

The reason there has only been an increase of 20% to TMU count is because there has only been an increase of 20% to CU count. From 80 to 96. The same reason why TMU count from 5700 XT to 6900 XT doubled, because they went from 40 to 80 CUs. That's how the architecture is laid out, I'm sorry you don't like it?
Yeah, it's almost hilarious and sad altogether how a lot(?), or at least some(?), people think they know what's needed to make a GPU faster/more efficient/cheaper and so on...

Great analysis, BTW. Looks like quite a few people can read between the lines of the slides AMD has shown. I've been reading/hearing the same in a few analyses of RDNA3.

Indeed, it looks like the "worst" spot in AMD's design is the front end.
They know it, hence the decoupled FE/shader clocks. They are trying to distribute die space and keep costs low by not using expensive (in power and cost) and far too complex components - GDDR6X, high-bandwidth memory controllers*, dedicated RT/AI cores - and instead use an enhanced/larger cache system as a substitute for GDDR6X/HBM controllers*.
Their strongest point is the cache system and the MCD >> GCD feed, as well as the feed inside the GCD (L1/L2). In fact, it got so much stronger (in effective bandwidth) that they even reduced the L3 Infinity Cache (128 MB >> 96 MB) while it's still faster overall than RDNA2.

If in the future (2023) we see a 7950 XTX, it will probably be a larger GCD with at least a larger front end, plus maybe a few more CUs (for example +100~150 mm²).
They already know how to make a 520 mm² GCD.

They can match or surpass Ada's 600 mm² in total die area (GCD + MCDs) and still cost less. They could also move to 4 nm for this one.
Imagine a $1500 (450~500 W) 7950 XTX competing with a $2500 (550~600 W) 4090 Ti.

Pure speculation, but not far-fetched IMHO.

This is exactly what Coreteks stated in his latest video.
I saw that... he also said that the 7900 XTX should cost $500 just to make sense... :kookoo::banghead:

Wishful thinking or just pure absurdity?
 
Joined
Apr 1, 2017
Messages
420 (0.16/day)
System Name The Cum Blaster
Processor R9 5900x
Motherboard Gigabyte X470 Aorus Gaming 7 Wifi
Cooling Alphacool Eisbaer LT360
Memory 4x8GB Crucial Ballistix @ 3800C16
Video Card(s) 7900 XTX Nitro+
Storage Lots
Display(s) 4k60hz, 4k144hz
Case Obsidian 750D Airflow Edition
Power Supply EVGA SuperNOVA G3 750W
I saw that... he also said that the 7900 XTX should cost $500 just to make sense... :kookoo::banghead:

Wishful thinking or just pure absurdity?
Coreteks is a channel that huffs its own farts too much and often says really stupid shit... from what I've seen, he's #1 in the "look mom, I'm a PC tech analyst, here's me talking about how right I was when I got that one leak right" category.
Not to mention trying to emulate a British accent while obviously being an ESL speaker... plus the voice; I can't stand his manner of speaking.
 
Joined
Apr 30, 2020
Messages
886 (0.58/day)
System Name S.L.I + RTX research rig
Processor Ryzen 7 5800X 3D.
Motherboard MSI MEG ACE X570
Cooling Corsair H150i Cappellx
Memory Corsair Vengeance pro RGB 3200mhz 16Gbs
Video Card(s) 2x Dell RTX 2080 Ti in S.L.I
Storage Western digital Sata 6.0 SDD 500gb + fanxiang S660 4TB PCIe 4.0 NVMe M.2
Display(s) HP X24i
Case Corsair 7000D Airflow
Power Supply EVGA G+1600watts
Mouse Corsair Scimitar
Keyboard Cosair K55 Pro RGB
The only 'facts' I could find were your opinions on how they could have doubled the core count? Let's dissect:



So you want them to remove the ray accelerators from the CUs and put them on the backend as a dedicated block? That's expensive for die area since you now need to physically map and wire out all their supporting cache twice instead of giving them the shared data cache. Also, BVH is a shader-parallel operation, so why would you remove the ray accelerators from the CU and have to store ray data in a secondary location to fetch half way through? Just to increase latency for funsies? It makes no sense.



You want to create a new clock domain ONLY for the ray accelerators within the shader engine, and keep the shader engine clock domain linked to the front end. Sounds great, but those ray accelerators are with utmost certainty NOT what drives up core power draw. So pushing them off onto their own domain isn't going to save you any power. You'll go back to having the problem RDNA2 has where the front end is outrun by the CUs, and entire units stall for multiple cycles waiting for wavefronts.



The shaders are underclocked to save power. I guarantee you when reviews come out, people will attempt to overclock the shader engines and they will see performance scale with clocks, as well as significant increases in load power draw. The front end has been left alone because it needs the extra bandwidth to feed the CUs, this so far is true. Double the I$ to improve ray tracing on the backend (ray tracing happens in the CUs, not on the backend) without adding shaders to each unit? What? The ray accelerators reside within the CU, how would they add more ray accelerators without adding more shaders? Empty CUs? Oh, wait I think you're back on that idea of removing the ray accelerators from the CUs and grouping them to the backend again. We've been over this.



Correct, and they more than doubled the die size in turn. The beauty of this comparison is that the 5700 XT was pushed to the absolute limit of what it could endure, and a properly tuned (980 mV-1050 mV) 5700 XT runs more like 170 W. Unsurprisingly, if you look at the 6900 XT, it's tuned for that exact voltage range and uses almost exactly 2x the power of the 5700 XT at the same voltage.

[attached images]

5700 XT Reference on the left, 6900 XT reference on the right.



The front end CAN feed the shader engines because it runs at higher clocks. Also, doubled RT cores? Where did you read that? Navi 31's ray accelerators were only increased by 50% according to the top post detailing the block diagram, and AMD themselves don't even mention a full 2x performance increase of the cores. Where did you get "double" from?

The reason there has only been an increase of 20% to TMU count is because there has only been an increase of 20% to CU count. From 80 to 96. The same reason why TMU count from 5700 XT to 6900 XT doubled, because they went from 40 to 80 CUs. That's how the architecture is laid out, I'm sorry you don't like it?

The RX 7900 XTX will be only 20% faster on average than an RX 6950 XT in ray tracing.
 
Last edited:
Joined
Nov 26, 2021
Messages
1,415 (1.47/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
The RX 7900 XTX will be only 20% faster on average than an RX 6950 XT in ray tracing.
I think you're mistaken. The claim is up to 50% faster per CU and since it has 20% more CUs, that means up to 80% faster.
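The arithmetic behind that, for clarity (AMD's "up to" wording makes this an upper bound, not an average):

Code:
per_cu_gain = 1.5               # claimed: up to 50% more RT performance per CU
cu_ratio    = 96 / 80           # 20% more CUs than Navi 21
print(per_cu_gain * cu_ratio)   # 1.8 -> up to 80% faster overall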
 