Monday, November 7th 2022

AMD RDNA3 Navi 31 GPU Block Diagram Leaked, Confirmed to be PCIe Gen 4

An alleged leaked company slide details AMD's upcoming 5 nm "Navi 31" GPU powering the next-generation Radeon RX 7900 XTX and RX 7900 XT graphics cards. The slide shows the "Navi 31" MCM, with its central graphics compute die (GCD) chiplet built on the 5 nm EUV silicon fabrication process, surrounded by six memory cache dies (MCDs), each built on the 6 nm process. The GCD interfaces with the system over a PCI-Express 4.0 x16 host interface. It features the latest-generation multimedia engine with dual-stream encoders, and the new Radiance display engine with DisplayPort 2.1 and HDMI 2.1a support. Custom interconnects tie it to the six MCDs.

Each MCD has 16 MB of Infinity Cache (L3 cache) and a 64-bit GDDR6 memory interface (two 32-bit GDDR6 paths). Six of these add up to the GPU's 384-bit GDDR6 memory interface. In practical terms, the GPU still has a contiguous, monolithic 384-bit wide memory bus, because every modern GPU uses multiple on-die memory controllers to achieve a wide memory bus. "Navi 31" hence has a total Infinity Cache size of 96 MB, which is less than the 128 MB on "Navi 21," but AMD has shored up cache sizes elsewhere across the GPU: the L0 caches on the compute units are increased by 240%, the L1 caches by 300%, and the L2 cache shared among the shader engines by 50%. The slide confirms the RX 7900 XTX uses 20 Gbps GDDR6 memory, for 960 GB/s of memory bandwidth.
The GCD features six Shader Engines, each with 16 compute units (or 8 dual compute units), which works out to 1,024 stream processors per Shader Engine, or 6,144 in total. AMD claims to have doubled the IPC of these stream processors over RDNA2, and the new RDNA3 ALUs also support BF16 instructions. The SIMD engines of "Navi 31" have an FP32 throughput of 61.6 TFLOP/s, a 168% increase over the 23 TFLOP/s of "Navi 21." The slide doesn't quite detail the new ray tracing engine, but references new RT features, larger caches, and a 50% higher ray intersection rate, for an up to 1.8x RT performance increase over the RX 6950 XT at 2.505 GHz engine clocks. There are other major upgrades to the GPU's raster 3D capabilities, including a 50% increase in prims/clock rates and a 100% increase in prim/vertex cull rates. The pixel pipeline sees similar 50% increases in rasterized prims/clock and pixels/clock, and in synchronous pixel-wait.
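
As a quick sanity check of the headline numbers above, here is a minimal back-of-the-envelope sketch (assuming 96 CUs with 64 stream processors each, RDNA3's dual-issue FP32, two FLOPs per FMA, and the 2.505 GHz engine clock and 20 Gbps GDDR6 figures from the slide):

```python
# Back-of-the-envelope check of the leaked "Navi 31" figures.
# Assumptions: 96 CUs x 64 stream processors, dual-issue FP32,
# 2 FLOPs per FMA, 2.505 GHz shader clock, six 64-bit MCDs, 20 Gbps GDDR6.

STREAM_PROCESSORS = 96 * 64          # 6,144
SHADER_CLOCK_GHZ = 2.505
DUAL_ISSUE = 2                       # RDNA3 dual-issue FP32
FLOPS_PER_FMA = 2                    # one fused multiply-add = 2 FLOPs

fp32_tflops = STREAM_PROCESSORS * DUAL_ISSUE * FLOPS_PER_FMA * SHADER_CLOCK_GHZ / 1000
print(f"FP32 throughput: {fp32_tflops:.1f} TFLOP/s")    # ~61.6 TFLOP/s

BUS_WIDTH_BITS = 6 * 64              # six MCDs x 64-bit = 384-bit
GDDR6_GBPS_PER_PIN = 20
bandwidth_gb_s = BUS_WIDTH_BITS * GDDR6_GBPS_PER_PIN / 8
print(f"Memory bandwidth: {bandwidth_gb_s:.0f} GB/s")   # 960 GB/s

infinity_cache_mb = 6 * 16           # six MCDs x 16 MB
print(f"Infinity Cache: {infinity_cache_mb} MB")        # 96 MB
```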
Source: VideoCardz

79 Comments on AMD RDNA3 Navi 31 GPU Block Diagram Leaked, Confirmed to be PCIe Gen 4

#51
ARF
THU31I definitely hope they will care again, even though it seems unlikely.

Next year I am planning on building a separate strictly gaming PC and I want to convert my current build to an HTPC that will handle everything else, including recording with a capture card. I would definitely want a cheap graphics card with AV1 encoding. If NVIDIA or AMD do not offer such a product, I might go with Intel (A380 or whatever).
If AMD wants to save its market share, it must restructure its strategy and start offering small GPUs with full media capabilities - otherwise they will never save face with the larger audience and will always be seen as the underdog by the vast majority of users.

One can hope that Navi 33 (June 2023?) will fix these problems but I think they need Navi 34 with PCIe 4.0 x8, AV1, DisplayPort 2.1 and a price tag of $129.
Posted on Reply
#52
Chrispy_
mechtechMeh. As long as the bottom cards have a good media engine…….unlike the 6400…….
I'm pretty certain that the 6400 and 6500 only exist as desktop cards at all because of the disastrous coincidence of the ETH mining boom and the pandemic screwing with production/supply.

They weren't designed to be standalone products - they're literally in the AMD roadmaps from almost 4 years ago as laptop-only auxiliary GPU cores to complement an AMD IGP.
Posted on Reply
#53
ARF
Chrispy_I'm pretty certain that the 6400 and 6500 only exist as desktop cards at all because of the disastrous coincidence of the ETH mining boom and the pandemic screwing with production/supply.

They weren't designed to be standalone products - they're literally in the AMD roadmaps from almost 4 years ago as laptop-only auxiliary GPU cores to complement an AMD IGP.
I think AMD's general management capabilities are quite screwed, to begin with.
No pandemic can erase certain plans - and when there were never plans for decent offerings, you can't blame external factors.

Its roadmap is lackluster.
Posted on Reply
#54
AnotherReader
dgianstefaniAvailability is irrelevant when 80% of people still choose their competition.

Their top tier product competes in raster with Nvidia's 3rd/4th product down the stack (4090 Ti, 4090, 4080 Ti, 4080), therefore the fact that it's cheaper is borderline irrelevant. That's without getting started on the non-raster advantages NVIDIA has.

In other words, the situation hasn't changed since the 6950xt and the 3090ti/6900 and 3090.
I think you're wrong about the competitive position; the 7900 XTX should blow the 4080 away when it comes to rasterization, and since the 4080 Ti doesn't exist, we shouldn't be considering it. All this has been achieved with a minuscule die compared to the 4090, and despite the power penalty of off-chip interconnects.
ARFIf AMD wants to save its market share, it must restructure its strategy and start offering small GPUs with full media capabilities - otherwise they will never save face with the larger audience and will always be seen as the underdog by the vast majority of users.

One can hope that Navi 33 (June 2023?) will fix these problems but I think they need Navi 34 with PCIe 4.0 x8, AV1, DisplayPort 2.1 and a price tag of $129.
This market has almost entirely been taken over by the increasingly more capable IGPs. Wasting expensive 5 nm wafers on a 7400 or 7500 would be unwise. They might only offer those SKUs if Intel decides to release an Arc B380.
Posted on Reply
#55
ARF
AnotherReaderThis market has almost entirely been taken over by the increasingly more capable IGPs. Wasting expensive 5 nm wafers on a 7400 or 7500 would be unwise. They might only offer those SKUs if Intel decides to release an Arc B380.
Are there iGPUs which have gaming performance equal to RX 6400's?

There are market segments which require improved gaming performance on entry-level discrete cards (the RX 6500 XT is stupid to begin with because it has 0% improvement over the older RX 5500 XT), cards which can go in any system, Intel or AMD...
Posted on Reply
#56
AnotherReader
ARFAre there iGPUs which have gaming performance equal to RX 6400's?

There are market segments which require improved gaming performance on entry-level discrete cards (the RX 6500 XT is stupid to begin with because it has 0% improvement over the older RX 5500 XT), cards which can go in any system, Intel or AMD...
No, there aren't any in the x86 world, but the IGP in the mobile Zen 3 refresh is equivalent to the 1050 Ti. Keep in mind that it's hobbled by ho-hum DDR5 4800 and lower clock speeds due to the low TDP of the platform. A similar desktop APU would be faster, though it probably won't be a 6400 competitor.
Posted on Reply
#57
THU31
AnotherReaderThis market has almost entirely been taken over by the increasingly more capable IGPs. Wasting expensive 5 nm wafers on a 7400 or 7500 would be unwise. They might only offer those SKUs if Intel decides to release an Arc B380.
They do not have to be made on 5 nm, though. They can make them on 7 nm or anything else. It is about the feature-set, not the manufacturing process. A low-end GPU does not have to be made on the newest process, power consumption is not really an issue.

iGPUs are cool, but they require you to buy the entire platform. You can have an older PC and you might want to upgrade just the graphics card to get some new features and a bit more performance. They do not have to make a lot of those cards.
Posted on Reply
#58
ARF
THU31They do not have to be made on 5 nm, though. They can make them on 7 nm or anything else. It is about the feature-set, not the manufacturing process. A low-end GPU does not have to be made on the newest process, power consumption is not really an issue.

iGPUs are cool, but they require you to buy the entire platform. You can have an older PC and you might want to upgrade just the graphics card to get some new features and a bit more performance. They do not have to make a lot of those cards.
100%.

The performance gap between RX 6400 and RTX 4090 is too wide.
AMD can be back in business if it succeeds in narrowing this down to only 400%.

AMD also needs to start making pipecleaners. How about a 50 sq. mm GPU on TSMC N3 with chiplets on N7 made now, NOW?

Posted on Reply
#59
AnotherReader
THU31They do not have to be made on 5 nm, though. They can make them on 7 nm or anything else. It is about the feature-set, not the manufacturing process. A low-end GPU does not have to be made on the newest process, power consumption is not really an issue.

iGPUs are cool, but they require you to buy the entire platform. You can have an older PC and you might want to upgrade just the graphics card to get some new features and a bit more performance. They do not have to make a lot of those cards.
Those are all valid points. Unfortunately, it doesn't seem that Nvidia and AMD are listening. Nvidia hasn't bothered releasing anything newer than the 1650 for this segment, and the 6400 and 6500 wouldn't have existed without the crypto boom.
Posted on Reply
#60
THU31
AnotherReaderNvidia hasn't bothered releasing anything newer than the 1650 for this segment, and the 6400 and 6500 wouldn't have existed without the crypto boom.
That is true, although the RTX 3050 would be a fine card in my opinion if it actually cost $250. But today it already has an outdated feature-set. If they made a 4050 for an actual $250, I would probably pay the extra money. I could even do some light gaming on it and have it as a backup in case my main gaming PC has a problem.
Posted on Reply
#61
mkppo
dgianstefaniAvailability is irrelevant when 80% of people still choose their competition.

Their top tier product competes in raster with Nvidia's 3rd/4th product down the stack (4090 Ti, 4090, 4080 Ti, 4080), therefore the fact that it's cheaper is borderline irrelevant. That's without getting started on the non-raster advantages NVIDIA has.

In other words, the situation hasn't changed since the 6950xt and the 3090ti/6900 and 3090.
There's no possible way the 7900 XTX competes with the 4080 in raster. It's predicted to be within 10-15% of the 4090, and the 4080 can't possibly be that close to the 4090, judging by the specs. RT - yeah, fair point, it's a lot slower and they don't have CUDA. I just don't see too many other advantages for Nvidia - DLSS 2 is pretty similar to FSR 2, and their NVENC is now matched on paper by Navi 31. On the other hand, AMD has a much better control panel, doesn't use the terrible 12VHPWR connector, has a much better form factor, and is 20-60% cheaper. At best, I can see the 4080 being somewhat comparable to the 7900 XT in raster.

I don't care much for RT anyway, and prefer 240 Hz on a 3K screen. Time to retire the 3090; 7900 XTX, here I come.
Posted on Reply
#62
DemonicRyzen666
DemonicRyzen666From what I can read, the lack of TMUs is the main problem with this design.
The RX 6950 XT has 320 texture units along with 128 ROPs, which gives it 739.2 GTexel/s at 2100-2300 MHz,
while the RX 7900 XTX has 384 texture units along with 192 ROPs.
The texture fill rate should be 887.04 GTexel/s, not 961.9 GTexel/s; there is no way this GPU is clocked that high (see the fill-rate sketch at the end of this post).


^ this part should be bigger. They doubled instructions per clock.
I feel like they should have just doubled the RT core count in the compute part instead of increasing shader count by 2.7 times.


This setup was fine from RDNA2. It just needed refinement. RDNA3 doesn't refine this setup at all, it slows it down, especially in raytracing. If AMD added a raytracing instruction cache & raytracing data cache to RDNA2 that's separate for the RT core on the back end, the RT cores would be unbottlenecked by the texture units being linked to them. You could unlink the shader clocks from the RT cores and then size them appropriately, instead of a single double-size front end being unlinked from the shaders like it is in RDNA3. The shaders are underclocked because the instruction cache can't feed them fast enough at the same clock, since there are more of them now too. Or just double the instruction cache, like they did to improve the raytracing on the backend, without adding shaders to each unit. If you want proof, all you have to do is look at Navi vs Navi 2. They doubled texture units, doubled ROPs, doubled shaders & doubled compute units, and added Infinity Cache, all on the same 7 nm node, plus the power going from 225 watts to 330 watts, or a roughly 50% increase in power. They couldn't do it in Navi 3 at all, because its front end can't feed the increase in shaders & doubled RT cores now; it wouldn't be able to support double the texture units inside the WGP/compute unit. That is why there is only a 20% increase in texture units this time.

P.S. Anyone feel like modifying this in Photoshop to show the new way RDNA3 is set up?
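
For reference, here is a minimal sketch of the fill-rate arithmetic above, assuming texture fill rate scales simply as TMU count times clock (2.31 GHz is the clock implied by the 739.2 GTexel/s figure for the RX 6950 XT; 2.505 GHz is the engine clock from the leaked slide):

```python
# Texel fill rate = TMUs x clock (GHz) -> GTexel/s, under the simple
# linear-scaling assumption used in the post above.

def fill_rate_gtexel_s(tmus: int, clock_ghz: float) -> float:
    return tmus * clock_ghz

print(fill_rate_gtexel_s(320, 2.31))    # RX 6950 XT: 739.2 GTexel/s
print(fill_rate_gtexel_s(384, 2.31))    # 7900 XTX at the same 2.31 GHz: 887.04 GTexel/s
print(fill_rate_gtexel_s(384, 2.505))   # 7900 XTX at the slide's 2.505 GHz: 961.92 GTexel/s
```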
Posted on Reply
#63
beedoo
dgianstefaniAvailability is irrelevant when 80% of people still choose their competition.

Their top tier product competes in raster with Nvidia's 3rd/4th product down the stack (4090 Ti, 4090, 4080 Ti, 4080), therefore the fact that it's cheaper is borderline irrelevant. That's without getting started on the non-raster advantages NVIDIA has.

In other words, the situation hasn't changed since the 6950xt and the 3090ti/6900 and 3090.
You're talking about an alleged top-tier AMD product that hasn't actually been released vs 3 out of 4 Nvidia products that haven't been released or benchmarked by tech websites. This doesn't make much sense to me...
Posted on Reply
#64
Chrispy_
AnotherReaderThose are all valid points. Unfortunately, it doesn't seem that Nvidia and AMD are listening. Nvidia hasn't bothered releasing anything newer than the 1650 for this segment, and the 6400 and 6500 wouldn't have existed without the crypto boom.
RTX 3050's raytracing performance is already too bad to be relevant in many games.

Nvidia will never replace the 1650 because nothing at that size has a snowball's chance in hell of raytracing or running DLSS acceptably. RTX and DLSS are Nvidia's competitive edge and they won't undermine that no matter what.

I'd love to see a GA108 die, maybe 1920 CUDA cores, clocked at ~1.8GHz, 6GB of cheap, slow 12Gbps GDDR6, and all running at <75W on Samsung N8 or TSMC6. There'd be no need for Tensor cores or RT cores and the aim would be to make the die as small and cheap to produce as possible.

It'll never happen though. Nvidia has abandoned the low end to AMD and Intel IGPs; just above that, they can still make a profit for zero effort by churning out the 1050 Ti and 1650. They're neither modern nor efficient, but Nvidia doesn't care about that; they only care about profit.
Posted on Reply
#65
dalekdukesboy
dgianstefaniThe "advanced chiplets design" would only be "disruptive" vs Monolithic if AMD either
1. Passed on the cost savings to consumers (prices are same for xtx, and 7900xt is worse cu count % wise to top sku than 6800xt was to 6900xt)
2. Actually used the chiplets to have more cu, instead of just putting memory controllers on there. Thereby maybe actually competing in RTRT performance or vs the 4090.

AMD is neither taking advantage of their cost savings and going the value route, nor competing vs their competition's best. They're releasing a product months later, with zero performance advantages vs the competition? A potential 4080 Ti will still be faster, and they've just given Nvidia free rein (once again) to charge whatever they want for flagships, since there's zero competition.

7950xtx could have two 7900xtx dies, but I doubt it
Interesting… so you know this? Even if it hits 3 GHz? Also, does anyone know when the embargo lifts and we get actual reviews of these cards?
Posted on Reply
#66
Fouquin
dgianstefaniActually used the chiplets to have more cu, instead of just putting memory controllers on there. Thereby maybe actually competing in RTRT performance or vs the 4090.
Memory bandwidth IS performance, though. Those extra CUs do you no good at all if they're sitting idle with no work distribution. (Look back at GCN's headaches regarding idle logic via bandwidth starvation.) Shoving hardware in with no improvement to your memory and fabric bandwidth is the actual waste of die space.
Posted on Reply
#67
DemonicRyzen666
FouquinMemory bandwidth IS performance, though. Those extra CUs do you no good at all if they're sitting idle with no work distribution. (Look back at GCN's headaches regarding idle logic via bandwidth starvation.) Shoving hardware in with no improvement to your memory and fabric bandwidth is the actual waste of die space.
What? :confused: Vega was based on GCN & couldn't even use the bandwidth from HBM2.
Posted on Reply
#68
Fouquin
DemonicRyzen666What? :confused: Vega was Based on GCN & couldn't even use the bandwith from HBM2.
You've correctly isolated one generation of GCN where bandwidth exceeded the front end and shaders (though Vega's ROPs absolutely use that bandwidth, look at Vega 64 at stock HBM2 clocks vs 1200MHz, 30% improvement to pixel fill in isolated tests.) Now if we look back at the other 5 generations of GCN that exist.

An incredibly important distinction to make is between fabric bandwidth and theoretical VRAM bandwidth. Getting 960 GB/s at the PHYs does NOT equate to pushing terabytes per second of data across the die. One of the major steps forward for RDNA3's design is that increased fabric bandwidth. This is something that other companies are absolutely killing right now, not to name any fruit-related names, and leaving people confused about how they can manage near-linear scaling with increased shader grouping.
Posted on Reply
#69
DemonicRyzen666
FouquinYou've correctly isolated one generation of GCN where bandwidth exceeded the front end and shaders (though Vega's ROPs absolutely use that bandwidth, look at Vega 64 at stock HBM2 clocks vs 1200MHz, 30% improvement to pixel fill in isolated tests.) Now if we look back at the other 5 generations of GCN that exist.
It does not. Nor does any older GCN.
Why don't you try posting facts & proving it?
"Shader utilization" on RDNA 3 will be less than what it is on RDNA2. I already showed why, with facts.
Posted on Reply
#70
Fouquin
DemonicRyzen666"Shader utilization" on RDNA 3 will be less than what it is on RDNA2. I already shown why, with facts.
The only 'facts' I could find were your opinions on how they could have doubled the core count? Let's dissect:
DemonicRyzen666It just needed refinement. RDNA3 doesn't refine this setup at all, it slows it down, especially in raytracing. If AMD added a raytracing instruction cache & raytracing data cache to RDNA2 that's separate for the RT core on the back end.
So you want them to remove the ray accelerators from the CUs and put them on the backend as a dedicated block? That's expensive for die area since you now need to physically map and wire out all their supporting cache twice instead of giving them the shared data cache. Also, BVH is a shader-parallel operation, so why would you remove the ray accelerators from the CU and have to store ray data in a secondary location to fetch half way through? Just to increase latency for funsies? It makes no sense.
DemonicRyzen666You could unlink the shader clocks from the RT cores and then size them appropriately, instead of a single double-size front end being unlinked from the shaders like it is in RDNA3.
You want to create a new clock domain ONLY for the ray accelerators within the shader engine, and keep the shader engine clock domain linked to the front end. Sounds great, but those ray accelerators are with utmost certainty NOT what drives up core power draw. So pushing them off onto their own domain isn't going to save you any power. You'll go back to having the problem RDNA2 has where the front end is outrun by the CUs, and entire units stall for multiple cycles waiting for wavefronts.
DemonicRyzen666The shaders are underclocked because the instruction cache can't feed them fast enough at the same clock, since there are more of them now too. Or just double the instruction cache, like they did to improve the raytracing on the backend, without adding shaders to each unit.
The shaders are underclocked to save power. I guarantee you when reviews come out, people will attempt to overclock the shader engines and they will see performance scale with clocks, as well as significant increases in load power draw. The front end has been left alone because it needs the extra bandwidth to feed the CUs, this so far is true. Double the I$ to improve ray tracing on the backend (ray tracing happens in the CUs, not on the backend) without adding shaders to each unit? What? The ray accelerators reside within the CU, how would they add more ray accelerators without adding more shaders? Empty CUs? Oh, wait I think you're back on that idea of removing the ray accelerators from the CUs and grouping them to the backend again. We've been over this.
DemonicRyzen666If you want proof, all you have to do is look at Navi vs Navi 2. They doubled texture units, doubled ROPs, doubled shaders & doubled compute units, and added Infinity Cache, all on the same 7 nm node, plus the power going from 225 watts to 330 watts, or a roughly 50% increase in power.
Correct, and they more than doubled the die size in turn. The beauty in this comparison is that the 5700 XT was pushed to the absolute limit of what it could endure, and a properly tuned (980mV-1050mV) 5700 XT runs more like 170W. Unsurprisingly, if you look at the 6900 XT, it's tuned for that exact voltage range, and uses almost exactly 2x the power of the 5700 XT at the same voltage.



5700 XT Reference on the left, 6900 XT reference on the right.
DemonicRyzen666They couldn't do it in Navi 3 at all, because its front end can't feed the increase in shaders & doubled RT cores now; it wouldn't be able to support double the texture units inside the WGP/compute unit. That is why there is only a 20% increase in texture units this time.
The front end CAN feed the shader engines because it runs at higher clocks. Also, doubled RT cores? Where did you read that? Navi 31's ray accelerators were only increased by 50% according to the top post detailing the block diagram, and AMD themselves don't even mention a full 2x performance increase of the cores. Where did you get "double" from?

The reason there has only been an increase of 20% to TMU count is because there has only been an increase of 20% to CU count. From 80 to 96. The same reason why TMU count from 5700 XT to 6900 XT doubled, because they went from 40 to 80 CUs. That's how the architecture is laid out, I'm sorry you don't like it?
Posted on Reply
#71
Zach_01
FouquinThe only 'facts' I could find were your opinions on how they could have doubled the core count? Let's dissect:



So you want them to remove the ray accelerators from the CUs and put them on the backend as a dedicated block? That's expensive for die area since you now need to physically map and wire out all their supporting cache twice instead of giving them the shared data cache. Also, BVH is a shader-parallel operation, so why would you remove the ray accelerators from the CU and have to store ray data in a secondary location to fetch half way through? Just to increase latency for funsies? It makes no sense.



You want to create a new clock domain ONLY for the ray accelerators within the shader engine, and keep the shader engine clock domain linked to the front end. Sounds great, but those ray accelerators are with utmost certainty NOT what drives up core power draw. So pushing them off onto their own domain isn't going to save you any power. You'll go back to having the problem RDNA2 has where the front end is outrun by the CUs, and entire units stall for multiple cycles waiting for wavefronts.



The shaders are underclocked to save power. I guarantee you when reviews come out, people will attempt to overclock the shader engines and they will see performance scale with clocks, as well as significant increases in load power draw. The front end has been left alone because it needs the extra bandwidth to feed the CUs, this so far is true. Double the I$ to improve ray tracing on the backend (ray tracing happens in the CUs, not on the backend) without adding shaders to each unit? What? The ray accelerators reside within the CU, how would they add more ray accelerators without adding more shaders? Empty CUs? Oh, wait I think you're back on that idea of removing the ray accelerators from the CUs and grouping them to the backend again. We've been over this.



Correct, and they more than doubled the die size in turn. The beauty in this comparison is that the 5700 XT was pushed to the absolute limit of what it could endure, and a properly tuned (980mV-1050mV) 5700 XT runs more like 170W. Unsurprisingly, if you look at the 6900 XT, it's tuned for that exact voltage range, and uses almost exactly 2x the power of the 5700 XT at the same voltage.



5700 XT Reference on the left, 6900 XT reference on the right.



The front end CAN feed the shader engines because it runs at higher clocks. Also, doubled RT cores? Where did you read that? Navi 31's ray accelerators were only increased by 50% according to the top post detailing the block diagram, and AMD themselves don't even mention a full 2x performance increase of the cores. Where did you get "double" from?

The reason there has only been an increase of 20% to TMU count is because there has only been an increase of 20% to CU count. From 80 to 96. The same reason why TMU count from 5700 XT to 6900 XT doubled, because they went from 40 to 80 CUs. That's how the architecture is laid out, I'm sorry you don't like it?
Yeah, it's almost hilarious and sad altogether how a lot (?) or some (?) people think they know what's needed to make a GPU faster/more efficient/cheaper and so on...

Great analysis BTW. Looks like quite a few people can read behind the slides AMD has shown. I've been reading/hearing it in a few analyses of RDNA3.

Indeed, it looks like AMD's "worst" spot in their design is the front end.
They know it, hence the decoupled FE/shader clocks. They are trying to distribute die space and keep cost low by not using expensive (in power and cost) and far too complex components = GDDR6X / high-bandwidth memory controllers* / dedicated RT-AI cores. Instead, they use an enhanced/larger cache system as a substitute for GDDR6X/HBMC*.
Their strongest point is the cache system and the MCD >> GCD feed, as well as inside the GCD (L1/L2). In fact it got so much stronger (in effective bandwidth) that they even reduced the L3 Infinity Cache amount (128 MB >> 96 MB) while it's still faster overall than RDNA2.

If in the future (2023) we see a 7950XTX, it will probably be a larger GCD with a larger front end at least, plus maybe a few more CUs (for example +100~150 mm²).
They already know how to make a 520mm² GCD.

They can match or pass the 600 mm² of Ada in total die area (GCD+MCDs) and still at lower cost. They can also move to 4 nm for this one.
Imagine a $1,500 (450~500 W) 7950XTX competing with a $2,500 (550~600 W) 4090 Ti.

Pure speculation, but not far-fetched IMHO.
NopaThis is exactly what Coreteks stated in his latest video.
I saw that... he also said that the 7900 XTX should cost $500 just to make sense... :kookoo::banghead:

Wishful thinking or just pure absurdity?
Posted on Reply
#72
spnidel
Zach_01I saw that... he also said that the 7900XTX should cost 500$ just to make sense... :kookoo::banghead:

Wishful thinking or just pure absurdity?
coreteks is a channel that huffs his own farts too much and often says really stupid shit... from what I've seen he's #1 in the "look mom, I'm a pc tech analyst, here's me talking about how right I was when I got that one leak right" category
not to mention trying to emulate a british accent while obviously being an ESL... plus the voice; I can't stand his manner of speaking
Posted on Reply
#73
DemonicRyzen666
FouquinThe only 'facts' I could find were your opinions on how they could have doubled the core count? Let's dissect:



So you want them to remove the ray accelerators from the CUs and put them on the backend as a dedicated block? That's expensive for die area since you now need to physically map and wire out all their supporting cache twice instead of giving them the shared data cache. Also, BVH is a shader-parallel operation, so why would you remove the ray accelerators from the CU and have to store ray data in a secondary location to fetch half way through? Just to increase latency for funsies? It makes no sense.



You want to create a new clock domain ONLY for the ray accelerators within the shader engine, and keep the shader engine clock domain linked to the front end. Sounds great, but those ray accelerators are with utmost certainty NOT what drives up core power draw. So pushing them off onto their own domain isn't going to save you any power. You'll go back to having the problem RDNA2 has where the front end is outrun by the CUs, and entire units stall for multiple cycles waiting for wavefronts.



The shaders are underclocked to save power. I guarantee you when reviews come out, people will attempt to overclock the shader engines and they will see performance scale with clocks, as well as significant increases in load power draw. The front end has been left alone because it needs the extra bandwidth to feed the CUs, this so far is true. Double the I$ to improve ray tracing on the backend (ray tracing happens in the CUs, not on the backend) without adding shaders to each unit? What? The ray accelerators reside within the CU, how would they add more ray accelerators without adding more shaders? Empty CUs? Oh, wait I think you're back on that idea of removing the ray accelerators from the CUs and grouping them to the backend again. We've been over this.



Correct, and they more than doubled the die size in turn. The beauty in this comparison is that the 5700 XT was pushed to the absolute limit of what it could endure, and a properly tuned (980mV-1050mV) 5700 XT runs more like 170W. Unsurprisingly, if you look at the 6900 XT, it's tuned for that exact voltage range, and uses almost exactly 2x the power of the 5700 XT at the same voltage.



5700 XT Reference on the left, 6900 XT reference on the right.



The front end CAN feed the shader engines because it runs at higher clocks. Also, doubled RT cores? Where did you read that? Navi 31's ray accelerators were only increased by 50% according to the top post detailing the block diagram, and AMD themselves don't even mention a full 2x performance increase of the cores. Where did you get "double" from?

The reason there has only been an increase of 20% to TMU count is because there has only been an increase of 20% to CU count. From 80 to 96. The same reason why TMU count from 5700 XT to 6900 XT doubled, because they went from 40 to 80 CUs. That's how the architecture is laid out, I'm sorry you don't like it?
The RX 7900 XTX will be only 20% faster on average than an RX 6950 XT in raytracing.
Posted on Reply
#74
AnotherReader
DemonicRyzen666The RX 7900 XTX is only 20% faster on average than an RX 6950 XT in raytracing.
I think you're mistaken. The claim is up to 50% faster per CU and since it has 20% more CUs, that means up to 80% faster.
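
A one-line check of that arithmetic, assuming the per-CU gain and the CU-count gain simply multiply:

```python
# Up to 50% more RT performance per CU x 20% more CUs (96 vs 80)
# => up to ~1.8x, i.e. up to ~80% faster, if the two gains multiply.
per_cu_gain = 1.5
cu_count_gain = 96 / 80    # = 1.2
print(per_cu_gain * cu_count_gain)   # 1.8
```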
Posted on Reply
#75
Nopa
AnotherReaderI think you're mistaken. The claim is up to 50% faster per CU and since it has 20% more CUs, that means up to 80% faster.
7900 XTX (final retail version with Adrenalin WHQL uplifts) matching or slightly beating the 4090 FE in both RT and rasterisation: is this outside the realm of possibility?
Posted on Reply