• Welcome to TechPowerUp Forums, Guest! Please check out our forum guidelines for info related to our community.

AMD RDNA3 Navi 31 GPU Block Diagram Leaked, Confirmed to be PCIe Gen 4

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
47,670 (7.43/day)
Location
Dublin, Ireland
System Name RBMK-1000
Processor AMD Ryzen 7 5700G
Motherboard Gigabyte B550 AORUS Elite V2
Cooling DeepCool Gammax L240 V2
Memory 2x 16GB DDR4-3200
Video Card(s) Galax RTX 4070 Ti EX
Storage Samsung 990 1TB
Display(s) BenQ 1440p 60 Hz 27-inch
Case Corsair Carbide 100R
Audio Device(s) ASUS SupremeFX S1220A
Power Supply Cooler Master MWE Gold 650W
Mouse ASUS ROG Strix Impact
Keyboard Gamdias Hermes E2
Software Windows 11 Pro
An alleged leaked company slide details AMD's upcoming 5 nm "Navi 31" GPU powering the next-generation Radeon RX 7900 XTX and RX 7900 XT graphics cards. The slide details the "Navi 31" MCM, with its central graphics compute die (GCD) chiplet that's built on the 5 nm EUV silicon fabrication process, surrounded by six memory cache dies (MCDs), each built on the 6 nm process. The GCD interfaces with the system over a PCI-Express 4.0 x16 host interface. It features the latest-generation multimedia engine with dual-stream encoders; and the new Radiance display engine with DisplayPort 2.1 and HDMI 2.1a support. Custom interconnects tie it with the six MCDs.

Each MCD has 16 MB of Infinity Cache (L3 cache); and a 64-bit GDDR6 memory interface (two 32-bit GDDR6 paths). Six of these add up to the GPU's 384-bit GDDR6 memory interface. In the scheme of things, the GPU has a contiguous and monolithic 384-bit wide memory bus, because every modern GPU uses multiple on-die memory controllers to achieve a wide memory bus. "Navi 31" hence has a total Infinity Cache size of 96 MB—which may be less in comparison to the 128 MB on "Navi 21," but AMD has shored up cache sizes across the GPU. The L0 caches on the compute units is now increased numerically by 240%. The L1 caches by 300%, and the L2 cache shared among the shader engines, by 50%. The RX 7900 XTX is confirmed to use 20 Gbps GDDR6 memory in this slide, for 960 GB/s of memory bandwidth.



The GCD features six Shader Engines, each with 16 compute units (or 8 dual compute units), which work out to 1,024 stream processors. AMD claims to have doubled the IPC of these stream processors over RDNA2. The new RDNA3 ALUs also support BF16 instructions. The SIMD engine of "Navi 31" has an FP32 throughput of 61.6 TFLOP/s, a 168% increase over the 23 TFLOP/s of the "Navi 21." The slide doesn't quite detail the new Ray Tracing engine, but references new RT features, larger caches, 50% higher ray intersection rate, for an up to 1.8X RT performance increase at 2.505 GHz engine clocks; over the RX 6950 XT. There are other major upgrades to the GPU's raster 3D capabilities, including a 50% increase in prim/clk rates, and 100% increase in prim/vertex cull rates. The pixel pipeline sees similar 50% increases in rasterized prims/clock and pixels/clock; and synchronous pixel-wait.

View at TechPowerUp Main Site | Source
 
Since gen4 never got saturated anyway I mean this doesn't really matter either way, thats my understanding. So this really isnt news?
 
This doesn't look like a new design from RDNA2, they just split cache and memory controller into more chiplets the MCD's and crammed more CU into GCD.

1667800814061.png
 
Since gen4 never got saturated anyway I mean this doesn't really matter either way, thats my understanding. So this really isnt news?
+1
Don't really think this deserves a news headline in the front page.
 
I thought everyone knew this already, this is like Zen1, 8900 will be much better maybe with double cache (196 :eek:).

And isn't the shader running slower this time around ?, (2.3). Be interesting what AIB's will end up doing with it.
 
This doesn't look like a new design from RDNA2, they just split cache and memory controller into more chiplets the MCD's and crammed more CU into GCD.

View attachment 268826
Crammed, I'm sure that word is used a lot in cutting edge technology conversations.

cram.jpg
 
The RX 7900 XTX is confirmed to use 20 Gbps GDDR6 memory in this slide, for 960 GB/s of memory bandwidth.

I thought the 7900 XTX used 24 GB of DDR6. Slide shows up to 24. Think this might need edited.
 
I thought they stopped calling it a "Dual Compute Unit" but now supposed to call them "Workgroup Processors" (WGP) ??

Ah well, whatever. I know what they mean. I just... thought they changed the name already.
 
The RX 7900 XTX is confirmed to use 20 Gbps GDDR6 memory in this slide, for 960 GB/s of memory bandwidth.

I thought the 7900 XTX used 24 GB of DDR6. Slide shows up to 24. Think this might need edited.
You are mixing your Gbps and GB. Speed and capacity
 
The other hidden tidbit is that the boost clock will be 2.5 Ghz
 
let AMD undercut NV by 50% $ for 15% performance in the top, everyone (both 'camps') will be very happy with it.
The only relevant sku for Ada should be 4090/ti. Push the 'un-luche' for 4080 16GB as well, the confusion is still here.
May all high to low tier dominate by Radeon for this gen.
I very much hope AMD will not fallow NV on the race to the top Wattage consumption with 7950XTX with 4/5*8PIN.
Focus on efficiency and cost and you will see your market share grows like never before.
And for heaven's sake, give already us a proper CUDA competitor for video encode and AI computation.
 
Being PCI-E 4.0 only shows consumer graphics cards really dont need 5.0 or 6.0 for that matter.
Yes but PCIe 5.0 would come in handy in cases when you can't have all 16 lanes for the GPU, and these cases are not so rare in latest motherboards.
 
The "advanced chiplets design" would only be "disruptive" vs Monolithic if AMD either
1. Passed on the cost savings to consumers (prices are same for xtx, and 7900xt is worse cu count % wise to top sku than 6800xt was to 6900xt)
2. Actually used the chiplets to have more cu, instead of just putting memory controllers on there. Thereby maybe actually competing in RTRT performance or vs the 4090.

AMD is neither taking advantage of their cost savings and going the value route, or competing vs their competition's best. They're releasing a product months later, with zero performance advantages vs the competition? Potential 4080ti will still be faster and they've just given nvidia free reign (once again) to charge whatever they want for flagships since there's zero competition.

7950xtx could have two 7900xtx dies, but I doubt it
 
This doesn't look like a new design from RDNA2, they just split cache and memory controller into more chiplets the MCD's and crammed more CU into GCD.

View attachment 268826

From What I can Read the lack of TMU's is the main problem with this desgin.
the RX 6950 has 320 Texture units along with 128 R.O.P's, which gives it 739.2 GTexel/s at 2100mhz-2300mhz
While the RX 7900 XTX has 384 Texture units along with 192 R.O.P's
The Texture fill rate should be 887.04GTexel/s, not 961.9 GTexel/s There is no way this GPU is a that.

2020-12-01-image-2.png

^ this part should be bigger. They double Instructions in preclock.
I feel like the should have just double the RT core count in the compute part instead of increasing Shader count by 2.7 times.
 
Yes but PCIe 5.0 would come in handy in cases when you can't have all 16 lanes for the GPU, and these cases are not so rare in latest motherboards.
I get you but a 800/1000£ GPU should never be in anything but the top x16 on a motherboard for typically usage.

And it is rare, no unheard of for a consumer motherboard to have no x16 slot.
 
Yes but PCIe 5.0 would come in handy in cases when you can't have all 16 lanes for the GPU, and these cases are not so rare in latest motherboards.

What cases would that be, other than Intel boards when using a Gen5 NVMe drive, which would probably affect a tiny percentage of users?

But they definitely missed out on the marketing possibilities. Sales of Zen 4 are already terrible. They lined up the naming schemes for Zen 4 and RDNA3, but RDNA3 does not support one of Zen 4's main features, being PCI-E 5.0. So why should anyone buy Zen 4 right now, when they can just wait for lower prices and X3D CPUs? They really shot themselves in the foot.
 
After the 7000 series official announcement, every passing hour, TRUE facts destroys every AMD's nuts high-praised gibberish right into the toliet, even AMD had to come out and refute dumb nuts metrics now against the 4090. :nutkick:
 
Yes but PCIe 5.0 would come in handy in cases when you can't have all 16 lanes for the GPU, and these cases are not so rare in latest motherboards.
Mostly on Intel boards.
With AMD you don't really end up with that dilema, unless you bought a really odd motherboard where the board maker decided to do something stupid.

I get you but a 800/1000£ GPU should never be in anything but the top x16 on a motherboard for typically usage.

And it is rare, no unheard of for a consumer motherboard to have no x16 slot.
Not at all, most Z790 and many Z690 boards split the x16 slot with M.2 slots, just so the boards can be marketed as having PCIe 5.0 M.2 support, just as on AMD boards.
 
let AMD undercut NV by 50% $ for 15% performance in the top, everyone (both 'camps') will be very happy with it.
The only relevant sku for Ada should be 4090/ti. Push the 'un-luche' for 4080 16GB as well, the confusion is still here.
May all high to low tier dominate by Radeon for this gen.
I very much hope AMD will not fallow NV on the race to the top Wattage consumption with 7950XTX with 4/5*8PIN.
Focus on efficiency and cost and you will see your market share grows like never before.
And for heaven's sake, give already us a proper CUDA competitor for video encode and AI computation.

AMD already has a solution to CUDA called HIP. Here is a FAQ from AMD's ROCm documentation:


Is HIP a drop-in replacement for CUDA?

No. HIP provides porting tools which do most of the work to convert CUDA code into portable C++ code that uses the HIP APIs. Most developers will port their code from CUDA to HIP and then maintain the HIP version. HIP code provides the same performance as native CUDA code, plus the benefits of running on AMD platforms.

source: https://rocmdocs.amd.com/en/latest/Programming_Guides/HIP-FAQ.html
 
Mostly on Intel boards.
With AMD you don't really end up with that dilema, unless you bought a really odd motherboard where the board maker decided to do something stupid.


Not at all, most Z790 and many Z690 boards split the x16 slot with M.2 slots, just so the boards can be marketed as having PCIe 5.0 M.2 support, just as on AMD boards.
This modern trend needs to die or I'm not updating my pc, simple.

Though to be fair x8 doesn't loose that much performance.

Also that's not right, you get 1x16 ,1x nvme and the southbridge unless you want two nvme attached direct to CPU no?!.
 
Last edited:
Mostly on Intel boards.
With AMD you don't really end up with that dilema, unless you bought a really odd motherboard where the board maker decided to do something stupid.
You could have 2-3 NVMe drives in the same system. That said, you'd also need a motherboard that only wires 4-8 PCIe5 lanes to the GPU slot, freeing the rest for your SSDs.

And really, we've had this discussion since PCIe2: every time a new PCIe revision comes out, it offers more bandwidth than the GPUs need. It's ok to not use the latest and greatest.

Edit: Also, PCIe5 speeds are useless on SSDs, too. It'll give you (maybe) a higher number for large, sequential operations (which you rarely do), but keep the same numbers for 4k random reads (which you always do). So really, nothing to see here.
 
Last edited:
Completely seperate 1* 5.0 x16 for GPU, 2* 5.0 x4 for NVMe would be the best solution, whether it's AMD's or Intel's boards.
 
Completely seperate 1* 5.0 x16 for GPU, 2* 5.0 x4 for NVMe would be the best solution, whether it's AMD's or Intel's boards.

That is exactly how it is on Zen 4. The CPUs have 28 Gen5 lanes. 16 for GPU, 8 for two NVMe slots and 4 for the chipset link (which actually runs at Gen4 speeds for some reason, maybe future chipsets will have a Gen5 link which will work with current CPUs).
Two NVMe slots are connected to the CPU, all additional slots are connected to the chipset (and limited by the Gen4 CPU-chipset link).

Alder/Raptor Lake only have 16 Gen5 lanes. 4 lanes for NVMe are only Gen4. I expect Meteor Lake will only have Gen5 lanes in the CPU.
 
What cases would that be, other than Intel boards when using a Gen5 NVMe drive, which would probably affect a tiny percentage of users?

But they definitely missed out on the marketing possibilities. Sales of Zen 4 are already terrible. They lined up the naming schemes for Zen 4 and RDNA3, but RDNA3 does not support one of Zen 4's main features, being PCI-E 5.0. So why should anyone buy Zen 4 right now, when they can just wait for lower prices and X3D CPUs? They really shot themselves in the foot.
Correct! You can pay $1000 for an AM5 platform or you can keep the AM4 system and spend $350 on a 5800X3D. You have a top system for gaming (which surpasses the 7600X/7700X) and a decent one for other activities.
 
Back
Top