Monday, November 7th 2022
AMD RDNA3 Navi 31 GPU Block Diagram Leaked, Confirmed to be PCIe Gen 4
An alleged leaked company slide details AMD's upcoming 5 nm "Navi 31" GPU powering the next-generation Radeon RX 7900 XTX and RX 7900 XT graphics cards. The slide details the "Navi 31" MCM, with its central graphics compute die (GCD) chiplet that's built on the 5 nm EUV silicon fabrication process, surrounded by six memory cache dies (MCDs), each built on the 6 nm process. The GCD interfaces with the system over a PCI-Express 4.0 x16 host interface. It features the latest-generation multimedia engine with dual-stream encoders; and the new Radiance display engine with DisplayPort 2.1 and HDMI 2.1a support. Custom interconnects tie it with the six MCDs.
Each MCD has 16 MB of Infinity Cache (L3 cache); and a 64-bit GDDR6 memory interface (two 32-bit GDDR6 paths). Six of these add up to the GPU's 384-bit GDDR6 memory interface. In the scheme of things, the GPU has a contiguous and monolithic 384-bit wide memory bus, because every modern GPU uses multiple on-die memory controllers to achieve a wide memory bus. "Navi 31" hence has a total Infinity Cache size of 96 MB—which may be less in comparison to the 128 MB on "Navi 21," but AMD has shored up cache sizes across the GPU. The L0 caches on the compute units is now increased numerically by 240%. The L1 caches by 300%, and the L2 cache shared among the shader engines, by 50%. The RX 7900 XTX is confirmed to use 20 Gbps GDDR6 memory in this slide, for 960 GB/s of memory bandwidth.The GCD features six Shader Engines, each with 16 compute units (or 8 dual compute units), which work out to 1,024 stream processors. AMD claims to have doubled the IPC of these stream processors over RDNA2. The new RDNA3 ALUs also support BF16 instructions. The SIMD engine of "Navi 31" has an FP32 throughput of 61.6 TFLOP/s, a 168% increase over the 23 TFLOP/s of the "Navi 21." The slide doesn't quite detail the new Ray Tracing engine, but references new RT features, larger caches, 50% higher ray intersection rate, for an up to 1.8X RT performance increase at 2.505 GHz engine clocks; over the RX 6950 XT. There are other major upgrades to the GPU's raster 3D capabilities, including a 50% increase in prim/clk rates, and 100% increase in prim/vertex cull rates. The pixel pipeline sees similar 50% increases in rasterized prims/clock and pixels/clock; and synchronous pixel-wait.
Source:
VideoCardz
Each MCD has 16 MB of Infinity Cache (L3 cache); and a 64-bit GDDR6 memory interface (two 32-bit GDDR6 paths). Six of these add up to the GPU's 384-bit GDDR6 memory interface. In the scheme of things, the GPU has a contiguous and monolithic 384-bit wide memory bus, because every modern GPU uses multiple on-die memory controllers to achieve a wide memory bus. "Navi 31" hence has a total Infinity Cache size of 96 MB—which may be less in comparison to the 128 MB on "Navi 21," but AMD has shored up cache sizes across the GPU. The L0 caches on the compute units is now increased numerically by 240%. The L1 caches by 300%, and the L2 cache shared among the shader engines, by 50%. The RX 7900 XTX is confirmed to use 20 Gbps GDDR6 memory in this slide, for 960 GB/s of memory bandwidth.The GCD features six Shader Engines, each with 16 compute units (or 8 dual compute units), which work out to 1,024 stream processors. AMD claims to have doubled the IPC of these stream processors over RDNA2. The new RDNA3 ALUs also support BF16 instructions. The SIMD engine of "Navi 31" has an FP32 throughput of 61.6 TFLOP/s, a 168% increase over the 23 TFLOP/s of the "Navi 21." The slide doesn't quite detail the new Ray Tracing engine, but references new RT features, larger caches, 50% higher ray intersection rate, for an up to 1.8X RT performance increase at 2.505 GHz engine clocks; over the RX 6950 XT. There are other major upgrades to the GPU's raster 3D capabilities, including a 50% increase in prim/clk rates, and 100% increase in prim/vertex cull rates. The pixel pipeline sees similar 50% increases in rasterized prims/clock and pixels/clock; and synchronous pixel-wait.
79 Comments on AMD RDNA3 Navi 31 GPU Block Diagram Leaked, Confirmed to be PCIe Gen 4
Don't really think this deserves a news headline in the front page.
And isn't the shader running slower this time around ?, (2.3). Be interesting what AIB's will end up doing with it.
I thought the 7900 XTX used 24 GB of DDR6. Slide shows up to 24. Think this might need edited.
Ah well, whatever. I know what they mean. I just... thought they changed the name already.
The only relevant sku for Ada should be 4090/ti. Push the 'un-luche' for 4080 16GB as well, the confusion is still here.
May all high to low tier dominate by Radeon for this gen.
I very much hope AMD will not fallow NV on the race to the top Wattage consumption with 7950XTX with 4/5*8PIN.
Focus on efficiency and cost and you will see your market share grows like never before.
And for heaven's sake, give already us a proper CUDA competitor for video encode and AI computation.
1. Passed on the cost savings to consumers (prices are same for xtx, and 7900xt is worse cu count % wise to top sku than 6800xt was to 6900xt)
2. Actually used the chiplets to have more cu, instead of just putting memory controllers on there. Thereby maybe actually competing in RTRT performance or vs the 4090.
AMD is neither taking advantage of their cost savings and going the value route, or competing vs their competition's best. They're releasing a product months later, with zero performance advantages vs the competition? Potential 4080ti will still be faster and they've just given nvidia free reign (once again) to charge whatever they want for flagships since there's zero competition.
7950xtx could have two 7900xtx dies, but I doubt it
the RX 6950 has 320 Texture units along with 128 R.O.P's, which gives it 739.2 GTexel/s at 2100mhz-2300mhz
While the RX 7900 XTX has 384 Texture units along with 192 R.O.P's
The Texture fill rate should be 887.04GTexel/s, not 961.9 GTexel/s There is no way this GPU is a that.
^ this part should be bigger. They double Instructions in preclock.
I feel like the should have just double the RT core count in the compute part instead of increasing Shader count by 2.7 times.
And it is rare, no unheard of for a consumer motherboard to have no x16 slot.
But they definitely missed out on the marketing possibilities. Sales of Zen 4 are already terrible. They lined up the naming schemes for Zen 4 and RDNA3, but RDNA3 does not support one of Zen 4's main features, being PCI-E 5.0. So why should anyone buy Zen 4 right now, when they can just wait for lower prices and X3D CPUs? They really shot themselves in the foot.
With AMD you don't really end up with that dilema, unless you bought a really odd motherboard where the board maker decided to do something stupid. Not at all, most Z790 and many Z690 boards split the x16 slot with M.2 slots, just so the boards can be marketed as having PCIe 5.0 M.2 support, just as on AMD boards.
[URL='https://rocmdocs.amd.com/en/latest/Programming_Guides/HIP-FAQ.html#id7']Is HIP a drop-in replacement for CUDA?[/URL]
No. HIP provides porting tools which do most of the work to convert CUDA code into portable C++ code that uses the HIP APIs. Most developers will port their code from CUDA to HIP and then maintain the HIP version. HIP code provides the same performance as native CUDA code, plus the benefits of running on AMD platforms.source: rocmdocs.amd.com/en/latest/Programming_Guides/HIP-FAQ.html
Though to be fair x8 doesn't loose that much performance.
Also that's not right, you get 1x16 ,1x nvme and the southbridge unless you want two nvme attached direct to CPU no?!.
And really, we've had this discussion since PCIe2: every time a new PCIe revision comes out, it offers more bandwidth than the GPUs need. It's ok to not use the latest and greatest.
Edit: Also, PCIe5 speeds are useless on SSDs, too. It'll give you (maybe) a higher number for large, sequential operations (which you rarely do), but keep the same numbers for 4k random reads (which you always do). So really, nothing to see here.
Two NVMe slots are connected to the CPU, all additional slots are connected to the chipset (and limited by the Gen4 CPU-chipset link).
Alder/Raptor Lake only have 16 Gen5 lanes. 4 lanes for NVMe are only Gen4. I expect Meteor Lake will only have Gen5 lanes in the CPU.
As AMD has two times x4 "spare" PCIe lanes (plus four to the chipset), it's not an issue on most AM5 boards, but there are some odd boards still split the PCIe 5.0 lanes to the "GPU" and use them for M.2 slots. Yes, x8 PCIe 5.0 lanes are in theory x16 PCIe 4.0 lanes, the issue is that a x16 PCIe 4.0 card, ends up as a PCIe 4.0 x8 card in a PCIe 5.0 x8 slot.
The PCIe spec doesn't allow for eight lanes to magically end up being 16 or 32 lanes of a lower speed grade, it requires an additional, costly chip.
As for the slot mix, with bifurcation it's not a problem to split the lanes on the fly, so as long as you don't use the M.2 slots, your x16 slot remains x16. AMD allows that, Intel does not as yet.