
AMD Granite Ridge "Zen 5" Processor Annotated

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
47,307 (7.52/day)
Location
Hyderabad, India
System Name RBMK-1000
Processor AMD Ryzen 7 5700G
Motherboard ASUS ROG Strix B450-E Gaming
Cooling DeepCool Gammax L240 V2
Memory 2x 8GB G.Skill Sniper X
Video Card(s) Palit GeForce RTX 2080 SUPER GameRock
Storage Western Digital Black NVMe 512GB
Display(s) BenQ 1440p 60 Hz 27-inch
Case Corsair Carbide 100R
Audio Device(s) ASUS SupremeFX S1220A
Power Supply Cooler Master MWE Gold 650W
Mouse ASUS ROG Strix Impact
Keyboard Gamdias Hermes E2
Software Windows 11 Pro
High-resolution die-shots of the AMD "Zen 5" 8-core CCD were released and annotated by Nemez, Fritzchens Fritz, and HighYieldYT. These provide a detailed view of how the silicon and its various components appear, particularly the new "Zen 5" CPU core with its 512-bit FPU. The "Granite Ridge" package looks similar to "Raphael," with up to two 8-core CPU complex dies (CCDs) depending on the processor model, and a centrally located client I/O die (cIOD). This cIOD is carried over from "Raphael," which minimizes product development costs for AMD, at least for the uncore portion of the processor. The "Zen 5" CCD is built on the TSMC N4P (4 nm) foundry node.

The "Granite Ridge" package places its up to two "Zen 5" CCDs closer to each other than the "Zen 4" CCDs on "Raphael." In the picture above, you can see the pad of the absent CCD behind the solder mask of the fiberglass substrate, close to the present CCD. The CCD contains 8 full-sized "Zen 5" CPU cores, each with 1 MB of L2 cache, and a centrally located 32 MB L3 cache that's shared among all eight cores. The only other components are an SMU (system management unit) and the Infinity Fabric over Package (IFoP) PHYs, which connect the CCD to the cIOD.



Each "Zen 5" CPU core is physically larger than the "Zen 4" core (built on the TSMC N5 process), due to its 512-bit floating point data-path. The core's Vector Engine is pushed to the very edge of the core. On the CCD, these should be the edges of the die. FPUs tend to be the hottest components on a CPU core, so this makes sense. The innermost component (facing the shared L3 cache) is the 1 MB L2 cache. AMD has doubled the bandwidth and associativity of this 1 MB L2 cache compared to the one on the "Zen 4" core.
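To put the 512-bit data-path in concrete terms, here is a minimal sketch of how many floating-point elements fit in one pass. The 512-bit figure comes from the article; the 256-bit figure used for "Zen 4" is an assumption added for comparison, and the function and constant names are made up for illustration.

```python
def lanes(datapath_bits: int, element_bits: int) -> int:
    """Number of elements handled per pass through a data-path of the given width."""
    if datapath_bits % element_bits != 0:
        raise ValueError("element width must divide the data-path width")
    return datapath_bits // element_bits


ZEN5_FP_DATAPATH_BITS = 512  # from the article
ZEN4_FP_DATAPATH_BITS = 256  # assumption, not stated in the article

for name, width in (("Zen 4", ZEN4_FP_DATAPATH_BITS), ("Zen 5", ZEN5_FP_DATAPATH_BITS)):
    print(f"{name}: {lanes(width, 32)} x FP32 or {lanes(width, 64)} x FP64 per pass")
```

Doubling the width doubles the elements per pass, which is consistent with the larger Vector Engine area visible in the annotated die-shot.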

The central region of the "Zen 5" core has the 32 KB L1I cache, 48 KB L1D cache, the Integer Execution engine, and the all-important front-end of the processor, with its Instruction Fetch & Decode, the Branch Prediction unit, micro-op cache, and Scheduler.

The 32 MB on-die L3 cache has rows of TSVs (through-silicon vias) that act as provision for stacked 3D V-cache. The 64 MB L3D (L3 cache die) connects with the CCD's ringbus using these TSVs, making the 64 MB 3D V-cache contiguous with the 32 MB on-die L3 cache.
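As a quick tally of the cache sizes quoted so far, the following sketch adds up L2 and L3 per CCD with and without the stacked L3D. All sizes are the article's figures; the function and constant names are hypothetical.

```python
CORES_PER_CCD = 8      # from the article
L2_PER_CORE_MB = 1     # from the article
L3_ON_DIE_MB = 32      # from the article
L3D_STACKED_MB = 64    # from the article (3D V-cache die)


def ccd_cache_mb(with_vcache: bool) -> dict:
    """Total L2 and L3 capacity of one CCD, in MB."""
    l2_total = CORES_PER_CCD * L2_PER_CORE_MB
    l3_total = L3_ON_DIE_MB + (L3D_STACKED_MB if with_vcache else 0)
    return {"l2_total": l2_total, "l3_total": l3_total, "l2_plus_l3": l2_total + l3_total}


print("without 3D V-cache:", ccd_cache_mb(False))
print("with 3D V-cache:   ", ccd_cache_mb(True))
```

Because the stacked die is contiguous with the on-die L3, the two are summed into one 96 MB L3 rather than treated as separate cache levels.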

Lastly, there's the client I/O die (cIOD). There's nothing new to report here; the chip is carried over from "Raphael." It is built on the TSMC N6 (6 nm) node. Nearly 1/3rd of the die-area is taken up by the iGPU and its allied components, such as the media acceleration engine and display engine. The iGPU is based on the RDNA 2 graphics architecture, and has just one workgroup processor (WGP), for two compute units (CU), or 128 stream processors. Other key components on the cIOD are the 28-lane PCIe Gen 5 interface, the two IFoP ports for the CCDs, a fairly large SoC I/O consisting of USB 3.x and legacy connectivity, and the all-important DDR5 memory controller with its dual-channel (four sub-channel) memory interface.
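The iGPU figures above follow a simple hierarchy, sketched below. The 64 stream processors per CU is derived from the article's own numbers (128 stream processors across two CUs); the names are hypothetical.

```python
CUS_PER_WGP = 2   # from the article: one WGP holds two CUs
SPS_PER_CU = 64   # derived from the article: 128 stream processors / 2 CUs


def stream_processors(wgps: int) -> int:
    """Stream processors for an iGPU with the given number of workgroup processors."""
    return wgps * CUS_PER_WGP * SPS_PER_CU


print(stream_processors(1))  # the cIOD's single-WGP iGPU
```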

 

btarunr

It's time for a new cIOD, a new IMC, new Infinity Fabric, and a faster and wider connection between the cIOD and the south bridge (motherboard chipset).
They created the cIOD so it spares them development costs for the uncore for at least 2 generations (worked for Ryzen 3000 and Ryzen 5000).

So, if they stick with AM5 for Zen 6, they might develop a new cIOD. Maybe switch to N5, give it an RDNA 3.5 iGPU, faster memory controllers, and maybe even an NPU.
 
Joined
May 22, 2024
Messages
414 (1.89/day)
System Name Kuro
Processor AMD Ryzen 7 7800X3D@65W
Motherboard MSI MAG B650 Tomahawk WiFi
Cooling Thermalright Phantom Spirit 120 EVO
Memory Corsair DDR5 6000C30 2x48GB (Hynix M)@6000 30-36-36-76 1.36V
Video Card(s) PNY XLR8 RTX 4070 Ti SUPER 16G@200W
Storage Crucial T500 2TB + WD Blue 8TB
Case Lian Li LANCOOL 216
Power Supply MSI MPG A850G
Software Ubuntu 24.04 LTS + Windows 10 Home Build 19045
Benchmark Scores 17761 C23 Multi@65W
Interesting how according to these, the CCD already has two IFoP PHY. Presumably enough to saturate the theoretical bandwidth of dual-channel/quad-sub-channel DDR5-8000, if both are implemented with current sweet spot IF frequency.
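A rough back-of-the-envelope check of that claim, as a sketch only. The DDR5 side is plain arithmetic (two 64-bit channels at 8000 MT/s). The fabric side rests on two assumptions that are not from this thread: an FCLK of 2000 MHz as the "sweet spot", and 32 bytes per FCLK of read bandwidth per IFoP link. All names are hypothetical.

```python
ASSUMED_FCLK_MHZ = 2000            # assumption: "sweet spot" fabric clock
ASSUMED_READ_BYTES_PER_CLK = 32    # assumption: read width per IFoP link


def ddr5_peak_gbs(mt_per_s: float, channels: int = 2, bits_per_channel: int = 64) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = channels * bits_per_channel / 8
    return mt_per_s * 1e6 * bytes_per_transfer / 1e9


def ifop_read_gbs(fclk_mhz: float, links: int,
                  bytes_per_clk: int = ASSUMED_READ_BYTES_PER_CLK) -> float:
    """Theoretical CCD read bandwidth over the given number of IFoP links, in GB/s."""
    return fclk_mhz * 1e6 * links * bytes_per_clk / 1e9


dram = ddr5_peak_gbs(8000)
for links in (1, 2):
    fabric = ifop_read_gbs(ASSUMED_FCLK_MHZ, links)
    print(f"{links} link(s): fabric {fabric:.0f} GB/s vs DRAM {dram:.0f} GB/s")
```

Under these assumptions a single link covers about half of the DDR5-8000 peak and two links match it. Different FCLK or link-width values would shift the result.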

Though if things keep going on like this, Zen 6 desktop might well end up getting more than two memory channels if it gets another socket, as long as nature abhors mobile chips significantly more powerful than desktop ones in the same segment like it abhorred vacuum. That is a silver lining of the AI boom and mania.

A Zen 6 on AM5 that scales up to DDR5-8000 and faster would do just fine too. So would a new chipset that runs off PCIe 5.0.
 
Joined
Jan 14, 2019
Messages
12,605 (5.80/day)
Location
Midlands, UK
System Name Nebulon B
Processor AMD Ryzen 7 7800X3D
Motherboard MSi PRO B650M-A WiFi
Cooling be quiet! Dark Rock 4
Memory 2x 24 GB Corsair Vengeance DDR5-4800
Video Card(s) AMD Radeon RX 6750 XT 12 GB
Storage 2 TB Corsair MP600 GS, 2 TB Corsair MP600 R2
Display(s) Dell S3422DWG, 7" Waveshare touchscreen
Case Kolink Citadel Mesh black
Audio Device(s) Logitech Z333 2.1 speakers, AKG Y50 headphones
Power Supply Seasonic Prime GX-750
Mouse Logitech MX Master 2S
Keyboard Logitech G413 SE
Software Bazzite (Fedora Linux) KDE
"Magical SRAM of mystery"... Love it! :roll:
 
Joined
Jul 24, 2024
Messages
301 (1.93/day)
System Name AM4_TimeKiller
Processor AMD Ryzen 5 5600X @ all-core 4.7 GHz
Motherboard ASUS ROG Strix B550-E Gaming
Cooling Arctic Freezer II 420 rev.7 (push-pull)
Memory G.Skill TridentZ RGB, 2x16 GB DDR4, B-Die, 3800 MHz @ CL14-15-14-29-43 1T, 53.2 ns
Video Card(s) ASRock Radeon RX 7800 XT Phantom Gaming
Storage Samsung 990 PRO 1 TB, Kingston KC3000 1 TB, Kingston KC3000 2 TB
Case Corsair 7000D Airflow
Audio Device(s) Creative Sound Blaster X-Fi Titanium
Power Supply Seasonic Prime TX-850
Mouse Logitech wireless mouse
Keyboard Logitech wireless keyboard
Are there really 3x PCIe 5.0 x4? That would mean that Gen 5 CPU-Chipset interconnection is already ready CPU-side.
 
Joined
Jan 3, 2021
Messages
3,616 (2.49/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
Are there really 3x PCIe 5.0 x4? That would mean that Gen 5 CPU-Chipset interconnection is already ready CPU-side.
Yes, that has been the case since the beginning. The ports on the IOD are all Gen 5. But at the other end of the wires there's the 14nm/12nm chipset, and we have seen how great this node is at handling Gen 5. (Think of the Phison E26 SSD controller.)
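For reference, here is a small sketch of the lane budget and link bandwidth being discussed. The x16 + 3 × x4 split and the 28-lane total come from the article and this thread; the per-lane rates and 128b/130b encoding are the standard PCIe figures. The dictionary and function names are hypothetical.

```python
# Lane allocation on the cIOD as described in the article and thread
LANES = {"PEG (graphics slot)": 16, "M.2 NVMe #1": 4, "M.2 NVMe #2": 4, "chipset link": 4}

GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}   # raw transfer rate per lane
ENCODING = 128 / 130                    # 128b/130b line coding (Gen 3 and later)


def pcie_gbs(gen: int, lanes: int) -> float:
    """Theoretical one-direction PCIe bandwidth in GB/s, before protocol overhead."""
    return GT_PER_S[gen] * lanes * ENCODING / 8


print("total lanes:", sum(LANES.values()))
print(f"chipset link at Gen 4 x4: {pcie_gbs(4, 4):.2f} GB/s")
print(f"chipset link at Gen 5 x4: {pcie_gbs(5, 4):.2f} GB/s")
```

The CPU-side port could carry twice the bandwidth it does today; the limit sits at the chipset end of the link.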
 
Joined
Apr 1, 2020
Messages
31 (0.02/day)
I think that the original 3D V-cache supported up to 5 layers.
It has (to my knowledge) never been implemented, maybe because of the increased costs and minor performance uplift.
 
Joined
May 22, 2024
Messages
414 (1.89/day)
System Name Kuro
Processor AMD Ryzen 7 7800X3D@65W
Motherboard MSI MAG B650 Tomahawk WiFi
Cooling Thermalright Phantom Spirit 120 EVO
Memory Corsair DDR5 6000C30 2x48GB (Hynix M)@6000 30-36-36-76 1.36V
Video Card(s) PNY XLR8 RTX 4070 Ti SUPER 16G@200W
Storage Crucial T500 2TB + WD Blue 8TB
Case Lian Li LANCOOL 216
Power Supply MSI MPG A850G
Software Ubuntu 24.04 LTS + Windows 10 Home Build 19045
Benchmark Scores 17761 C23 Multi@65W
Yes, that has been the case since the beginning. The ports on the IOD are all Gen 5. But at the other end of the wires there's the 14nm/12nm chipset, and we have seen how great this node is at handling Gen 5. (Think of the Phison E26 SSD controller.)
Is that really a node problem or an (admittedly node-dictated) thermal problem, though? To be fair, that 16x PEG bus would mostly never be used to capacity until the next generation of video cards come out, either.

I do agree that there really should be a new chipset next generation.

I think that the original 3D V-cache supported up to 5 layers.
It has (to my knowledge) never been implemented, maybe because of the increased costs and minor performance uplift.
Larger cache also means more latency, both at cycle-level and on maximum clock reduction for this sort of setup. I think anything more would probably be a net penalty for most workloads.
 
Joined
Jan 3, 2021
Messages
3,616 (2.49/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
I'm amazed to see how much information these annotators dig up from who knows where. Sure it's possible to recognise repeating structures such as the L3, and count cores and PCIe PHY logic, and even estimate the number of transistors. But how do you identify the "Scaler Unit" or the "L2 iTLB", or even larger units like "Load/Store" without a lot of inside info? I think there's quite a bit of speculation necessary here (not that it hurts anyone).

Interesting how according to these, the CCD already has two IFoP PHY.
This has been inherited from the Zen 4 CCD too. It's for servers. The server I/O die does not have enough IFOP connections though, so a compromise had to be made:
The I/O die used in all 4th Gen AMD EPYC processors has 12 Infinity Fabric connections to CPU dies. Our CPU dies can support one or two connections to the I/O die. In processor models with four CPU dies, two connections can be used to optimize bandwidth to each CPU die. This is the case for some EPYC 9004 Series CPUs and all EPYC 8004 Series CPUs. In processor models with more than four CPU dies, such as in the EPYC 9004 Series, one Infinity Fabric connection ties each CPU die to the I/O die. - source
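The compromise in that quote can be sketched as a simple rule. The 12-port figure and the "four or fewer CPU dies get two links" rule are taken from the quoted AMD text; the function names are hypothetical, and this is only an illustration of the stated rule, not AMD's actual configuration logic.

```python
IOD_IF_PORTS = 12        # from the quote: 12 Infinity Fabric connections on the I/O die
MAX_LINKS_PER_CCD = 2    # from the quote: a CPU die supports one or two connections


def links_per_ccd(n_ccds: int) -> int:
    """Links given to each CCD under the rule in the quoted text."""
    if not 1 <= n_ccds <= IOD_IF_PORTS:
        raise ValueError("CCD count must be between 1 and the number of I/O die ports")
    return MAX_LINKS_PER_CCD if n_ccds <= 4 else 1


def ports_used(n_ccds: int) -> int:
    """I/O die ports consumed for a given CCD count."""
    return n_ccds * links_per_ccd(n_ccds)


for n in (4, 8, 12):
    print(f"{n} CCDs: {links_per_ccd(n)} link(s) each, {ports_used(n)} of {IOD_IF_PORTS} ports used")
```

With 12 CCDs, two links each would need 24 ports, twice what the I/O die has, which is why the second port on each CCD stays unused in the larger parts.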

I think that the original 3D V-cache supported up to 5 layers.
It has (to my knowledge) never been implemented, maybe because of the increased costs and minor performance uplift.
TSMC mentioned somewhere (I usually learned that sort of thing from Anandtech, now no more) that their glue, the hybrid bonding technology, could be used to stack more than two dies. Memory manufacturers are planning to use it for HBM4 or maybe even HBM3.
 
Joined
Jul 24, 2024
Messages
301 (1.93/day)
System Name AM4_TimeKiller
Processor AMD Ryzen 5 5600X @ all-core 4.7 GHz
Motherboard ASUS ROG Strix B550-E Gaming
Cooling Arctic Freezer II 420 rev.7 (push-pull)
Memory G.Skill TridentZ RGB, 2x16 GB DDR4, B-Die, 3800 MHz @ CL14-15-14-29-43 1T, 53.2 ns
Video Card(s) ASRock Radeon RX 7800 XT Phantom Gaming
Storage Samsung 990 PRO 1 TB, Kingston KC3000 1 TB, Kingston KC3000 2 TB
Case Corsair 7000D Airflow
Audio Device(s) Creative Sound Blaster X-Fi Titanium
Power Supply Seasonic Prime TX-850
Mouse Logitech wireless mouse
Keyboard Logitech wireless keyboard
Is that really a node problem or an (admittedly node-dictated) thermal problem, though? To be fair, that 16x PEG bus would mostly never be used to capacity until the next generation of video cards come out, either.

I do agree that there really should be a new chipset next generation.
6/7 nm is much better in terms of efficiency and thus also thermals. Do you recall X570 with active cooling? That was AMD's first PCIe 4.0 x4 chipset, made on 14 nm with around 12 W TDP. So I believe it's true that a chipset supporting PCIe Gen 5.0 might have thermal-related difficulties. Just look at those chunks of metal put onto the X670(E)/B650(E) chipsets to cool that 14 W passively. From this point of view, it's better that they didn't release a new chipset for X870(E)/B800 boards. From another point of view, they had a 7 nm process at their disposal and two years to design a chipset with PCIe Gen 5.0 support for at least the CPU-chipset interconnect. Yet across three generations of AMD chipsets (X570, X670(E), X870(E)), we haven't moved anywhere in terms of this interconnect's capabilities. On top of that, we have moved literally nowhere between X670(E) and X870(E).

As for the PEG bus, those 16 PCIe Gen 5.0 lanes are not strictly intended for GPU usage scenarios only. Other expansion cards benefit from this too, e.g. x8 NVMe RAID cards. Or you may have 2 GPUs with unlimited bandwidth (bus-wise) for the next few years. Although, having 2 GPUs is a luxury nowadays, especially in terms of requirements for power (PSU) and space (case).

Larger cache also means more latency, both at cycle-level and on maximum clock reduction for this sort of setup. I think anything more would probably be a net penalty for most workloads.
Not so much. Have a look at the 7800X3D or 5800X3D. Their biggest penalty is not in latency but in lower clocks (compared to their regular non-X3D counterparts). While those few hundred MHz lower clocks don't matter much in games, they have a noticeable impact in applications.
 
Joined
Apr 1, 2020
Messages
31 (0.02/day)
I'm amazed to see how much information these annotators dig up from who knows where. Sure it's possible to recognise repeating structures such as the L3, and count cores and PCIe PHY logic, and even estimate the number of transistors. But how do you identify the "Scaler Unit" or the "L2 iTLB", or even larger units like "Load/Store" without a lot of inside info? I think there's quite a bit of speculation necessary here (not that it hurts anyone).


This has been inherited from the Zen 4 CCD too. It's for servers. The server I/O die does not have enough IFOP connections though, so a compromise had to be made:



TSMC mentioned somewhere (I usually learned that sort of thing from Anandtech, now no more) that their glue, the hybrid bonding technology, could be used to stack more than two dies. Memory manufacturers are planning to use it for HBM4 or maybe even HBM3.
1728299223127.png
 
Joined
Aug 25, 2021
Messages
1,183 (0.97/day)
Interesting how according to these, the CCD already has two IFoP PHY. Presumably enough to saturate the theoretical bandwidth of dual-channel/quad-sub-channel DDR5-8000, if both are implemented with current sweet spot IF frequency.

Though if things keep going on like this, Zen 6 desktop might well end up getting more than two memory channels if it gets another socket, as long as nature abhors mobile chips significantly more powerful than desktop ones in the same segment like it abhorred vacuum. That is a silver lining of the AI boom and mania.

A Zen 6 on AM5 that scales up to DDR5-8000 and faster would do just fine too. So would a new chipset that runs off PCIe 5.0.
The second IF port is more for inter-chiplet communication.
Four channels are very unlikely for desktop.
They can introduce a new IOD and upgraded chipset, most probably will.

EDIT: To be more precise in wording, second GMI increases the bandwidth from 36 GB/s to 72 GB/s and thus allows more data to flow between chiplets via IF.
 
Joined
Jul 24, 2024
Messages
301 (1.93/day)
System Name AM4_TimeKiller
Processor AMD Ryzen 5 5600X @ all-core 4.7 GHz
Motherboard ASUS ROG Strix B550-E Gaming
Cooling Arctic Freezer II 420 rev.7 (push-pull)
Memory G.Skill TridentZ RGB, 2x16 GB DDR4, B-Die, 3800 MHz @ CL14-15-14-29-43 1T, 53.2 ns
Video Card(s) ASRock Radeon RX 7800 XT Phantom Gaming
Storage Samsung 990 PRO 1 TB, Kingston KC3000 1 TB, Kingston KC3000 2 TB
Case Corsair 7000D Airflow
Audio Device(s) Creative Sound Blaster X-Fi Titanium
Power Supply Seasonic Prime TX-850
Mouse Logitech wireless mouse
Keyboard Logitech wireless keyboard
The second IF port is for inter-chiplet communication.
Then, what is that 3rd PCIe Gen 5.0 x4 used for? Two are used for M.2 NVMe, that's pretty clear.

EDIT: Sorry, my bad, I read it as "for inter chipset communication". Everything clear now.
 
Joined
Jan 3, 2021
Messages
3,616 (2.49/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
The second IF port is for inter-chiplet communication.
Are you sure about that? I remember one detail from Zen 4 Epyc block diagrams: there are no CCD-to-CCD interconnects in 8- and 12-CCD processors. One port from each CCD remains unused. I was wondering why AMD didn't use the remaining ports for what you're implying. I had to assume the CCDs don't include the switching logic to make use of that.
 
Joined
Oct 30, 2008
Messages
1,768 (0.30/day)
System Name Lailalo
Processor Ryzen 9 5900X Boosts to 4.95Ghz
Motherboard Asus TUF Gaming X570-Plus (WIFI)
Cooling Noctua
Memory 32GB DDR4 3200 Corsair Vengeance
Video Card(s) XFX 7900XT 20GB
Storage Samsung 970 Pro Plus 1TB, Crucial 1TB MX500 SSD, Segate 3TB
Display(s) LG Ultrawide 29in @ 2560x1080
Case Coolermaster Storm Sniper
Power Supply XPG 1000W
Mouse G602
Keyboard G510s
Software Windows 10 Pro / Windows 10 Home
Great, but at this point AMD has a problem because consumers are starting to get wise to their 3D cache releases. People aren't as interested in the initial release because they know the 3D cache version is coming, which will blow it out of the water. They've been pumping a lot into trying to make the 9000 series seem interesting, but the core issue is still there in people's minds.
 
Joined
Jan 14, 2019
Messages
12,605 (5.80/day)
Location
Midlands, UK
System Name Nebulon B
Processor AMD Ryzen 7 7800X3D
Motherboard MSi PRO B650M-A WiFi
Cooling be quiet! Dark Rock 4
Memory 2x 24 GB Corsair Vengeance DDR5-4800
Video Card(s) AMD Radeon RX 6750 XT 12 GB
Storage 2 TB Corsair MP600 GS, 2 TB Corsair MP600 R2
Display(s) Dell S3422DWG, 7" Waveshare touchscreen
Case Kolink Citadel Mesh black
Audio Device(s) Logitech Z333 2.1 speakers, AKG Y50 headphones
Power Supply Seasonic Prime GX-750
Mouse Logitech MX Master 2S
Keyboard Logitech G413 SE
Software Bazzite (Fedora Linux) KDE
Then, what is that 3rd PCIe Gen 5.0 x4 used for? Two are used for M.2 NVMe, that's pretty clear.
For communication with the chipset.
 
Joined
May 22, 2010
Messages
399 (0.07/day)
Processor R7-7700X
Motherboard Gigabyte X670 Aorus Elite AX
Cooling Scythe Fuma 2 rev B
Memory no name DDR5-5200
Video Card(s) Some 3080 10GB
Storage dual Intel DC P4610 1.6TB
Display(s) Gigabyte G34MQ + Dell 2708WFP
Case Lian-Li Lancool III black no rgb
Power Supply CM UCP 750W
Software Win 10 Pro x64
Are you sure about that? I remember one detail from Zen 4 Epyc block diagrams: there are no CCD-to-CCD interconnects in 8- and 12-CCD processors. One port from each CCD remains unused. I was wondering why AMD didn't use the remaining ports for what you're implying. I had to assume the CCDs don't include the switching logic to make use of that.
Indeed it is so, AMD does not have inter-chiplet comms; everything passes through the IOD and IF.

And yes, I've always said that client processors not using wide GMI3 is a waste of performance where it's most needed (as AMD is very sensitive to RAM bandwidth). Especially on single-die models, they could use both IFoP links on the IOD and the CCD, and only one per CCD on the dual-die models. After all, the links are already there, BUT it would require a different substrate for the two models...
 
Joined
Sep 9, 2017
Messages
246 (0.09/day)
System Name B20221017 Pro SP1 R2 Gaming Edition
Processor AMD Ryzen 7900X3D
Motherboard Asus ProArt X670E-Creator
Cooling NZXT Kraken Z73
Memory G.Skill Trident Z DDR5-6000 CL30 64GB
Video Card(s) NVIDIA RTX 3090 Founders Edition
Storage Samsung 980 Pro 2TB + Samsung 870 Evo 4TB
Display(s) Samsung CF791 Curved Ultrawide
Case NZXT H7 Flow
Power Supply Corsair HX1000i
VR HMD Meta Quest 3
Software Windows 11
I don't know why, but it's still so weird to me that the GPU is in the I/O die and separate from the CCD.

Brilliant stuff.
 
Joined
Dec 12, 2016
Messages
1,958 (0.67/day)
It looks like changes to the TSVs will allow more and stacked cache. It is possible that AMD will move to an L3-cacheless CCD design with all of it coming from the stacked cache. If they can fit the TSVs into the dense versions, we are looking at a lot of freed up real estate on a dense CCD chiplet. AMD might be abandoning clock speed increases (even resetting them to below 5 GHz much like when Pentium M reset clocks after the Netburst era) and going for high core counts, large/stacked L3 cache sizes and continued IPC increases while maintaining power budgets at the same/current level.

I welcome this approach if that's what happens.
 
Joined
May 22, 2010
Messages
399 (0.07/day)
Processor R7-7700X
Motherboard Gigabyte X670 Aorus Elite AX
Cooling Scythe Fuma 2 rev B
Memory no name DDR5-5200
Video Card(s) Some 3080 10GB
Storage dual Intel DC P4610 1.6TB
Display(s) Gigabyte G34MQ + Dell 2708WFP
Case Lian-Li Lancool III black no rgb
Power Supply CM UCP 750W
Software Win 10 Pro x64
It looks like changes to the TSVs will allow more and stacked cache. It is possible that AMD will move to an L3-cacheless CCD design with all of it coming from the stacked cache. If they can fit the TSVs into the dense versions, we are looking at a lot of freed up real estate on a dense CCD chiplet. AMD might be abandoning clock speed increases (even resetting them to below 5 GHz much like when Pentium M reset clocks after the Netburst era) and going for high core counts, large/stacked L3 cache sizes and continued IPC increases while maintaining power budgets at the same/current level.

I welcome this approach if that's what happens.
With the chiplet design, I've also always wondered whether they could make what is essentially a full CCD-sized SRAM (or superfast DRAM) die and place it there as a monster L4/system cache. You could easily fit 512 MB+ in that size. The problem is that it would be connected through IF, which is "slow", and would need extra IF ports on the IOD...

food for thought
 
Joined
Dec 26, 2020
Messages
382 (0.26/day)
System Name Incomplete thing 1.0
Processor Ryzen 2600
Motherboard B450 Aorus Elite
Cooling Gelid Phantom Black
Memory HyperX Fury RGB 3200 CL16 16GB
Video Card(s) Gigabyte 2060 Gaming OC PRO
Storage Dual 1TB 970evo
Display(s) AOC G2U 1440p 144hz, HP e232
Case CM mb511 RGB
Audio Device(s) Reloop ADM-4
Power Supply Sharkoon WPM-600
Mouse G502 Hero
Keyboard Sharkoon SGK3 Blue
Software W10 Pro
Benchmark Scores 2-5% over stock scores
Very interesting, seems like they optimized the dies very well this time. The only problem is the aged and slow way they are connected. Why are the two dies so far from each other when it would seem to be faster and more efficient for them to be close... On TR/Epyc it's acceptable because of the heat and the much more capable IO die.
 
Joined
Aug 25, 2021
Messages
1,183 (0.97/day)
Are you sure about that? I remember one detail from Zen 4 Epyc block diagrams: there are no CCD-to-CCD interconnects in 8- and 12-CCD processors. One port from each CCD remains unused. I was wondering why AMD didn't use the remaining ports for what you're implying. I had to assume the CCDs don't include the switching logic to make use of that.
Indeed it is so, AMD does not have inter-chiplet comms; everything passes through the IOD and IF.

And yes, I've always said that client processors not using wide GMI3 is a waste of performance where it's most needed (as AMD is very sensitive to RAM bandwidth). Especially on single-die models, they could use both IFoP links on the IOD and the CCD, and only one per CCD on the dual-die models. After all, the links are already there, BUT it would require a different substrate for the two models...

There are no direct chiplet-to-chiplet interconnects, that is correct. Everything goes through IF/IOD. I should have been more explicit in wording, but replied quickly, on-the-go.

EPYC processors with 4 or fewer chiplets use both GMI links (wide GMI) to increase the bandwidth from 36 GB/s to 72 GB/s (page 11 of the file attached). By analogy, that is the case for Ryzen processors too. On the image below, both wide GMI3 links on both chiplets connect to two GMI ports on IOD, two links (wide GMI) from chiplet 1 to GMI3 port 0 and another two links (wide GMI) from chiplet 2 to GMI port 1 on IOD. We can see four clusters of links.

We do not have a shot of a single-chiplet CPU that exposes the GMI links, but the principle should be the same, i.e. IF bandwidth should be 72 GB/s, like on EPYCs with four or fewer chiplets, and not 36 GB/s.
Screenshot 2024-10-06 at 15-09-55 ZEN 5 has a 3D V-Cache Secret - YouTube.png
AMD AM5 Z4 IO die .jpg

* from page 11
INTERNAL INFINITY FABRIC INTERFACES connect the I/O die with each CPU die using 36 Gb/s Infinity Fabric links. (This is known internally as the Global Memory Interface [GMI] and is labeled this way in many figures.) In EPYC 9004 and 8004 Series processors with four or fewer CPU dies, two links connect to each CPU die for up to 72 Gb/s of connectivity
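The mapping described in this post can be written down as a small data structure. This is only a sketch of the poster's reading of the die-shot, not a confirmed topology. The per-link figure of 36 is taken from the quoted page with its unit left as quoted, and every name in the code is hypothetical.

```python
LINK_RATE = 36  # per-link figure from the quoted page, unit as given there

# Hypothetical encoding of the two-chiplet Ryzen layout as read from the annotated image
ryzen_two_ccd = {
    "chiplet 1": {"iod_port": "GMI3 port 0", "links": 2},
    "chiplet 2": {"iod_port": "GMI3 port 1", "links": 2},
}


def per_chiplet_rate(topology: dict) -> dict:
    """Aggregate rate per chiplet: number of links times the per-link rate."""
    return {name: info["links"] * LINK_RATE for name, info in topology.items()}


def link_clusters(topology: dict) -> int:
    """Total number of link clusters one would expect to see on the package."""
    return sum(info["links"] for info in topology.values())


print(per_chiplet_rate(ryzen_two_ccd))
print("link clusters:", link_clusters(ryzen_two_ccd))
```

If this reading is right, each chiplet gets 72 and the package shows four clusters of links, matching what the post says is visible in the image.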
 

Attachments

  • AMD EPYC Z4 .pdf
    8.3 MB · Views: 40