AMD Granite Ridge "Zen 5" Processor Annotated

Joined
Jan 3, 2021
Messages
3,686 (2.51/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
Very interesting, seems like they optimized the dies very well this time. The only problem is the aged and slow way they are connected. Why are the two dies so far from each other when it would seem to be faster and more efficient for them to be close... On TR/Epyc it's acceptable because of the heat and the much more capable IO die.
Density of wires. You can see part of the complexity in the pic that @Tek-Check attached. Here you see one layer but the wires are spread across several layers.

That's also the probable reason why AMD couldn't move the IOD farther to the edge, and CCDs closer to the centre. There are just too many wires for signals running from the IOD to the contacts at the other side of the substrate. The 28 PCIe lanes take four wires each, for example.
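To put a number on the PCIe example, here is a minimal sketch. It assumes the usual PCIe lane layout of one differential pair per direction (two pairs, two wires each); the function name is made up for illustration, and power, ground and reference clock wires are not counted.

```python
# Rough count of high-speed signal wires needed for a given number of PCIe lanes.
# Assumption: each lane has one differential pair for transmit and one for receive.
PAIRS_PER_LANE = 2   # TX pair + RX pair
WIRES_PER_PAIR = 2   # differential signalling uses two wires per pair


def pcie_signal_wires(lanes: int) -> int:
    """Return the number of signal wires for `lanes` PCIe lanes."""
    return lanes * PAIRS_PER_LANE * WIRES_PER_PAIR


print(pcie_signal_wires(28))  # 112
```

So the PCIe lanes alone account for over a hundred traces that have to cross the substrate.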
 
Joined
May 3, 2018
Messages
2,881 (1.18/day)
They created the cIOD so it spares them development costs for the uncore for at least 2 generations (worked for Ryzen 3000 and Ryzen 5000).

So, if they stick with AM5 for Zen 6, they might develop a new cIOD. Maybe switch to N5, give it an RDNA 3.5 iGPU, faster memory controllers, and maybe even an NPU.
Strix Halo is getting a new cIOD made on 3nm. One would expect Zen 6 to do so too. Zen 6 is about fixing all the failings of the current design, including high CCD-to-CCD core latency.
 
Joined
May 22, 2024
Messages
417 (1.77/day)
System Name Kuro
Processor AMD Ryzen 7 7800X3D@65W
Motherboard MSI MAG B650 Tomahawk WiFi
Cooling Thermalright Phantom Spirit 120 EVO
Memory Corsair DDR5 6000C30 2x48GB (Hynix M)@6000 30-36-36-76 1.36V
Video Card(s) PNY XLR8 RTX 4070 Ti SUPER 16G@200W
Storage Crucial T500 2TB + WD Blue 8TB
Case Lian Li LANCOOL 216
Power Supply MSI MPG A850G
Software Ubuntu 24.04 LTS + Windows 10 Home Build 19045
Benchmark Scores 17761 C23 Multi@65W
Strix Halo is getting a new cIOD made on 3nm. One would expect Zen 6 to do so too. Zen 6 is about fixing all the failings of the current design, including high CCD-to-CCD core latency.
Strix Halo is arguably closer to a respectably capable GPU that happens to have CPU chiplets hanging off the bus than to a regular CPU. The Zen 5 CCD-to-CCD latency regression is said to have been fixed in the latest firmware, with latency back to Zen 4 levels, though it remains to be seen whether Zen 6 will do better without trade-offs elsewhere.

They could well implement some IBM-like evict-to-other-chiplet virtual-L4-cache scheme, if they could do significantly better than memory latency with that. DRAM latency is only going to get worse.
 
Joined
Oct 30, 2020
Messages
298 (0.19/day)
Location
Toronto
System Name GraniteXT
Processor Ryzen 9950X
Motherboard ASRock B650M-HDV
Cooling 2x360mm custom loop
Memory 2x24GB Team Xtreem DDR5-8000 [M die]
Video Card(s) RTX 3090 FE underwater
Storage Intel P5800X 800GB + Samsung 980 Pro 2TB
Display(s) MSI 342C 34" OLED
Case O11D Evo RGB
Audio Device(s) DCA Aeon 2 w/ SMSL M200/SP200
Power Supply Superflower Leadex VII XG 1300W
Mouse Razer Basilisk V3
Keyboard Steelseries Apex Pro V2 TKL
Strix Halo is getting a new cIOD made on 3nm. One would expect Zen 6 to do so too. Zen 6 is about fixing all the failings of the current design, including high CCD-to-CCD core latency.

CCD to CCD latency doesn't really matter though and is always something you wish to avoid anyway, but CCD to IOD does matter. The former is already fixed and back to Zen 4 numbers and was a simple power saving thing they turned off.

Zen 6 will get a new IOD which will help the IO bottlenecks.
 

Igb

Joined
Sep 27, 2013
Messages
17 (0.00/day)
In the image of the article I count 7 cores annotated. Is that… correct?

For me it does not make sense. I don’t think I ever saw a 7-core SKU, and even if they eventually release one it will surely be an 8-core CCD with one core disabled.
 
Joined
Jan 14, 2019
Messages
13,338 (6.09/day)
Location
Midlands, UK
Processor Various Intel and AMD CPUs
Motherboard Micro-ATX and mini-ITX
Cooling Yes
Memory Overclocking is overrated
Video Card(s) Various Nvidia and AMD GPUs
Storage A lot
Display(s) Monitors and TVs
Case The smaller the better
Audio Device(s) Speakers and headphones
Power Supply 300 to 750 W, bronze to gold
Mouse Wireless
Keyboard Mechanic
VR HMD Not yet
Software Linux gaming master race
In the image of the article I count 7 cores annotated. Is that… correct?

For me it does not make sense. I don’t think I ever saw a 7-core SKU, and even if they eventually release one it will surely be an 8-core CCD with one core disabled.
It's 8 cores, with the first core annotated. Here you go:
o8fTKzZg4LzwrEPN.jpg
 
Joined
Jan 3, 2021
Messages
3,686 (2.51/day)
Location
Slovenia
CCD to CCD latency doesn't really matter though and is always something you wish to avoid anyway, but CCD to IOD does matter. The former is already fixed and back to Zen 4 numbers and was a simple power saving thing they turned off.
CCD to CCD latency is fixed but still huge. It's really hard to understand where those ~80 ns come from. There must be some very complex switching logic and cache coherency logic on the IOD, or something.
CCD to IOD latency can't be observed directly but CCD to RAM latency seems fine, no issues here (68 ns in AIDA64 on launch day review).
Zen 6 will get a new IOD which will help the IO bottlenecks.
Hopefully. That should be high on AMD's priority list. Another possibility for improvement would be a direct CCD to CCD connection in addition to the existing ones.
 
Joined
Jan 14, 2019
Messages
13,338 (6.09/day)
Location
Midlands, UK
CCD to CCD latency is fixed but still huge. It's really hard to understand where those ~80 ns come from. There must be some very complex switching logic and cache coherency logic on the IOD, or something.
CCD to IOD latency can't be observed directly but CCD to RAM latency seems fine, no issues here (68 ns in AIDA64 on launch day review).
I assume that the second interconnect on the CCD is only used on Epyc, but not on Ryzen. So, inter-CCD communication on Ryzen is still done via the IO die.
 
Joined
Jan 3, 2021
Messages
3,686 (2.51/day)
Location
Slovenia
I assume that the second interconnect on the CCD is only used on Epyc, but not on Ryzen. So, inter-CCD communication on Ryzen is still done via the IO die.
There are no direct CCD-to-CCD links in any of the AMD processors.
 
Joined
Jan 14, 2019
Messages
13,338 (6.09/day)
Location
Midlands, UK
There are no direct CCD-to-CCD links in any of the AMD processors.
So what's this?

o8fTKzZg4LzwrEPN.jpg


As I understand it, there's one "IFOP PHY" responsible for communication with the IO die, and the other one is inactive on Ryzen. Or is it some kind of double-wide communication bus with the Epyc IO die?
 
Joined
Jan 3, 2021
Messages
3,686 (2.51/day)
Location
Slovenia
EPYC processors with 4 or fewer chiplets use both GMI links (wide GMI) to increase the bandwidth from 36 GB/s to 72 GB/s (page 11 of the file attached). By analogy, that is the case for Ryzen processors too. On the image below, both wide GMI3 links on both chiplets connect to two GMI ports on IOD, two links (wide GMI) from chiplet 1 to GMI3 port 0 and another two links (wide GMI) from chiplet 2 to GMI port 1 on IOD. We can see four clusters of links.
That IFOP or GMI is still a bit of a mystery, with too little documentation available (some is here). May I ask you to do a tek-check of the data I compiled, calculated and listed here below?

A single (not wide) IFOP (GMI) interface in the Zen 4 architecture has:

- a 32-bit wide bus in each direction (single-ended, 1 wire per bit)
- 3000 MHz default clock in Ryzen CPUs (2250 MHz in Epyc CPUs)
- quad data rate transfers, which calculates to...
- 12 GT/s in Ryzen (9 GT/s in Epyc)
- 48 GB/s per direction in Ryzen (36 GB/s in Epyc)
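The arithmetic behind that list can be checked with a small sketch. It takes the bus width, clocks and quad data rate from the list above as given (they are my compiled figures, not confirmed by AMD), and the function name is just for illustration.

```python
def ifop_bandwidth(bus_bits: int, clock_mhz: float, transfers_per_clock: int = 4):
    """Return (GT/s, GB/s per direction) for a parallel link.

    bus_bits            -- data bits transferred at once in one direction
    clock_mhz           -- link clock in MHz
    transfers_per_clock -- 4 for quad data rate (assumed, as in the list above)
    """
    gt_per_s = clock_mhz * transfers_per_clock / 1000  # giga-transfers per second
    gb_per_s = gt_per_s * bus_bits / 8                 # bits per transfer -> bytes
    return gt_per_s, gb_per_s


for name, clock in (("Ryzen", 3000), ("Epyc", 2250)):
    gt, gb = ifop_bandwidth(32, clock)
    print(f"{name}: {gt:g} GT/s, {gb:g} GB/s per direction")
```

This reproduces 12 GT/s and 48 GB/s for Ryzen, and 9 GT/s and 36 GB/s for Epyc, so the list is at least internally consistent.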
 
Joined
May 22, 2010
Messages
402 (0.08/day)
Processor R7-7700X
Motherboard Gigabyte X670 Aorus Elite AX
Cooling Scythe Fuma 2 rev B
Memory no name DDR5-5200
Video Card(s) Some 3080 10GB
Storage dual Intel DC P4610 1.6TB
Display(s) Gigabyte G34MQ + Dell 2708WFP
Case Lian-Li Lancool III black no rgb
Power Supply CM UCP 750W
Software Win 10 Pro x64
So what's this?

View attachment 366587

As I understand it, there's one "IFOP PHY" responsible for communication with the IO die, and the other one is inactive on Ryzen. Or is it some kind of double-wide communication bus with the Epyc IO die?
Indeed it is so. AFAIK client Ryzen does not use wide GMI, only one IFOP per die; I haven't seen any documentation that says it uses wide GMI.

There are no direct chiplet-to-chiplet interconnects, that is correct. Everything goes through IF/IOD. I should have been more explicit in wording, but replied quickly, on-the-go.

EPYC processors with 4 or fewer chiplets use both GMI links (wide GMI) to increase the bandwidth from 36 GB/s to 72 GB/s (page 11 of the file attached). By analogy, that is the case for Ryzen processors too. On the image below, both wide GMI3 links on both chiplets connect to two GMI ports on IOD, two links (wide GMI) from chiplet 1 to GMI3 port 0 and another two links (wide GMI) from chiplet 2 to GMI port 1 on IOD. We can see four clusters of links.

We do not have a shot of a single chiplet CPU that exposes GMI link, but the principle should be the same, aka IF bandwidth should be 72 GB/s, like on EPYCs with four and fewer chiplets, and not 36 GB/s.

* from page 11
INTERNAL INFINITY FABRIC INTERFACES connect the I/O die with each CPU die using 36 Gb/s Infinity Fabric links. (This is known internally as the Global Memory Interface [GMI] and is labeled this way in many figures.) In EPYC 9004 and 8004 Series processors with four or fewer CPU dies, two links connect to each CPU die for up to 72 Gb/s of connectivity
I don't think client Ryzen uses wide GMI; that's only reserved for special EPYC models.
 
Joined
Jun 2, 2017
Messages
9,558 (3.44/day)
System Name Best AMD Computer
Processor AMD 7900X3D
Motherboard Asus X670E E Strix
Cooling In Win SR36
Memory GSKILL DDR5 32GB 5200 30
Video Card(s) Sapphire Pulse 7900XT (Watercooled)
Storage Corsair MP 700, Seagate 530 2Tb, Adata SX8200 2TBx2, Kingston 2 TBx2, Micron 8 TB, WD AN 1500
Display(s) GIGABYTE FV43U
Case Corsair 7000D Airflow
Audio Device(s) Corsair Void Pro, Logitch Z523 5.1
Power Supply Deepcool 1000M
Mouse Logitech g7 gaming mouse
Keyboard Logitech G510
Software Windows 11 Pro 64 Steam. GOG, Uplay, Origin
Benchmark Scores Firestrike: 46183 Time Spy: 25121
CCD to CCD latency is fixed but still huge. It's really hard to understand where those ~80 ns come from. There must be some very complex switching logic and cache coherency logic on the IOD, or something.
CCD to IOD latency can't be observed directly but CCD to RAM latency seems fine, no issues here (68 ns in AIDA64 on launch day review).

Hopefully. That should be high on AMD's priority list. Another possibility for improvement would be a direct CCD to CCD connection in addition to the existing ones.
How long is 80 nanoseconds? How many nanos are 1 second?
 
Joined
Jan 14, 2019
Messages
13,338 (6.09/day)
Location
Midlands, UK
How long is 80 nanoseconds? How many nanos are 1 second?
It's 80 billionths of a second. It's apparently enough time for some folks to make breakfast or something.
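For a sense of scale, a quick sketch that converts the latency into core clock cycles. The 5 GHz clock is only an assumed round number for illustration, not a measured value for any particular chip.

```python
NS_PER_SECOND = 1_000_000_000  # one second is a billion nanoseconds


def ns_to_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert a latency in nanoseconds to clock cycles.

    A 1 GHz clock ticks once per nanosecond, so cycles = ns * GHz.
    """
    return latency_ns * clock_ghz


print(NS_PER_SECOND)          # 1000000000
print(ns_to_cycles(80, 5.0))  # 400.0 cycles at an assumed 5 GHz
```

So a core waiting on the other CCD sits idle for a few hundred cycles, which is why people care about it.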
 
Joined
Aug 25, 2021
Messages
1,193 (0.97/day)
I assume that the second interconnect on the CCD is only used on Epyc, but not on Ryzen. So, inter-CCD communication on Ryzen is still done via the IO die.
As I understand it, there's one "IFOP PHY" responsible for communication with the IO die, and the other one is inactive on Ryzen. Or is it some kind of double-wide communication bus with the Epyc IO die?
Indeed it is so. AFAIK client Ryzen does not use wide GMI, only one IFOP per die; I haven't seen any documentation that says it uses wide GMI.
I don't think client Ryzen uses wide GMI; that's only reserved for special EPYC models.
Let's try to get to the bottom of this by focusing on what we know, what is visible on Ryzen die and what could be inferred.
Zen 5 image: ZEN 5 .png
Zen 2 image: AMD AM4 Z2 CPU 3000.png
- first of all, it looks like High Yields reused an old image of the Zen 2 communication lanes, overlaying the layer of old lanes onto the Zen 5 photo
- we can clearly see this at the CCD level, as the GMIs were placed in the middle of the Zen 2 CCD, which has two 4-core CCXs
- so, the left image from the video is not a genuine Zen 5 communication diagram; the actual lanes must be positioned differently
- with that out of the way, let's move on

- what do you mean by "wide GMI"? Both GMI ports?
- each CCD has two GMI ports and the IOD also has the same two GMI ports. Each GMI port is '9-wide', which means each GMI PHY has nine logic areas.
- each of the nine logic areas within a GMI port translates into a PHY that could get one or two communication lanes. This is visible.
- what we do not know is whether all those IF lanes are wired to one GMI port only; not visible on the image; topology documentation is scarce
- it'd be great to see the EPYC topology of IF lanes on CPUs that have 4 or fewer CCDs; this would bring us closer to the answer
That IFOP or GMI is still a bit of a mystery, with too little documentation available (some is here). May I ask you to do a tek-check of the data I compiled, calculated and listed here below? A single (not wide) IFOP (GMI) interface in the Zen 4 architecture has:

- a 32-bit wide bus in each direction (single-ended, 1 wire per bit)
- 3000 MHz default clock in Ryzen CPUs (2250 MHz in Epyc CPUs)
- quad data rate transfers, which calculates to...
- 12 GT/s in Ryzen (9 GT/s in Epyc)
- 48 GB/s per direction in Ryzen (36 GB/s in Epyc)
- Zen 4 is 32B/cycle read, 16B/cycle write
- more on this here: https://chipsandcheese.com/p/amds-zen-4-part-3-system-level-stuff-and-igpu
- Gigabyte leak: https://chipsandcheese.com/p/details-on-the-gigabyte-leak
- read speeds are much faster than write speeds over IF
- read bandwidth does not increase much when two CCDs operate
- write bandwidth almost doubles with two CCDs

Screenshot 2024-10-08 at 22-17-32 AMD’s Zen 4 Part 3 System Level Stuff and iGPU.png


Screenshot 2024-10-08 at 22-17-38 AMD’s Zen 4 Part 3 System Level Stuff and iGPU.png
 
Joined
Jan 3, 2021
Messages
3,686 (2.51/day)
Location
Slovenia
it'd be great to see the EPYC topology of IF lanes on CPUs that have 4 or fewer CCDs; this would bring us closer to the answer
You mean this slide, or are you looking for something more detailed?

1728479832900.png

(taken from Tom's)

Also, you've mentioned 36 GB/s and 72 GB/s before, and 9 bits here. It's obvious that the 9 bits include the parity bit, but I don't understand what numbers AMD took to calculate 36 and 72 GB/s, unless that includes parity too.
 
Joined
Aug 25, 2021
Messages
1,193 (0.97/day)
You mean this slide, or are you looking for something more detailed?
I have this slide and the entire presentation as a PDF. The ideal image or diagram would be of an EPYC with 4 or fewer CCDs.
I am looking for those detailed die shots where we can see physical wiring, or a diagram that shows them all.
This would allow us to see how they wire two GMI ports from each CCD.
When AMD says "GMI-Narrow", do they mean one GMI port only? And "GMI-Wide" means both GMI ports? It would make sense.
The next question is whether they wire all nine logic parts of a single GMI port to Infinity Fabric. If so, how many single wires, how many double ones?
I do not know.

Of course, AMD does not want to give up some secrets about their IF sauce and wiring, such as the speed of the fabric itself and how they wire it to CCDs and IOD. This is beyond my pay grade.

What we know from AMD:
1. CCD on >32C EPYC Zen4 was configured for 36 Gbps throughput per link on one GMI port to IF ("GMI-Narrow"). 36 Gbps = 4.5 GB/s per link
2. CCD on ≤32C EPYC Zen4 was configured for 36x2 Gbps throughput per link on two GMI ports to IF ("GMI-Wide"). 72 Gbps = 9 GB/s per dual link

The answer we need here is how many IF links does one GMI port provide? Is it 9? There are 9 pieces of logic on die per GMI port. Are they all used on PHY level? If 9 links are used, the throughput would be 9x36 Gbps = 324 Gbps = 40.5 GB/s for one CCD, and 648 Gbps = 81 GB/s for "GMI-wide"

Chips&Cheese testing of IF on Ryzen:
3. one CCD on Ryzen Zen4 has throughput of ~63 GB/s towards DDR5-6000 memory via IF; two CCDs ~77 GB/s
This corresponds to 504 Gbps for one CCD and 616 Gbps for two CCDs.

- if only one GMI port is used on Ryzen and we assumed there are 9 links in each GMI port, this gives us 9x36 Gbps = 324 Gbps = 40.5 GB/s.
- the measured throughput by C&C was 63 GB/s on read speed, so more links would be needed on one CCD to achieve this throughput
- it seems physically impossible to use both GMI ports on Ryzen CCD and connect those to one GMI port on IOD.
- therefore, it could be the case that IF was configured to run faster on Ryzen CPUs
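The comparison can be sketched in a few lines. It assumes 36 Gbps per link as stated by AMD for EPYC, treats the 9-links-per-port figure purely as the hypothesis discussed above, and uses the Chips&Cheese read figures as the measured values; the function names are illustrative only.

```python
import math

LINK_GBPS = 36  # per-link rate quoted by AMD for EPYC; assumed to apply to Ryzen too


def port_bandwidth_gbytes(links: int) -> float:
    """Theoretical bandwidth in GB/s for `links` links at LINK_GBPS each."""
    return links * LINK_GBPS / 8


def links_needed(measured_gbytes: float) -> int:
    """Minimum number of LINK_GBPS links needed to carry a measured GB/s figure."""
    return math.ceil(measured_gbytes * 8 / LINK_GBPS)


print(port_bandwidth_gbytes(9))   # 40.5 GB/s, one port under the 9-link hypothesis
print(port_bandwidth_gbytes(18))  # 81.0 GB/s, both ports ("GMI-wide")
print(links_needed(63))           # 14 links for the measured single-CCD read speed
print(links_needed(77))           # 18 links for the two-CCD figure (both CCDs combined)
```

In other words, nine links at 36 Gbps fall well short of the measured 63 GB/s, so either there are more links per port or the fabric runs faster on Ryzen.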

How does this sit?
 
Joined
Aug 25, 2021
Messages
1,193 (0.97/day)
You mean this slide, or are you looking for something more detailed?
We now have confirmation from the EPYC Zen 5 file:
"INTERNAL INFINITY FABRIC INTERFACES connect the I/O die with each CPU die using a total of 16 36 Gb/s Infinity Fabric links."

Going back to our previous considerations. What we know from AMD about theoretical bandwidth:
1. CCD on >32C EPYC Zen4 was configured for 36 Gbps throughput per link on one GMI port to IF ("GMI-Narrow"). 36 Gbps = 4.5 GB/s per link
- 16 links x 36 Gbps = 576 Gbps = 72 GB/s
2. CCD on ≤32C EPYC Zen4 was configured for 36x2 Gbps throughput per link on two GMI ports to IF ("GMI-Wide"). 72 Gbps = 9 GB/s per dual link
- 16 links x 72 Gbps = 1152 Gbps = 144 GB/s

3. one CCD on Ryzen Zen4 has throughput of ~63 GB/s towards DDR5-6000 memory via IF; two CCDs ~77 GB/s
So yes, it looks like one GMI port is used.
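A quick sanity check of that conclusion, as a sketch. It assumes 16 links per CCD as in the EPYC quote, 36 Gbps per link for "GMI-Narrow" and 72 Gbps per dual link for "GMI-Wide", and that the same figures carry over to Ryzen, which AMD has not confirmed.

```python
LINKS_PER_CCD = 16  # from the EPYC document quoted above


def aggregate_gbytes(links: int, gbps_per_link: float) -> float:
    """Total theoretical bandwidth in GB/s across all links."""
    return links * gbps_per_link / 8


narrow = aggregate_gbytes(LINKS_PER_CCD, 36)  # one GMI port
wide = aggregate_gbytes(LINKS_PER_CCD, 72)    # two GMI ports

measured_single_ccd = 63  # GB/s, Chips&Cheese read figure for one Zen 4 CCD

print(narrow, wide)                   # 72.0 144.0
print(measured_single_ccd <= narrow)  # True: the measurement fits within one port
```

The measured 63 GB/s sits just under the 72 GB/s ceiling of a single port, which is consistent with GMI-Narrow on client Ryzen.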
 