DDR5 Memory Performance Scaling with AMD Zen 5


Introduction


Just a few weeks ago, AMD debuted the Zen 5 CPU microarchitecture and the Ryzen 9000 series Granite Ridge desktop processors. We've reviewed each of the four models the company launched, so be sure to check them out for additional context and background. In this article, we will investigate how the new AMD Zen 5 processors behave with popular memory speeds, both JEDEC-standard and overclocked. AMD processors offer good, modern memory support: the Ryzen 9000 series can handle up to 256 GB of dual-channel DDR5 memory, just like the competing Intel processors. You can drop in any PC DDR5 UDIMM memory kit ever made, and it will just work on Ryzen, albeit at JEDEC-standard speeds. Where AMD and Intel diverge a little is the way they handle memory overclocking, with the two platforms each having a handful of unique settings.



The adoption of DDR5 memory on modern platforms has brought significant advancements in bandwidth and data transfer rates, and AMD's Ryzen 9000 AM5 platform is no exception. As DDR5 memory becomes more mainstream, comparing performance across various memory speeds shows how different configurations impact overall system behavior. In this review, we investigate the scaling of DDR5 memory on the Ryzen 9 9950X, running speeds from DDR5-8000 down to the JEDEC baseline specifications of DDR5-5600 and DDR5-4800. These tests aim to highlight the performance gains and any potential trade-offs when pushing DDR5 memory beyond the standard specifications. We also test the interesting DDR5-6000 CL28 setting, as well as DDR5-6400 with UCLK 1:1.

In this article, we are testing with a Ryzen 9 9950X processor, AMD's fastest Zen 5, with a handful of important memory speeds. To begin with, we run two JEDEC-spec memory speeds of 4800 MT/s and 5600 MT/s, which serve as the baseline. Next up, we have DDR5-6000 CL28, which represents AMD's official sweet spot recommendation for these processors, but with tight timings. We also ventured out into DDR5-6400 with a 1:1 UCLK to MCLK ratio, which ran perfectly stable on our test system (more on that "ratio" on the next page). Moving up in memory speed, we're running DDR5-7200 CL34, which is a typical maximum of the pre-Zen 5 era. Last but not least, we're testing DDR5-8000 CL38, using the wonderful G.Skill Trident Z5 Royal Neo memory kit.

But first some theory. On the following page we'll detail how the memory controller works on AMD's Zen 5 architecture, and why it is quite different to how Intel has been designing their memory architecture. We will also delve into how the memory controller handles high-speed DDR5 memory configurations, assessing potential bottlenecks, thermal behavior, and any adjustments required to maintain stability.

Memory Controller Architecture


AMD's Zen 5 architecture is built around a multi-die design, where the I/O die (IOD) is separate from the compute dies, commonly known as Core Complex Dies (CCDs). This separation is a defining feature of AMD's modern Ryzen platforms, allowing the I/O Die to handle all data traffic between the CPU cores, memory, and other system components. By isolating the memory controller and other I/O functions in a dedicated die, AMD can optimize the design and thermal characteristics of the CCDs, focusing them solely on compute tasks. This not only improves efficiency but also helps scale performance across different configurations of core counts and memory speeds, especially as DDR5 introduces higher bandwidth demands.

Having a distinct I/O die allows AMD to manufacture the CCDs and IOD on different process nodes, which provides flexibility in optimizing each component for its specific function. For example, while the CCDs may be produced on a leading-edge process for better performance, the IOD can be manufactured on a more mature node that balances performance with power efficiency, and it helps reduce cost, too.

FCLK vs UCLK vs MCLK


In AMD's DDR5 memory architecture, the clocking system is divided into three key components: FCLK, UCLK, and MCLK, each playing a crucial role in memory performance.
  • FCLK (Infinity Fabric Clock) governs the frequency of the Infinity Fabric, which links the compute dies, I/O die, and other components.
  • UCLK (Unified Memory Controller Clock) controls the speed of the memory controller within the I/O die, which determines how fast it can forward the data flowing between Infinity Fabric and the chips on the memory module.
  • MCLK (Memory Clock) directly corresponds to the speed of the DDR5 memory. This is the "DDR5-6000" that memory vendors list in their spec sheets. Please note that due to the double-data-rate (DDR) nature of DDR5, the actual clock frequency is half that, i.e. 3000 MHz for DDR5-6000.
In an ideal configuration, these three clocks operate in a synchronized manner—the 1:1:1 ratio (FCLK:UCLK:MCLK), that you might have seen before. In this configuration latency between the processor and memory is at a minimum, because data can flow right through. However, as DDR5 speeds increase, maintaining this ratio becomes more challenging, which is why AMD enabled a decoupled mode, which lets each component operate at or near its own maximum frequency.

Decoupling the clocks will introduce additional latency because the memory controller and memory are no longer perfectly synchronized, leading to small delays in transfers while data is buffered for a short moment. However, this trade-off is often necessary to stabilize the system at higher DDR5 speeds, as maintaining a 1:1:1 ratio becomes increasingly difficult due to the limits of the Infinity Fabric and memory controller at extreme frequencies.

In the past, AMD has recommended running at a 1:1:1 mode, but this has changed with Zen 5. Now the recommendation is Auto:1:1. This means that you should keep FCLK at around 2000 MHz, which is set automatically by the BIOS. Actually, on my board it gets set to 2100 MHz, which is what I used for all testing.

1:1 vs 1:2

By default, AMD's Zen 5 processors switch from the 1:1 mode to a 1:2 mode for all memory speeds higher than 6000 MT/s. DDR5-6000 runs at 1:1, which is part of the reason why AMD selected it as the sweet spot. Due to the way AMD designed their memory controller, it's unable to run at the high speeds required by modern DDR5.


The table above shows the relationship between MCLK and UCLK at various DDR speeds. As you can see, once you go above DDR5-6000 in 1:1 mode, the memory controller has to run at a UCLK of over 3000 MHz, which is getting close to its maximum frequency limits.

When the 1:2 mode is active, the memory runs at twice the frequency of the memory controller, so the memory controller can run at a lower speed that it handles more comfortably. This lets you achieve higher memory speeds without compromising the stability of the memory controller. While memory bandwidth is increased, additional latency is introduced, because the memory controller has to manage the data flow without being able to synchronize perfectly with the memory clock.
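The clock relationships described above can be sketched in a few lines of Python. This is an illustration only, not AMD's firmware logic: the function names are hypothetical, the 6000 MT/s threshold is the default switch-over point stated in this article, and the `force_1to1` flag is an assumed stand-in for a manual BIOS override such as our DDR5-6400 1:1 setting.

```python
def mclk_mhz(data_rate_mts: int) -> float:
    """Memory clock in MHz. DDR transfers data twice per clock,
    so the clock is half the advertised data rate."""
    return data_rate_mts / 2


def uclk_mhz(data_rate_mts: int, force_1to1: bool = False) -> float:
    """Memory controller clock in MHz.

    Assumption (from the article): by default the controller runs 1:1
    with MCLK up to DDR5-6000 and switches to 1:2 above that.
    force_1to1 is a hypothetical flag modeling a manual override.
    """
    mclk = mclk_mhz(data_rate_mts)
    if data_rate_mts <= 6000 or force_1to1:
        return mclk          # 1:1 mode
    return mclk / 2          # 1:2 mode


if __name__ == "__main__":
    # The six configurations tested in this article; only DDR5-6400
    # is manually forced to 1:1.
    tested = [(4800, False), (5600, False), (6000, False),
              (6400, True), (7200, False), (8000, False)]
    for rate, forced in tested:
        print(f"DDR5-{rate}: MCLK {mclk_mhz(rate):.0f} MHz, "
              f"UCLK {uclk_mhz(rate, forced):.0f} MHz")
```

Under these assumptions, DDR5-8000 in 1:2 mode leaves the memory controller at 2000 MHz, well below the 3000 MHz it runs at with DDR5-6000 in 1:1 mode, while DDR5-6400 forced to 1:1 pushes it to 3200 MHz.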

Test System

  • The DDR5 testing scores are with the power limit disabled using PBO. With the default power limit, the additional power consumption in the memory controller due to the higher DRAM speed will take away some power headroom from the CPU cores. This is explained on the next page.
Test System "Zen 5"
Processor: AMD Ryzen 9 9950X 16c/32t
Motherboard: ASUS X670E Crosshair Hero, BIOS 2201
Memory: G.Skill Trident Z5 Royal Neo F5-8000J3848H16GX2-TR5NS
  • DDR5-8000 38-48-48-127 / 1.45 V / UCLK 1:2
  • DDR5-7200 34-42-42-84 / 1.45 V / UCLK 1:2
  • DDR5-6400 32-38-38-76-156 / 1.45 V / UCLK 1:1
  • DDR5-6000 28-36-36-72 / 1.45 V / UCLK 1:1
  • DDR5-4800 40-40-40-77 / 1.1 V (JEDEC)
  • DDR5-5600 40-40-40-77 / 1.1 V (JEDEC)
  • Fabric Clock @ 2100 MHz
Graphics: PNY GeForce RTX 4090 XLR8
Storage: 2 TB M.2 NVMe SSD
Air Cooling: Noctua NH-D15
Water Cooling: Arctic Liquid Freezer II 420 mm AIO
Thermal Paste: Arctic MX-6
Power Supply: Thermaltake Toughpower GF3 1200 W, ATX 3.0 / 16-pin 12VHPWR
Software: Windows 11 Professional 64-bit 23H2, VBS enabled (Windows 11 default)
Drivers: NVIDIA GeForce 555.85 WHQL, Ryzen Chipset Drivers 6.06.28.910
Game Mode enabled, Game Bar installed and active
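
To put the CL values of the tested memory configurations into perspective, the sketch below converts them into an absolute CAS latency in nanoseconds. It uses the standard relationship that one memory clock cycle lasts 2000 / (data rate in MT/s) nanoseconds. The function name is hypothetical, and the result covers only the CAS delay of the DRAM itself, not the total memory latency seen by the processor, which also depends on the memory controller and Infinity Fabric.

```python
def cas_latency_ns(data_rate_mts: int, cl: int) -> float:
    """CAS latency in nanoseconds.

    The memory clock is data_rate / 2 MHz, so one cycle lasts
    2000 / data_rate ns, and CL cycles take cl * 2000 / data_rate ns.
    """
    return cl * 2000 / data_rate_mts


if __name__ == "__main__":
    # (data rate in MT/s, CL) pairs taken from the test system listing
    configs = [(8000, 38), (7200, 34), (6400, 32),
               (6000, 28), (5600, 40), (4800, 40)]
    for rate, cl in configs:
        print(f"DDR5-{rate} CL{cl}: {cas_latency_ns(rate, cl):.2f} ns")
```

By this measure, the four overclocked configurations all sit within roughly 9.3 to 10 ns, while the two JEDEC settings at CL40 are noticeably slower in absolute terms.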

Memory Controller Power Consumption

Testing on this page focuses on measuring the power usage of the I/O die in AMD's Zen 5 processors, which includes the memory controller that manages data transfer between the CPU and DDR5 memory. By evaluating various memory settings, we aim to reveal how the power consumption of this component changes as memory speeds increase, and what that means for efficiency, thermals, and performance.

We're reporting the power draw of the whole I/O die as displayed in Ryzen Master as "SoC Power." This means that the number includes additional consumers like Infinity Fabric and IO Hub. The relative differences can be attributed to the memory controller (MC) though, because the only parameter that we're changing is the memory speed.


The most striking finding is how high memory OC speeds (DDR5-7200 and DDR5-8000) cause the power draw of the I/O die to skyrocket, exceeding 20 W under stress. Even the idle power draw nearly doubles over the default settings.

At stock DDR5-4800 speeds, the BIOS runs the I/O die at a voltage of 1.025 V (VSoC). When the memory is running faster than that, SoC voltage gets ramped up to 1.2 V, regardless of the degree of the overclock. To provide additional context, I measured DDR5-4800 with a 1.2 V manual VSoC setting.

I can hear you asking "Is 10 or 20 W really a big deal?"—Not sure if it is, but it has interesting implications that aren't immediately apparent.

The processor power limit (TDP/PPT) applies to the whole processor. If the memory controller consumes 20 W instead of 10 W, these additional 10 W come out of the power budget for the whole CPU. If a certain workload is intense enough, it will make the processor power throttle to ensure it stays within power and thermal limits—as designed.

But with a power-hungry MC, you're left with less power available for the CPU cores that do the calculations, which means the processor will throttle their frequency until the whole processor stays within the overall power budget. In some tests the result is that DDR5-4800 ends up faster than DDR5-8000, because there's 10 W more that can go to the CPU cores, so they can boost higher. 10 W is not insignificant—it's +5% when the CPU is running at 200 W.
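
The power budget argument can be made concrete with a small calculation. This is a simplified sketch: the function names are hypothetical, the 200 W package power and the 10 W and 20 W I/O die figures are the round numbers used in the text above, and the model assumes that everything not consumed by the I/O die is available to the CPU cores.

```python
def core_power_budget_w(package_limit_w: float, soc_power_w: float) -> float:
    """Power left for the CPU cores once the I/O die (SoC) has taken
    its share of the package limit. Simplified two-consumer model."""
    return package_limit_w - soc_power_w


def budget_share_pct(package_limit_w: float, extra_soc_w: float) -> float:
    """Extra I/O die power expressed as a percentage of the package limit."""
    return 100 * extra_soc_w / package_limit_w


if __name__ == "__main__":
    limit = 200.0  # W, round figure used in the article
    for soc in (10.0, 20.0):
        print(f"SoC {soc:.0f} W -> cores get "
              f"{core_power_budget_w(limit, soc):.0f} W")
    print(f"Extra 10 W = {budget_share_pct(limit, 10.0):.0f}% of the budget")
```

With these numbers the cores have 190 W available when the I/O die draws 10 W, but only 180 W when it draws 20 W.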

"So, I'll just remove the power limit"—good thinking, and that's exactly what I did for this testing. But now you'll run into another limit. All Zen 5 processors have their thermal limit set to 95°C, and when the 9950X is fully loaded it will reach that temperature, even with a big 360 mm AIO running at full speed.

Now you're not limited by the power limit, but by the thermal limit, and the additional 10 W from the IO die will make your processor a little bit hotter, which means it has to throttle the CPU frequency just a bit more, to stay at 95°C—we're back to square one. The underlying reason for the high temperature is the thick IHS, which is a compromise AMD chose to retain cooler compatibility with AM4. Of course, you could delid the processor and remove the IHS, but that's a rare scenario that's impractical for the vast majority of users.

Y-Cruncher

Y-Cruncher is a highly optimized piece of software that can calculate Pi and other constants to a huge number of digits. It is fully multithreaded, uses a modern code design and is optimized for all major processor architectures. This ability has made it a popular application, used by the enthusiast community to determine and compare how powerful their overclocked systems are.


