Thursday, December 9th 2021
12-channel DDR5 Memory Support Confirmed for Zen 4 EPYC CPUs by AMD
Thanks to a Linux driver update, we now know that AMD's upcoming Zen 4 based EPYC CPUs will support up to 12 channels of DDR5 memory, an upgrade over the current eight. The EDAC (Error Detection and Correction) driver update from AMD contains details of the memory types supported by AMD's upcoming server and workstation CPUs, and although this doesn't tell us much about what we'll see from the desktop platform, some of it might spill over to a future Ryzen Threadripper CPU.
The driver also reveals that there will be support for both RDDR5 and LRDDR5, which translate to Registered DDR5 and Load-Reduced DDR5 respectively. LRDDR5 is the replacement for LRDIMMs, which are used in current servers with very high memory densities. Although we don't know when AMD is planning to announce Zen 4, let alone the new EPYC processors, it's expected to happen some time in the second half of next year.
Source: Phoronix
63 Comments on 12-channel DDR5 Memory Support Confirmed for Zen 4 EPYC CPUs by AMD
Whether any controller does or not, I wouldn't know. That was my opening question right back at the start.
As to why we don't just call it a 128-bit bus for dual-channel 64-bit or quad 32-bit... that would be because a 128-bit bus would require both channels to be filled...
Unless you want mobo boxes saying "up to 128-bit bus"...
Concurrent addressing. When paired, that independence is not there; they act as a single channel then. One channel can be writing and the other reading, for example, or reading two separate areas for two tasks. A four-channel controller can then be transferring to/from four areas concurrently. So it's a function of the controller too, not just the DIMMs and mobo.
DDR4 and previously: One DIMM has one 64-bit channel, platforms have X (always an integer) channels and nX DIMM slots.
DDR5: One DIMM has two 32-bit channels, platforms have Y (always a multiple of 2) channels and nY/2 DIMM slots (i.e. you can never have a single-channel DDR5 platform, at least using DIMMs, as they always have two channels).
As always, DIMM slots and channel count are mediated by the number of DIMMs per channel, which is typically limited to two for consumer platforms, but can be (a lot) more in servers. So, for example, with a 4-DIMM motherboard, with DDR4 and previous that board could have 1, 2, 3 or 4 channels (but most likely 2 with 2DPC), with DDR5 a four-slot board can have 2, 4, 6 or 8 channels (most likely 4) - but they're half as wide.
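To put that slot/channel arithmetic in one place, here's a minimal sketch; the helper is hypothetical and just assumes DDR4-style DIMMs carry one 64-bit channel, DDR5-style DIMMs carry two 32-bit sub-channels, and DPC is fixed:

```python
# Minimal sketch of the slot/channel arithmetic above (hypothetical helper).

def channels_for_board(dimm_slots: int, dpc: int, ddr5: bool) -> int:
    """Return how many memory channels a board exposes."""
    channel_groups = dimm_slots // dpc      # physical channel groups wired to the slots
    channels_per_dimm = 2 if ddr5 else 1    # DDR5 splits each DIMM into two sub-channels
    return channel_groups * channels_per_dimm

# A typical four-slot consumer board running 2 DIMMs per channel:
print(channels_for_board(4, 2, ddr5=False))  # -> 2 (two 64-bit channels)
print(channels_for_board(4, 2, ddr5=True))   # -> 4 (four 32-bit channels)
```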
Because of this doubling-up, and because hardly anyone knows how wide a RAM channel is to begin with (and having to explain this constantly would just add to the confusion), DDR5 "channels" are talked about as if each DIMM still had a single 64-bit channel like previous standards, despite this being technically wrong. It's just that much easier to communicate.
Thus, a "12-channel" DDR5 server platform, even as reported in a rumor for the technically inclined, is likely to actually have 24 32-bit channels, or 12 "2x32-bit" channel pairs. Which means that if you're used to current 8-channel DDR4 platforms, this is a 50% increase (before accounting for clock speed increases).
The main reason is that back then you had single-core/single-thread CPUs that wouldn't benefit as much from having another channel to read and write. The single core would just queue or merge the commands and mostly do one thing at a time with no problem.
Today, CPUs on AM5 have up to 32 threads that need to share 2 channels for all their reads and writes. With DDR5 they now have 4 channels to do that while keeping the same bus width. The thread-to-channel ratio changes from 16:1 to 8:1. This is to help with latency under high memory load (and not with benchmarks like AIDA64, which are mostly synthetic and just read various amounts of data serially).
I am not sure if that will greatly benefit desktops, but it will certainly help servers with highly concurrent workloads, at least for right now. We all know that it took years before games started to be multithreaded, but these days most games are. It's quite possible that future desktop workloads will require all those channels.
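Rough numbers for that ratio, using the AM5 figures above (the 32-thread count is an assumption based on the current 16-core top part):

```python
# The thread-to-channel ratio above, using the poster's AM5 numbers.
threads = 32              # assumed top AM5 part: 16 cores / 32 threads
ddr4_channels = 2         # two 64-bit channels
ddr5_channels = 4         # four 32-bit sub-channels, same total bus width
print(threads // ddr4_channels)  # -> 16 threads per channel
print(threads // ddr5_channels)  # -> 8 threads per channel
```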
Valantar is right: if you have to write a 64b line to memory, it will always go through a single 32-bit channel and will never be split across two channels, whether it stays on the same DIMM's channel or goes to another one. Splitting it would first of all screw up the addressing, but it would also defeat the main purpose of splitting these channels in the first place (having more parallel ways of doing reads/writes to reduce latency in highly threaded workloads).
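A toy sketch of what that steering could look like; this is a generic interleave illustration, not AMD's actual address decoder:

```python
# Toy illustration (not AMD's real address decoder): a whole cache line is
# steered to exactly one 32-bit sub-channel, never split across two.

CACHE_LINE = 64  # bytes moved per memory transaction

def pick_subchannel(phys_addr: int, num_subchannels: int) -> int:
    """Map a physical address to a single sub-channel via simple interleaving."""
    line_index = phys_addr // CACHE_LINE   # which cache line the address falls in
    return line_index % num_subchannels    # the whole line goes to this one sub-channel

# Consecutive lines land on different sub-channels (so traffic spreads out),
# but each individual line is serviced by one sub-channel from start to finish.
print(pick_subchannel(0x1000, 4))  # -> 0
print(pick_subchannel(0x1040, 4))  # -> 1
```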
As we can see, CPUs are getting larger and larger caches, and we might not be as far away as we think from a desktop-class CPU with 1+ GB of L3 cache. The cache really helps with read operations, but it does not solve the problem that all data eventually needs to be written to memory, and that needs bandwidth. The data must be written somewhere before it gets evicted from the cache. Having more channels to spread those writes across, plus higher bandwidth, helps. They also added a bunch of other features, like same-bank refresh, to ensure that the memory is available for reads and writes as much as possible.
People freak out a bit about the high latency of DDR5, but the truth is that in 2-3 CPU generations it won't really matter, since those CPUs will have large caches anyway. Most of the data they work on will fit in cache.
The people who designed the DDR5 standard were very smart and could already see what the future would look like. They knew the standard would have to last for half a decade at minimum.
With DDR4, when a refresh occurs, all banks are inaccessible, and those refreshes take quite some time. With DDR5, instead of refreshing all banks at once, it has the ability to refresh only one of the banks in each bank group. A bank needs to be idle (no reads or writes) for the duration of the refresh, which can last 280+ ns on a 16 GB stick and needs to be run every 4 µs. A same-bank refresh can be run every 2 µs and is much shorter (130 ns on a 16 GB stick).
And most importantly, while one bank is being refreshed, the other three banks of each bank group can still be active and perform read and/or write operations. In all-bank mode the whole DIMM is paused until the refresh is done.
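A quick back-of-envelope on those refresh figures (treating the quoted timings as illustrative rather than exact JEDEC values):

```python
# Back-of-envelope refresh overhead using the figures quoted above.

def busy_fraction(t_refresh_ns: float, interval_ns: float) -> float:
    """Fraction of time a bank is blocked by refresh."""
    return t_refresh_ns / interval_ns

# All-bank refresh: every bank is blocked for the full refresh each interval.
print(busy_fraction(280, 4000))   # -> 0.07  (~7% of the time, everything stalls)

# Same-bank refresh: shorter and more frequent, but only one bank per bank
# group is blocked; the other three keep serving reads and writes.
print(busy_fraction(130, 2000))   # -> 0.065 (~6.5%, for the one refreshing bank only)
```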
So, when Intel says two total, Intel is likely speaking correctly because that's how that particular controller functions.
As for not always wiring up one channel (the historical Athlon XP example above): that's also true today. All multi-channel controllers can have unconnected channels.
PS: On that cache thing, it's highly likely there are caching schemes that use channels in place of multi-porting. Multi-ported SRAM is very bulky (right down at the cell level), so it's only used for the small L1 caches. Although, in the case of Apple's M1 cores having 128 kB for each data cache (combined with the extreme number of execution units), they may well have opted for a multi-channel solution there instead.