Monday, April 3rd 2023
AMD and JEDEC Create DDR5 MRDIMMs with 17,600 MT/s Speeds
AMD and JEDEC are collaborating to create a new industry standard for DDR5 memory called MRDIMMs (multi-ranked buffered DIMMs). The constant need for more bandwidth in server systems is a problem that cannot easily be solved: adding more memory is difficult, as motherboards can only get so big, and on-package solutions like HBM are expensive and only scale to a limited capacity. JEDEC engineers, with the help of AMD, have therefore drafted a new standard that aims to tackle this challenge with MRDIMM technology. The concept of an MRDIMM is, on paper, straightforward: it combines two DDR5 DIMMs' worth of memory on a single module to effectively double the bandwidth. Take two DDR5 DIMMs running at 4,400 MT/s, pair them as a single module, and you get 8,800 MT/s from that one DIMM slot. To make this work, a special data mux or buffer effectively takes two Double Data Rate (DDR) DIMMs and presents them as a Quad Data Rate (QDR) interface.
The design also allows simultaneous access to both ranks of memory, thanks to the added mux. First-generation MRDIMMs can reach speeds of up to 8,800 MT/s, while second- and third-generation modules are planned for 12,800 MT/s and 17,600 MT/s, respectively. Third-generation MRDIMMs are not expected until after 2030, so the project still has a long way to go. Intel has a similar solution called Multiplexer Combined Ranks DIMM (MCRDIMM), which takes much the same approach. However, Intel's technology is expected to see the light of day as early as 2024/2025, with the Granite Rapids generation of servers a likely first candidate. SK Hynix already makes MCRDIMMs, and a demonstration of the approach can be seen below.
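As a rough back-of-the-envelope illustration of those numbers (a minimal sketch, not part of the announcement: the 8-byte transfer width is the standard DDR5 DIMM data width and is assumed here, and real modules also carry ECC bits):

```python
# Rough MRDIMM bandwidth math (illustrative only).
# Assumption: 64 data bits (8 bytes) transferred per memory transaction,
# as on a standard non-ECC DDR5 DIMM.

def peak_bandwidth_gbps(rate_mts: float, bytes_per_transfer: int = 8) -> float:
    """Peak bandwidth in GB/s for a module at the given MT/s rate."""
    return rate_mts * 1e6 * bytes_per_transfer / 1e9

base_rate = 4400                       # MT/s of each underlying DDR5 DIMM
mrdimm_gen_rates = {1: 2 * base_rate,  # 8,800 MT/s: two DIMMs muxed together
                    2: 12800,
                    3: 17600}

for gen, rate in mrdimm_gen_rates.items():
    print(f"Gen {gen} MRDIMM: {rate:,} MT/s = {peak_bandwidth_gbps(rate):.1f} GB/s per module")
```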
Sources:
Robert Hormuth, via Tom's Hardware
24 Comments on AMD and JEDEC Create DDR5 MRDIMMs with 17,600 MT/s Speeds
Cool.
It's curious though that AMD did collaborate with JEDEC and Intel apparently didn't, or at least did not succeed. Were they too late, is their MCR solution technically inferior, or do they just love proprietary tech?
Also, in servers, overvolting is out of the question, so individual DRAM dies don't have much potential to go over 4800 MT/s. You need multiplexing to arrive at 8800. One and a half volts is for people who just want to set records and don't care how long RAM (or IMC) will live.
Both are generally accepted (especially with those aware that RAID was originally for $/MB efficiency). It's only similar in concept, not execution.
While it is effectively 2 RAM DIMMs glued together, they are in an Integrated construction, not Independent.
So... it's still RAID,
just not RAInexpensiveD,
or RAIndependentD
Aren't acronyms fun?! :p
*Yes. I would bet on it being that.*
[looks at Optane NVDIMMs]
But seriously, stuff that starts off as "server" parts usually makes its way down to us little people eventually, but it may take a while, since this would probably require motherboard makers to implement the changes needed to use these sticks to their full potential, and perhaps a tweak or two by M$, depending on how the sticks are seen/utilized by the OS...
But that is interesting. It would mean that with 2 DIMMs and 2 channels on the motherboard, you would have 4 subchannels per DIMM for a total of 8.
That brings things back closer to where they used to be. For a very long time, there was at least 1 channel per core (since CPUs were single-core). Now, on a 16-core CPU like the 5950X it's 0.125 channels per core, or 0.25 subchannels per core on the 7950X with DDR5. With MRDIMMs it would come back up to at least 0.5 per core (and that still ignores SMT, which makes the ratio even worse).
Among many other things, this is one of the reasons why memory optimization plays such a big role today.
In the real world, you have to wait for a busy channel to become free again before sending another command, which adds latency. If you can issue your operation on a free channel instead, you get not only more bandwidth but also lower latency.
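A quick sketch of that channels-per-core ratio, using the counts from the comment above (the subchannel counts assume each DDR5 DIMM exposes two subchannels and that an MRDIMM doubles that again via its mux; purely illustrative):

```python
# Hypothetical channels-per-core ratios for the configurations discussed above.
configs = {
    "5950X + DDR4 (2 channels)":           (16, 2),
    "7950X + DDR5 (2 DIMMs x 2 subch.)":   (16, 4),
    "7950X + MRDIMM (2 DIMMs x 4 subch.)": (16, 8),
}

for name, (cores, channels) in configs.items():
    print(f"{name}: {channels / cores:.3f} channels per core")
```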
We will see.
Again, I just hope we won't end up with two standards on desktop/laptop.
Also, I wonder if this could mean a shorter life for AM5, if it needs a new socket.
MRDIMMs, building on top of registered DIMMs, will be "even more server exclusive" IMO.
I'm not even sure that Core and Ryzen CPUs exploit the granularity of subchannels.
So why is the core-to-channel ratio important? Because a core normally works in series and issues memory operations in sequence (a huge simplification, but let's keep it simple). In a 1-to-1 scenario, the memory would have a small queue or no queue at all, since it could fill memory requests as they come in.
But in a scenario where you have 16 cores with SMT (so 32 threads) working, those 32 threads generate memory operations that can only be sent to 2 or 4 subchannels, so there will certainly be queues, and you will lose precious CPU cycles.
Prefetching into the cache can alleviate this up to a point, by starting the memory read before you actually need the data and loading it into the cache. There is no such saving for writes (but most memory operations are usually reads).
And that only works for L1 and L2. L3 is a victim cache on both Intel and AMD, meaning it is not filled ahead of time and only contains data that was evicted from L2. That can be good if you reuse the same data very frequently, but in many scenarios it won't help at all.
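A toy sketch of the victim-cache idea described above (illustrative only; real L2/L3 caches are set-associative hardware, not Python dicts, and the sizes below are made up):

```python
from collections import OrderedDict

class VictimCache:
    """Toy model: L3 only receives lines evicted from L2 (a victim cache)."""
    def __init__(self, l2_lines: int, l3_lines: int):
        self.l2 = OrderedDict()   # address -> data, kept in LRU order
        self.l3 = OrderedDict()
        self.l2_lines, self.l3_lines = l2_lines, l3_lines

    def access(self, addr):
        if addr in self.l2:                   # L2 hit
            self.l2.move_to_end(addr)
            return "L2 hit"
        if addr in self.l3:                   # L3 hit: promote back into L2
            self.l3.pop(addr)
            result = "L3 hit"
        else:
            result = "miss (DRAM)"
        self._fill_l2(addr)
        return result

    def _fill_l2(self, addr):
        if len(self.l2) >= self.l2_lines:     # evict LRU line from L2 ...
            victim, _ = self.l2.popitem(last=False)
            self.l3[victim] = None            # ... into L3 (the "victim")
            if len(self.l3) > self.l3_lines:
                self.l3.popitem(last=False)
        self.l2[addr] = None

cache = VictimCache(l2_lines=4, l3_lines=8)
for a in [1, 2, 3, 4, 5, 1]:                  # address 1 gets evicted, then re-hit in L3
    print(a, cache.access(a))
```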
If you have more channels, you not only get more bandwidth but also more queues to spread the load across, which reduces overall memory latency. Required bandwidth and the number of memory operations grow with the number of cores (when they are actually in use).
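A minimal queueing sketch of that effect (purely illustrative: every request takes the same fixed service time, and the controller here just picks a channel at random rather than the smartest one):

```python
import random

def avg_wait(n_requests: int, n_channels: int, service_time: float = 1.0, seed: int = 0) -> float:
    """Average time a request waits for its channel to become free (toy model)."""
    random.seed(seed)
    free_at = [0.0] * n_channels             # time at which each channel goes idle again
    now, total_wait = 0.0, 0.0
    for _ in range(n_requests):
        now += random.expovariate(1.5)       # requests arrive faster than one channel can serve
        ch = random.randrange(n_channels)    # controller assigns the request to a channel
        start = max(now, free_at[ch])        # wait if that channel is still busy
        total_wait += start - now
        free_at[ch] = start + service_time
    return total_wait / n_requests

for channels in (2, 4, 8):
    print(f"{channels} channels: average queueing delay = {avg_wait(10_000, channels):.2f}")
```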
As for the CPU being aware of channels: it is not. That logic is handled by the memory controller, so in this case the memory controller would simply be aware of twice as many channels. In any case, it's easy to see that extra memory channels help performance; just look at the benchmarks. The fact that the scaling isn't linear is probably down to how the data is spread across channels and how much of the latency the cache subsystem was able to hide.
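For illustration, one common way a memory controller spreads the load is to map consecutive cache lines onto consecutive channels (the 64-byte line size and the simple modulo mapping below are assumptions for the sketch, not AMD's or Intel's actual scheme):

```python
def channel_for_address(addr: int, n_channels: int, line_bytes: int = 64) -> int:
    """Toy interleave: consecutive cache lines land on consecutive channels."""
    return (addr // line_bytes) % n_channels

# With 4 channels, four consecutive 64-byte lines hit four different channels,
# so a streaming read keeps every channel busy instead of queueing on one.
for line in range(8):
    addr = line * 64
    print(f"address {addr:#06x} -> channel {channel_for_address(addr, 4)}")
```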