Monday, April 3rd 2023

AMD and JEDEC Create DDR5 MRDIMMs with 17,600 MT/s Speeds

AMD and JEDEC are collaborating to create a new industry standard for DDR5 memory called MRDIMMs (multi-ranked buffered DIMMs). The constant need for bandwidth in server systems provides trouble that can not easily be solved. Adding more memory is difficult, as motherboards can only get so big. Incorporating on-package memory solutions like HBM is expensive and can only scale to a specific memory capacity. However, engineers of JEDEC, with the help of AMD, have come to make a new standard that will try and solve this challenge using the new MRDIMM technology. The concept of MRDIMM is, on paper, straightforward. It combines two DDR5 DIMMs on a single module to effectively double the bandwidth. Specifically, if you take two DDR5 DIMMs running at 4,400 MT/s and connect them to create a single DIMM, you get 8,800 MT/s speeds on a single module. To efficiently use it, a special data mux or buffer will effectively take two Double Data Rate (DDR) DIMMs and convert them into Quad Data Rate (QDR) DIMMs.

The design also allows simultaneous access to both ranks of memory, thanks to the added mux. First-generation MRDIMMs can produce speeds of up to 8,800 MT/s, while the second and third generations modules can go to 12,800 MT/s and 17,600 MT/s, respectively. We expect third-generation MRDIMMs after 2030, so the project is still far away. Additionally, Intel has a similar solution called Multiplexer Combined Ranks DIMM (MCRDIMM) which uses a similar approach. However, Intel's technology is expected to see the light of the day as early as 2024/2025 and beyond the generation of servers, with Granite Rapids likely representing a contender for this technology. SK Hynix already makes MCRDIMMs, and you can see the demonstration of the approach below.
Sources: Robert Hormuth, via Tom's Hardware
Add your own comment

24 Comments on AMD and JEDEC Create DDR5 MRDIMMs with 17,600 MT/s Speeds

#1
Crackong
So quad channels in dual physical slots
Posted on Reply
#2
hs4
Last time Hynix-Renesus-Intel announced MCR DIMMs, and this looks similar to that, but are they compatible?
Posted on Reply
#3
LabRat 891
So.... AMD worked w/ JEDEC to make what is essentially: RAM RAID0
Cool.
Posted on Reply
#4
usiname
Is it only for the server/HEDT platforms or its possible to be implemented for the regular users?
Posted on Reply
#5
TumbleGeorge
I think that ahead in time to 2025 FB-Dimm also will have support of DDR5 around 8400-9000. And will be faster in effective usage because all of RAM modules work in parallel.
Posted on Reply
#6
Minus Infinity
Apparenlty AMD has almost certified DDR5 7200MT's for Epyc Turin, and one would expect Zen 5 desktop will easily support such speeds with multiple DIMMS.
Posted on Reply
#7
Wirko
LabRat 891So.... AMD worked w/ JEDEC to make what is essentially: RAM RAID0
Cool.
Redundant array of *inexpensive* ... huh?

It's curious though that AMD did collaborate with JEDEC and Intel apparently didn't, or at least did not succeed. Were they too late, is their MCR solution technically inferior, or do they just love proprietary tech?
Posted on Reply
#8
TumbleGeorge
o_O Hard to imagine DDR5 7200@12 channels! It will probably happen, but what RAM speed is that? Terrible!
Posted on Reply
#9
Wirko
CrackongSo quad channels in dual physical slots
24 channels in 12 physical slots is probably a better description. These things are aimed at servers with many many cores per processor, high RAM bandwidth is valuable in those but high RAM density is at least equally valuable, and this multiplexed memory can achieve both.

Also, in servers, overvolting is out of the question, so individual DRAM dies don't have much potential to go over 4800 MT/s. You need multiplexing to arrive at 8800. One and a half volts is for people who just want to set records and don't care how long RAM (or IMC) will live.
Posted on Reply
#10
LabRat 891
WirkoRedundant array of *inexpensive* ... huh?

It's curious though that AMD did collaborate with JEDEC and Intel apparently didn't, or at least did not succeed. Were they too late, is their MCR solution technically inferior, or *do they just love proprietary tech?*
That acronym has changed over the years: Redundant Array of Independent Disks
Both are generally accepted (especially w/ those aware that RAID was originally for $/MB efficiency)
The concept of MRDIMM is, on paper, straightforward. It combines two DDR5 DIMMs on a single module to effectively double the bandwidth.
It's only similar in concept, not execution.
While it is effectively 2 RAM DIMMs glued together, they are in an Integrated construction, not Independant.

So... it's still RAID,
just not RAInexpensiveD,
or RAIndependantD
Aren't acronyms fun?! :p


*Yes. I would bet on it being that. *
[looks at Optane NVDIMMs]
Posted on Reply
#11
bonehead123
usinamepossible to be implemented for the regular users?
It is possible, but only if you have ~$1000 to spend on each stick, hehehe :)

But seriously, usually, stuff that starts off as "server" parts will eventually make it way down to us little people, but it may take a while, since this would probably involve mobo mfgr's to implement the changes needed to use these sticks to their full potential, and perhaps a tweak or two by M$, depending on how the sticks are seen/utilized by the OS...
Posted on Reply
#12
Punkenjoy
I hope there won't be memory just for Intel and memory just for AMD. On desktop at least. It would be less bad on the server market.

But that is interesting. It would mean that with 2 DIMM and 2 channel on the mainboard, you would have 4 Subchannel per DIMM for a total of 8.

That bring things close to where it was. For a very long time, there was at least 1 channel per core (since CPU were single core). Now, on a CPU like a 5900x, it's 0,125 channel per core or 0,25 on the 7950x with DDR5. With that it would bring things back to at least 0.5 channel per core. (and still, we omit SMT that destroy that ratio too).

Among many other things, this is one of the reason why memory optimization's take such a big place today.

In real world, you have to wait for a busy channel to be free again before sending another command. This add latency. If you can do your operation on a free channel, you not only get more bandwidth, but less latency.

We will see.

Again just hope we won't have 2 standard on desktop/laptop

Also, i wonder if that could mean a shorter life for AM5 if this need a new socket.
Posted on Reply
#13
dragontamer5788
usinameIs it only for the server/HEDT platforms or its possible to be implemented for the regular users?
RDIMMs are already server-only (I don't think Threadripper Pro even takes them).

MRDIMMs, building on top of registered DIMMs, will be "even more server exclusive" IMO.
Posted on Reply
#14
BoboOOZ
PunkenjoyI hope there won't be memory just for Intel and memory just for AMD. On desktop at least. It would be less bad on the server market.

But that is interesting. It would mean that with 2 DIMM and 2 channel on the mainboard, you would have 4 Subchannel per DIMM for a total of 8.

That bring things close to where it was. For a very long time, there was at least 1 channel per core (since CPU were single core). Now, on a CPU like a 5900x, it's 0,125 channel per core or 0,25 on the 7950x with DDR5. With that it would bring things back to at least 0.5 channel per core. (and still, we omit SMT that destroy that ratio too).

Among many other things, this is one of the reason why memory optimization's take such a big place today.

In real world, you have to wait for a busy channel to be free again before sending another command. This add latency. If you can do your operation on a free channel, you not only get more bandwidth, but less latency.

We will see.

Again just hope we won't have 2 standard on desktop/laptop

Also, i wonder if that could mean a shorter life for AM5 if this need a new socket.
Luckily, on desktop they can decrease latency by just sticking a big slab of 3DVCache on the CPU ;)
Posted on Reply
#15
Wirko
PunkenjoyThat bring things close to where it was. For a very long time, there was at least 1 channel per core (since CPU were single core). Now, on a CPU like a 5900x, it's 0,125 channel per core or 0,25 on the 7950x with DDR5. With that it would bring things back to at least 0.5 channel per core. (and still, we omit SMT that destroy that ratio too).
Why do you think the number of channels is so important? The combined bandwidth certainly is, that's what you should calculate per core. But can there be a big advantage of two independent 32-bit channels (in DDR5) versus one 64-bit (which in DDR5 terms means that the two subchannels aren't independent, they both receive and execute the same R/W requests at all times)?
I'm not even sure that Core and Ryzen CPUs exploit the granularity of subchannels.
Posted on Reply
#16
Punkenjoy
WirkoWhy do you think the number of channels is so important? The combined bandwidth certainly is, that's what you should calculate per core. But can there be a big advantage of two independent 32-bit channels (in DDR5) versus one 64-bit (which in DDR5 terms means that the two subchannels aren't independent, they both receive and execute the same R/W requests at all times)?
I'm not even sure that Core and Ryzen CPUs exploit the granularity of subchannels.
The bandwidth and latency of a memory is already significant. A trip to memory is very costly in CPU cycles. If you add on top of that the fact that you might have to wait for the channel to be free, you greatly waste CPU cycles. If your memory request take 60 ns but there are 10 in queues before that one, well, it's going to take 600 ns.

So why the core to channels is important? well because a core would normally work in series and initiate memory ops in sequences (a huge simplification, but let's keep it simple). In a 1 to 1 scenario, the memory should have a small or no queue at all as it would fill memory request as they come.

But in a scenario where you have 16 cores with SMT (so 32 thread) working, you would have 32 thread that generate memory ops that can only be sent to 2 or 4 subchannels. So there will be queues for sure. You will lose precious CPU cycles.

Prefetching to cache might help to alleviate this up to a point by starting the memory read before you actually need it and loading it the cache. But there is no saving for write (but usually, most memory operations are read).

And that work for L1 and L2. L3 is a victim cache on Intel and AMD, it means it will not be smart and will only contain data that got evicted from L2. It can be good if you reuse the same data very frequently. but in many scenario, it won't help at all.

If you have more channel, not only it give you more bandwidth, but also more queues to spread the load and to reduce the overall memory latency. Bandwidth and number of memory IOP grow with the number of cores (if they are in use indeed).

As for the CPU being aware of channel. They are not. That logic is handled by the memory controller. So in this case, the memory controller will be aware of double the channel. But anyway, it's quite easy to see that memory channel help performance. Just look at benchmark. The fact that it's not a linear increase is probably related to how the data is spread accross channel and how much latency the cache subsystem was able to hide.
Posted on Reply
#17
unwind-protect
dragontamer5788RDIMMs are already server-only (I don't think Threadripper Pro even takes them).

MRDIMMs, building on top of registered DIMMs, will be "even more server exclusive" IMO.
The Pro version of Threadripper uses registered RAM. Until DDR4 you could also feed it unbuffered RAM (in much less quantity of course).
Posted on Reply
#18
demirael
unwind-protectThe Pro version of Threadripper uses registered RAM. Until DDR4 you could also feed it unbuffered RAM (in much less quantity of course).
"Until DDR4"? What? It's Ryzen based and has only ever used DDR4.
Posted on Reply
#19
unwind-protect
demirael"Until DDR4"? What? It's Ryzen based and has only ever used DDR4.
Yes, what I am saying is that with DDR5 (current and future platforms) you can no longer use UDIMMs instead of RDIMMs. Up to DDR4 most boards would allow that.
Posted on Reply
#20
R-T-B
WirkoRedundant array of *inexpensive* ... huh?
To be fair "inexpensive" became "independent" awhile ago... lol.
Posted on Reply
#21
Minus Infinity
TumbleGeorgeo_O Hard to imagine DDR5 7200@12 channels! It will probably happen, but what RAM speed is that? Terrible!
DDR56400 is already qualified. Genoa only qualified for DDR54800 max. So this is a huge improvement. Would be fantastic to see 4 dimms usable at good latencies at 6400+MT's on Zen 5 desktop
Posted on Reply
#22
DaveLT
TumbleGeorgeI think that ahead in time to 2025 FB-Dimm also will have support of DDR5 around 8400-9000. And will be faster in effective usage because all of RAM modules work in parallel.
FBDIMM? Is this 2007?
Posted on Reply
#24
Hardware Geek
LabRat 891That acronym has changed over the years: Redundant Array of Independent Disks
Both are generally accepted (especially w/ those aware that RAID was originally for $/MB efficiency)



It's only similar in concept, not execution.
While it is effectively 2 RAM DIMMs glued together, they are in an Integrated construction, not Independant.

So... it's still RAID,
just not RAInexpensiveD,
or RAIndependantD
Aren't acronyms fun?! :p


*Yes. I would bet on it being that. *
[looks at Optane NVDIMMs]
To be fair, it isn't redundant either.
Posted on Reply
Add your own comment
Jan 22nd, 2025 00:50 EST change timezone

New Forum Posts

Popular Reviews

Controversial News Posts