Wednesday, July 13th 2022

CXL Memory Pooling will Save Millions in DRAM Cost

Hyperscalers such as Microsoft, Google, and Amazon all run their cloud divisions with a specific goal: to provide their hardware to someone else in the form of an instance and have the user pay for it by the hour. However, instances are usually bound to a specific CPU and memory configuration that you cannot configure yourself; you can only choose from the few options that are listed. For example, selecting one virtual CPU core may get you two GB of RAM, and you can go as high as you want with CPU cores. However, every time the core count doubles, the available RAM doubles too, even though you might not need it. When renting an instance, the allocated CPU cores and memory are yours until the instance is turned off.

This is precisely the problem hyperscalers are dealing with. Many instances don't fully utilize their DRAM, which makes the data center as a whole inefficient. Microsoft Azure, one of the largest cloud providers, measured that 50% of all VMs never touch 50% of their rented memory. That memory sits idle inside a rented VM, where nothing else can use it.
As the Azure team puts it in its paper: "At Azure, we find that a major contributor to DRAM inefficiency is platform-level memory stranding. Memory stranding occurs when a server's cores are fully rented to virtual machines (VMs), but unrented memory remains. With the cores exhausted, the remaining memory is unrentable on its own, and is thus stranded. Surprisingly, we find that up to 25% of DRAM may become stranded at any given moment."
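To make the definition concrete, here is a minimal Python sketch of how stranded memory could be counted on a single host. The `Server` and `VM` types and the 48-core / 384 GB host are hypothetical illustration values, not Azure's configuration; the only rule taken from the quote is that leftover memory counts as stranded once every core is rented.

```python
from dataclasses import dataclass


@dataclass
class Server:
    cores: int
    memory_gb: int


@dataclass
class VM:
    cores: int
    memory_gb: int


def stranded_memory_gb(server: Server, vms: list[VM]) -> int:
    """Memory left unrented on a host whose cores are all rented out.

    Hypothetical helper: leftover memory only becomes stranded once no
    cores remain to sell alongside it.
    """
    used_cores = sum(vm.cores for vm in vms)
    used_memory = sum(vm.memory_gb for vm in vms)
    if used_cores > server.cores or used_memory > server.memory_gb:
        raise ValueError("VM placement exceeds the server's capacity")
    free_memory = server.memory_gb - used_memory
    # While some cores are still free, the leftover memory can still be rented.
    return free_memory if used_cores == server.cores else 0


if __name__ == "__main__":
    host = Server(cores=48, memory_gb=384)
    # Twelve 4-core / 16 GB VMs use every core but only half of the memory.
    full = [VM(cores=4, memory_gb=16) for _ in range(12)]
    print(stranded_memory_gb(host, full))       # 192
    print(stranded_memory_gb(host, full[:11]))  # 0, four cores are still free
```

In this made-up case half of the host's DRAM is stranded even though nothing is wrong with the VMs themselves; the core-to-memory ratio of the rented shapes simply doesn't match the host's.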
To achieve better utilization, we have to turn to mainframe designs and copy their behavior. The memory pooling concept lets the CPU access as much memory as it needs without occupying and stranding DRAM in VMs that don't need it. Backing this up is the new CXL (Compute Express Link) protocol, a cache-coherent interconnect that every major hardware provider is including in its offerings. Equipping a data center with CXL hardware allows companies like Microsoft to reduce costs. As the company notes, "[memory] disaggregation can achieve a 9-10% reduction in overall DRAM, which represents hundreds of millions of dollars in cost savings for a large cloud provider."
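The saving comes from sizing DRAM for combined rather than individual peaks. The sketch below is a deliberately simplified model, not the design described in Microsoft's paper: each server keeps a fixed amount of local DRAM (`local_gb`, a hypothetical parameter), and a shared pool only has to cover the largest combined overflow at any one moment. The demand traces are made-up numbers.

```python
def dram_without_pool(demand: list[list[int]]) -> int:
    """Each server is provisioned for its own peak demand."""
    return sum(max(trace) for trace in demand)


def dram_with_pool(demand: list[list[int]], local_gb: int) -> int:
    """Each server gets local_gb of local DRAM; a shared pool covers the overflow.

    Assumes every trace has the same number of time steps. The pool only needs
    to cover the worst combined overflow, since servers rarely peak together.
    """
    steps = len(demand[0])
    pool_gb = max(
        sum(max(0, trace[t] - local_gb) for trace in demand)
        for t in range(steps)
    )
    return local_gb * len(demand) + pool_gb


if __name__ == "__main__":
    # Hypothetical memory demand (GB) of three servers over three time steps.
    demand = [
        [100, 180, 120],
        [170, 110, 120],
        [120, 130, 175],
    ]
    print(dram_without_pool(demand))             # 525
    print(dram_with_pool(demand, local_gb=128))  # 438
```

In this toy case the pooled layout needs about 17% less DRAM; the real-world 9-10% figure comes from Microsoft's own measurements, not from a model like this.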
Microsoft estimates that CXL-based memory pooling will cut overall server costs by 4-5%. That is a significant number, as DRAM alone accounts for more than 50% of server costs.
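The two percentages fit together: cutting DRAM only saves that fraction of DRAM's share of a server's cost. A minimal check in Python, assuming DRAM makes up roughly half of server cost (the share is an assumption for illustration):

```python
def server_cost_saving(dram_reduction: float, dram_share_of_cost: float) -> float:
    """Fraction of total server cost saved by cutting DRAM by dram_reduction."""
    return dram_reduction * dram_share_of_cost


if __name__ == "__main__":
    DRAM_SHARE = 0.5  # assumption: DRAM is roughly half of server cost
    for reduction in (0.09, 0.10):
        saving = server_cost_saving(reduction, DRAM_SHARE)
        print(f"{reduction:.0%} less DRAM -> {saving:.1%} lower server cost")
```

With a 9-10% DRAM reduction this lands at roughly 4.5-5% of server cost, in line with Microsoft's 4-5% estimate.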

As far as performance is concerned, the Azure team benchmarked several configurations that mix local DRAM and pooled DRAM. The performance penalty for using pooled memory depends on the application, but accessing pooled memory adds 67-87 ns (nanoseconds) of latency. That is a significant hit, and it slows down some applications considerably. About 20% of applications see no performance hit from pooled memory and 23% see less than 5% slowdown, while 25% experience more than 20% slowdown, including 12% that slow down by more than 30%. Performance figures can be seen below.
According to Microsoft, this is only first-generation testing conducted on the first wave of CXL hardware. The results are promising, as they point to lower cloud costs for the hyperscaler. With next-generation hardware and CXL protocol specifications, we could see much better behavior. For more information, please refer to the paper, which examines this in greater detail.
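To put the extra 67-87 ns in perspective, here is a first-order Python sketch of average memory latency when only part of a workload's accesses go to the pool. The 100 ns local-DRAM latency and the pool fractions are assumptions chosen for illustration, and real slowdowns also depend on how well a workload overlaps or hides memory stalls, which this model ignores.

```python
def effective_latency_ns(local_ns: float, extra_pool_ns: float, pool_fraction: float) -> float:
    """Average access latency when pool_fraction of accesses hit pooled memory."""
    if not 0.0 <= pool_fraction <= 1.0:
        raise ValueError("pool_fraction must be between 0 and 1")
    return local_ns + pool_fraction * extra_pool_ns


if __name__ == "__main__":
    LOCAL_NS = 100.0  # assumed local DRAM latency, for illustration only
    for extra in (67.0, 87.0):  # added pool latency range reported by Azure
        for fraction in (0.0, 0.25, 1.0):
            latency = effective_latency_ns(LOCAL_NS, extra, fraction)
            print(f"+{extra:.0f} ns, {fraction:.0%} pooled: {latency:.1f} ns")
```

Under this simple model, a workload that keeps most of its hot data in local DRAM pays only a fraction of the pool penalty, while one that leans heavily on the pool pays nearly all of it.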
Source: SemiAnalysis

24 Comments on CXL Memory Pooling will Save Millions in DRAM Cost

#1
Valantar
Next press release: "Samsung, Micron, Hynix, Elpida and all other DRAM makers announce plans to consolidate into single entity in response to challenging market conditions"
#2
Blaeza
Valantar wrote: "Next press release: 'Samsung, Micron, Hynix, Elpida and all other DRAM makers announce plans to consolidate into single entity in response to challenging market conditions'"
Not in our lifetimes...
#3
Valantar
Blaeza wrote: "Not in our lifetimes..."
The major tendency in late-stage capitalism is consolidation and monopolization in order to increase profits and keep prices high. Heck, DRAM makers have long been accused of working as a cartel to artificially inflate prices. We've already gone from dozens to a tiny handful of companies in the past two decades. My post was obviously exaggerated, but that kind of movement is more than realistic IMO.
#4
Blaeza
Valantar wrote: "The major tendency in late-stage capitalism is consolidation and monopolization in order to increase profits and keep prices high. Heck, DRAM makers have long been accused of working as a cartel to artificially inflate prices. We've already gone from dozens to a tiny handful of companies in the past two decades. My post was obviously exaggerated, but that kind of movement is more than realistic IMO."
I read it as a big dollop of sarcasm... I've been in the PC game way too short. Didn't realise the monopoly game was real in PC land.
#5
Valantar
Blaeza wrote: "I read it as a big dollop of sarcasm... I've been in the PC game way too short. Didn't realise the monopoly game was real in PC land."
It is real in literally every single sector of business. The easiest way to ensure an advantageous competitive position in an unregulated or effectively unregulated market is to buy or merge with your competitors. When profit is seen as the only goal of business, that's the necessary logical conclusion, and the ongoing, decades-long work towards undermining and hollowing out the hard-fought regulatory systems constructed in the early and mid 20th century has been paying off for a while now.
#6
bonehead123
Valantar wrote: "It is real in literally every single sector of business. The easiest way to ensure an advantageous competitive position in an unregulated or effectively unregulated market is to buy or merge with your competitors. When profit is seen as the only goal of business, that's the necessary logical conclusion, and the ongoing, decades-long work towards undermining and hollowing out the hard-fought regulatory systems constructed in the early and mid 20th century has been paying off for a while now."
Sooooo, in other words:

Capitalism 101 at its finest, and megacorps getting stronger/moar powerful and the rich getting richer by the minute, at the sole expense of everyone else ..:eek:..:fear:..:cry:
#7
Valantar
bonehead123 wrote: "Sooooo, in other words:

Capitalism 101 at its finest, and megacorps getting stronger/moar powerful and the rich getting richer by the minute, at the sole expense of everyone else ..:eek:..:fear:..:cry:"
Yep, precisely that.
#8
Prince Valiant
Valantar wrote: "The major tendency in late-stage capitalism is consolidation and monopolization in order to increase profits and keep prices high. Heck, DRAM makers have long been accused of working as a cartel to artificially inflate prices. We've already gone from dozens to a tiny handful of companies in the past two decades. My post was obviously exaggerated, but that kind of movement is more than realistic IMO."
They've been caught doing it enough times that I find it safer to assume it's the status quo. They'll stay in it together whether or not they officially merge.
bonehead123 wrote: "Sooooo, in other words:

Capitalism 101 at its finest, and megacorps getting stronger/moar powerful and the rich getting richer by the minute, at the sole expense of everyone else ..:eek:..:fear:..:cry:"
The system of choice has only a minor effect (at best) on how long it takes the unscrupulous to exploit it.
#9
R0H1T
Valantar wrote: "Heck, DRAM makers have long been accused of working as a cartel to artificially inflate prices."
In case you haven't noticed, all major markets operate that way. At least for the last few years! You see raw material prices go down, and what do the companies do? Jack up their margins! Tax rate goes down, margins go high ~ you go low & they go high. The only exception being Chinese firms, which usually have their own "cartel" ~ they've driven local manufacturers here in most electronic sectors out of business & then, when entrenched, just increased their prices & margins! Prime example being mobile phones ~ remember how much Xiaomi or OnePlus used to cost & now?
#10
Valantar
Prince Valiant wrote: "They've been caught doing it enough times that I find it safer to assume it's the status quo. They'll stay in it together whether or not they officially merge."
To be clear, that post was a joke. Mergers might happen, but not the way I put it :p
Prince Valiant wrote: "The system of choice has only a minor effect (at best) on how long it takes the unscrupulous to exploit it."
That is simply not true. There are major, massive differences between a system that seeks checks on power and exploitation and those that don't. None are immune to exploitation or corruption, obviously, but saying a system designed to have ineffective regulation - such as our current neoliberal, late stage capitalist one - only has a minor effect on the level of exploitation taking place within that system is an extreme misrepresentation of reality.
R0H1T wrote: "In case you haven't noticed, all major markets operate that way. At least for the last few years! You see raw material prices go down, and what do the companies do? Jack up their margins! Tax rate goes down, margins go high ~ you go low & they go high. The only exception being Chinese firms, which usually have their own 'cartel' ~ they've driven local manufacturers here in most electronic sectors out of business & then, when entrenched, just increased their prices & margins! Prime example being mobile phones ~ remember how much Xiaomi or OnePlus used to cost & now?"
Heck, you're preaching to the choir here, man.
Valantar wrote: "It is real in literally every single sector of business. The easiest way to ensure an advantageous competitive position in an unregulated or effectively unregulated market is to buy or merge with your competitors. When profit is seen as the only goal of business, that's the necessary logical conclusion, and the ongoing, decades-long work towards undermining and hollowing out the hard-fought regulatory systems constructed in the early and mid 20th century has been paying off for a while now."
#11
First Strike
What is this thread babbling about....... All industry has its lifecycles. Competition does not simply vanish as long as the instrument of violence remains just. Competition just flows to the fastest-growing industries. Phone makers become a**holes not because all their CEOs become a**holes overnight; it's because the smartphone industry is becoming an outdated meh compared to 10 years ago.

Back in the 1980s, semiconductor device manufacturing was the crown jewel. Then the WWW. Then smartphones and UGCs. Then AIs. You can't just restore the glory of an industry that is already past its heyday. Just as you can't split a dam in two and let two hydro plants "compete". It's simply stupid.
#12
Steevo
bonehead123 wrote: "Sooooo, in other words:

Capitalism 101 at its finest, and megacorps getting stronger/moar powerful and the rich getting richer by the minute, at the sole expense of everyone else ..:eek:..:fear:..:cry:"
Yes but then we eats them. The rich that is.
#13
Wirko
In the end, Samsung, Micron, Hynix, Elpida and others won't merge because demand for RAM is only going to go one way: up. Microsoft, Amazon, Google, Oracle and other cloud providers won't merge into one either, even if they fail to save many millions on RAM by using CXL. Demand for RAM is going up because everyone needs even more advanced AI tools to analyse the spending habits of the ~8 billion humans. And that's what "hyperscalers" are all about, in a nutshell. The humans will be drowned in more targeted ads so they will buy more things that they don't need, and that late stage will be extended, once again, by a few years. Sorry, I tried to get back on topic and even mentioned CXL, without much success.
#14
Panther_Seraphin
The problem with this is similar to using flexible memory limits on Hyper-V, etc.

It's great because you can flexibly move RAM around to maximise its usage; however, it can lead to situations where things won't run because the RAM assigned is below what is required to start the process. And if you oversubscribe the node and those instances DO decide to heavily use their RAM? The node falls over because it tries to use more RAM than it has.

So great for 95% of use cases. Fucks over that other 5%.
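The failure mode described above can be sketched as a simple capacity check. The field names are hypothetical and only loosely modeled on Hyper-V's Dynamic Memory settings (startup and maximum RAM); this is not Hyper-V's API. A node is overcommitted when the VMs' maximums add up to more than its physical RAM, which is harmless until their actual demand does too.

```python
from dataclasses import dataclass


@dataclass
class DynamicVM:
    startup_gb: int  # memory the guest needs just to boot
    maximum_gb: int  # ceiling the hypervisor may grow the VM to
    demand_gb: int   # memory the guest is asking for right now


def check_node(physical_gb: int, vms: list[DynamicVM]) -> dict:
    startup = sum(vm.startup_gb for vm in vms)
    worst_case = sum(vm.maximum_gb for vm in vms)
    # A guest never receives more than its configured maximum.
    current = sum(min(vm.demand_gb, vm.maximum_gb) for vm in vms)
    return {
        "can_boot_all": startup <= physical_gb,
        "overcommit_ratio": worst_case / physical_gb,
        "current_demand_fits": current <= physical_gb,
    }


if __name__ == "__main__":
    quiet = [DynamicVM(startup_gb=16, maximum_gb=96, demand_gb=40) for _ in range(4)]
    busy = [DynamicVM(startup_gb=16, maximum_gb=96, demand_gb=80) for _ in range(4)]
    print(check_node(256, quiet))  # overcommitted 1.5x, but demand (160 GB) fits
    print(check_node(256, busy))   # same node, demand (320 GB) no longer fits
```

The same 1.5x overcommit is fine in the quiet case and fatal in the busy one, which is the "great for 95%, bad for the other 5%" trade-off in a nutshell.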
#15
Dr. Dro
Steevo wrote: "Yes but then we eats them. The rich that is."
I couldn't help but read that in Gollum's voice, haha

Wirko wrote: "In the end, Samsung, Micron, Hynix, Elpida and others won't merge because demand for RAM is only going to go one way: up. Microsoft, Amazon, Google, Oracle and other cloud providers won't merge into one either, even if they fail to save many millions on RAM by using CXL. Demand for RAM is going up because everyone needs even more advanced AI tools to analyse the spending habits of the ~8 billion humans. And that's what 'hyperscalers' are all about, in a nutshell. The humans will be drowned in more targeted ads so they will buy more things that they don't need, and that late stage will be extended, once again, by a few years. Sorry, I tried to get back on topic and even mentioned CXL, without much success."
Nah, you're right. We pretty much know where these resources are going, though it never ceases to amaze me to see how large the customer data analysis and targeted advertisement empire really is.
#16
jeremyshaw
Valantar wrote: "Next press release: 'Samsung, Micron, Hynix, Elpida and all other DRAM makers announce plans to consolidate into single entity in response to challenging market conditions'"
A little too late. Elpida has been subsumed for 9 years already.
#17
PapaTaipei
Valantar wrote: "The major tendency in late-stage capitalism is consolidation and monopolization in order to increase profits and keep prices high. Heck, DRAM makers have long been accused of working as a cartel to artificially inflate prices. We've already gone from dozens to a tiny handful of companies in the past two decades. My post was obviously exaggerated, but that kind of movement is more than realistic IMO."
It's not exaggerated at all; LCD and optical drive manufacturers were all fined massively for being a cartel. Do you guys forget so fast?
#18
Valantar
PapaTaipei wrote: "It's not exaggerated at all; LCD and optical drive manufacturers were all fined massively for being a cartel. Do you guys forget so fast?"
... Have you read literally anything that has been written in this thread?
#19
Wirko
Panther_Seraphin wrote: "The problem with this is similar to using flexible memory limits on Hyper-V, etc.

It's great because you can flexibly move RAM around to maximise its usage; however, it can lead to situations where things won't run because the RAM assigned is below what is required to start the process. And if you oversubscribe the node and those instances DO decide to heavily use their RAM? The node falls over because it tries to use more RAM than it has.

So great for 95% of use cases. Fucks over that other 5%."
Isn't 95% a lot?

Anyway, are modern Windows and Linux OSes, running as VM guests, capable of using RAM that's variable in size instead of fixed?
#20
Panther_Seraphin
Wirko wrote: "Isn't 95% a lot?

Anyway, are modern Windows and Linux OSes, running as VM guests, capable of using RAM that's variable in size instead of fixed?"
Yes, I have it primarily set to work like that in my homelab and can see the memory usage rise and fall as needed. It can take a little while to finesse the memory usage, as you have to make sure whatever you're running doesn't have a memory leak that can impact other things going on with the system.
#21
Valantar
Panther_Seraphin wrote: "The problem with this is similar to using flexible memory limits on Hyper-V, etc.

It's great because you can flexibly move RAM around to maximise its usage; however, it can lead to situations where things won't run because the RAM assigned is below what is required to start the process. And if you oversubscribe the node and those instances DO decide to heavily use their RAM? The node falls over because it tries to use more RAM than it has.

So great for 95% of use cases. Fucks over that other 5%."
Panther_Seraphin wrote: "Yes, I have it primarily set to work like that in my homelab and can see the memory usage rise and fall as needed. It can take a little while to finesse the memory usage, as you have to make sure whatever you're running doesn't have a memory leak that can impact other things going on with the system."
This sounds like exactly the type of thing that explains why there are engineers and technicians working at the places running this type of hardware. I.e.: people aware of the affordances and limitations of the technologies in question and how to configure workloads and setups to optimize their usage. If it saves you significant investments in hardware in 95% of cases, well, then you limit those 5% outlier cases to hardware that's configured differently.
#22
Panther_Seraphin
Valantar wrote: "This sounds like exactly the type of thing that explains why there are engineers and technicians working at the places running this type of hardware. I.e.: people aware of the affordances and limitations of the technologies in question and how to configure workloads and setups to optimize their usage. If it saves you significant investments in hardware in 95% of cases, well, then you limit those 5% outlier cases to hardware that's configured differently."
100%. The problem is that in "the cloud" a lot of this is done via automation, so it won't be easy to detect "the outliers" without either human intervention or steep investment in the detection algorithms that decide how to assign VMs to nodes.

Also, it can cause issues for places that want to skimp and save now, and then run out of capacity far sooner than intended due to unforeseen growth/demand.
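Valantar's suggestion of keeping outliers on differently configured hardware amounts to a placement rule like the hypothetical Python sketch below. The hard part Panther_Seraphin points to, estimating a VM's slowdown before placing it, is reduced here to a given number (`est_pool_slowdown`), and the 5% tolerance is an arbitrary example threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMRequest:
    name: str
    est_pool_slowdown: float  # hypothetical predicted slowdown on pooled memory (0.05 = 5%)


def route(requests: list[VMRequest], tolerance: float = 0.05) -> tuple[list[str], list[str]]:
    """Split requests between pooled-memory hosts and local-DRAM-only hosts."""
    pooled, local_only = [], []
    for req in requests:
        target = pooled if req.est_pool_slowdown <= tolerance else local_only
        target.append(req.name)
    return pooled, local_only


if __name__ == "__main__":
    requests = [
        VMRequest("web", 0.00),
        VMRequest("cache", 0.03),
        VMRequest("db", 0.25),
        VMRequest("analytics", 0.35),
    ]
    print(route(requests))  # (['web', 'cache'], ['db', 'analytics'])
```

The rule itself is trivial; whatever produces `est_pool_slowdown` automatically, at cloud scale, is where the real engineering effort would go.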
#23
Valantar
Panther_Seraphin wrote: "100%. The problem is that in 'the cloud' a lot of this is done via automation, so it won't be easy to detect 'the outliers' without either human intervention or steep investment in the detection algorithms that decide how to assign VMs to nodes.

Also, it can cause issues for places that want to skimp and save now, and then run out of capacity far sooner than intended due to unforeseen growth/demand."
I'm pretty sure both of those challenges are pretty firmly manageable at the scales and budgets these things tend to operate at. Will it cause some hiccups? Obviously, all new tech does. But it will still be an improvement in the vast majority of cases by your own estimations, after all.
#24
Panther_Seraphin
Valantar wrote: "I'm pretty sure both of those challenges are pretty firmly manageable at the scales and budgets these things tend to operate at. Will it cause some hiccups? Obviously, all new tech does. But it will still be an improvement in the vast majority of cases by your own estimations, after all."
Definitely. It will take some "retraining" of certain admins who are set in their ways about always setting hard memory limits for VMs going forward.