Wednesday, July 13th 2022
CXL Memory Pooling will Save Millions in DRAM Cost
Hyperscalers such as Microsoft, Google, and Amazon all run their cloud divisions with a specific goal: to rent their hardware out in the form of instances and have the user pay for them by the hour. However, instances come in fixed CPU and memory configurations that you cannot customize yourself; you can only choose from the handful of options on offer. For example, an instance type might pair one virtual CPU core with two GB of RAM; you can scale the core count as high as you like, but the RAM scales with it, even if you don't need it. When renting an instance, the allocated CPU cores and memory are yours until the instance is shut down.
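A quick sketch makes the fixed ratio concrete. The instance family below is made up for illustration (names and sizes are assumptions, not any provider's real SKUs):

```python
# Hypothetical instance family following a fixed 1 vCPU : 2 GB ratio,
# as described above. Names and sizes are illustrative, not real SKUs.
SHAPES = {f"example.{2**i}x": (2**i, 2**i * 2) for i in range(6)}

for name, (vcpus, ram_gb) in SHAPES.items():
    print(f"{name}: {vcpus} vCPU, {ram_gb} GB RAM")
# RAM doubles in lockstep with cores - you cannot rent extra cores
# without also paying for RAM you may never touch.
```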
And it is precisely this that hyperscalers are dealing with. Many instances never fully utilize their DRAM, making the data center as a whole inefficient. Microsoft Azure, one of the largest cloud providers, measured that 50% of all VMs never touch 50% of their rented memory. That memory sits stranded inside a rented VM, unusable for anything else. As the Azure team explains:

"At Azure, we find that a major contributor to DRAM inefficiency is platform-level memory stranding. Memory stranding occurs when a server's cores are fully rented to virtual machines (VMs), but unrented memory remains. With the cores exhausted, the remaining memory is unrentable on its own, and is thus stranded. Surprisingly, we find that up to 25% of DRAM may become stranded at any given moment."

To achieve better results, the industry is borrowing from mainframe designs. The memory pooling concept allows a CPU to draw as much memory as it needs from a shared pool, without occupying and stranding DRAM inside VMs that don't need it. Backing this up is the new CXL cache-coherency protocol, which every major hardware provider is adding to its offerings. A data center built on CXL hardware allows companies like Microsoft to reduce costs. As the company notes, "[memory] disaggregation can achieve a 9 - 10% reduction in overall DRAM, which represents hundreds of millions of dollars in cost savings for a large cloud provider." Microsoft estimates that CXL and memory pooling will cut overall server costs by 4-5%, a significant figure given that DRAM alone accounts for more than 50% of server cost.
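To make the stranding arithmetic concrete, here is a minimal Python sketch. The server shape, VM sizes, and 1 vCPU : 2 GB ratio are illustrative assumptions, not figures from the Azure paper:

```python
# Hypothetical illustration of memory stranding: once a server's cores
# are fully rented, any leftover DRAM cannot be sold and is "stranded".
# Server shape and VM sizes are assumptions, not Azure's actual fleet.
from dataclasses import dataclass

@dataclass
class Server:
    total_cores: int
    total_ram_gb: int

def stranded_ram_gb(server: Server, vms: list[tuple[int, int]]) -> int:
    """vms is a list of (cores, ram_gb) allocations on this server."""
    used_cores = sum(c for c, _ in vms)
    used_ram = sum(r for _, r in vms)
    # Memory counts as stranded only when cores are exhausted but RAM remains.
    if used_cores >= server.total_cores:
        return server.total_ram_gb - used_ram
    return 0

# Example: a 64-core, 256 GB server fully rented as 1 vCPU : 2 GB VMs
# leaves half its DRAM stranded.
server = Server(total_cores=64, total_ram_gb=256)
vms = [(16, 32)] * 4            # 64 cores rented, only 128 GB rented
print(stranded_ram_gb(server, vms))  # -> 128 (GB stranded)

# Sanity check on the quoted savings: if DRAM is ~50% of server cost and
# pooling trims overall DRAM by 9-10%, server cost drops by roughly
# 0.5 * 9% to 0.5 * 10% = 4.5-5%, consistent with the 4-5% figure.
```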
As far as performance is concerned, the Azure team benchmarked several configurations mixing local and pooled DRAM. The penalty for using pooled memory depends on the application, but accessing pooled memory adds 67-87 ns (nanoseconds) of latency, a significant hit for some workloads. About 20% of applications see no performance hit from pooled memory; 23% see less than a 5% slowdown; 25% experience more than a 20% slowdown; and 12% slow down by more than 30%.

According to Microsoft, this is only first-generation testing conducted on the first wave of CXL hardware. The results are promising, as they already reduce cloud costs for the hyperscaler, and next-generation hardware and CXL protocol revisions should behave even better. For more information, please refer to the paper, which examines this in greater detail.
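For a rough sense of scale (not a benchmark from the paper), the added latency can be set against an assumed local-DRAM baseline:

```python
# Back-of-the-envelope view of the latency numbers quoted above.
# The local-DRAM baseline is an assumed typical figure (~100 ns),
# not a measurement from the Azure paper.
LOCAL_DRAM_NS = 100           # assumed local access latency
POOL_EXTRA_NS = (67, 87)      # added latency for pooled memory (from the article)

for extra in POOL_EXTRA_NS:
    total = LOCAL_DRAM_NS + extra
    print(f"+{extra} ns -> {total} ns total, "
          f"{extra / LOCAL_DRAM_NS:.0%} higher than local DRAM")
```

Under that assumption, pooled accesses land in the neighborhood of 1.7-1.9x local latency, which is consistent with latency-sensitive applications seeing 20-30% slowdowns while cache-friendly ones see none.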
Source:
SemiAnalysis
24 Comments on CXL Memory Pooling will Save Millions in DRAM Cost
Capitalism 101 at its finest, and megacorps getting stronger/moar powerful and the rich getting richer by the minute, at the sole expense of everyone else ..:eek:..:fear:..:cry:
Back in the 1980s, semiconductor device manufacturing was the crown jewel. Then the WWW. Then smartphones and UGC. Then AI. You can't just restore the glory of an industry that is already past its day. Just as you can't split a dam in two and let two hydro plants "compete". It's simply stupid.
It's great because you can flexibly move RAM around to maximise usage. However, it can lead to situations where things won't run because the RAM assigned is below what's required to start the process, and also, if they oversubscribe the node and those instances DO decide to heavily use their RAM? The node falls over from trying to use more RAM than it actually has.
So great for 95% of use cases. Fucks over that other 5%.
Nah, you're right. We pretty much know where these resources are going, though it never ceases to amaze me to see how large the customer data analysis and targeted advertisement empire really is.
Anyway, are modern Windows and Linux OSes, running as VM guests, capable of using RAM that's variable in size instead of fixed?
Also, it can cause issues for places that want to skimp and save now, then run out of capacity far sooner than intended due to unforeseen growth/demand.