Monday, October 14th 2024

Advantech Announces CXL 2.0 Memory to Boost Data Center Efficiency

Advantech, a global leader in embedded computing, is excited to announce the release of the SQRAM CXL 2.0 Type 3 Memory Module. Compute Express Link (CXL) 2.0 is the next evolution in memory technology, providing memory expansion over a high-speed, low-latency interconnect designed to meet the demands of large AI training and HPC clusters. CXL 2.0 builds on the foundation of the original CXL specification, introducing advanced features such as memory sharing and expansion, enabling more efficient utilization of resources across heterogeneous computing environments.

Memory Expansion via E3.S 2T Form Factor
Traditional memory architectures are often limited by fixed allocations, which can result in underutilized resources and bottlenecks in data-intensive workloads. With the E3.S form factor, based on the EDSFF standard, the CXL 2.0 Memory Module overcomes these limitations, allowing for dynamic resource management. This not only improves performance but also reduces costs by making better use of existing resources.
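On the software side, this expansion capacity typically surfaces on a Linux host as a CPU-less, memory-only NUMA node rather than as locally attached DRAM. The following sketch is illustrative only (it assumes the standard sysfs NUMA layout and a system where the CXL memory has already been brought up); it simply lists such nodes:

```python
# List NUMA nodes that have memory but no CPUs -- the usual way
# CXL-attached expansion memory shows up on a Linux host.
# Sketch only: assumes the standard /sys/devices/system/node layout.
from pathlib import Path

NODE_ROOT = Path("/sys/devices/system/node")

for node_dir in sorted(NODE_ROOT.glob("node[0-9]*")):
    cpulist = (node_dir / "cpulist").read_text().strip()
    meminfo = (node_dir / "meminfo").read_text()
    # First line looks like: "Node 2 MemTotal:       67108864 kB"
    total_kb = int(meminfo.splitlines()[0].split()[-2])
    if not cpulist and total_kb > 0:
        print(f"{node_dir.name}: CPU-less node with {total_kb // (1024 * 1024)} GiB "
              f"(candidate CXL expansion memory)")
```

Tools such as numactl --hardware report the same information; the point is that the added capacity is visible to the operating system as ordinary, if more distant, system RAM.
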
High-Speed Interconnect via PCIe 5.0 Interface
CXL memory modules operate over the PCIe 5.0 interface. This high-speed connection ensures that even as memory is expanded across systems, data transfer remains fast and efficient. PCIe 5.0 provides up to 32 GT/s per lane, allowing the CXL memory module to deliver the bandwidth necessary for data-intensive applications. Adding capacity therefore also adds memory bandwidth, without the need for more servers or additional capital expense.
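As a rough, back-of-the-envelope illustration of what those link rates mean (theoretical peaks only, ignoring CXL protocol overhead; the x8 width matches this E3.S module):

```python
# Back-of-the-envelope PCIe 5.0 link bandwidth vs. a DDR5 channel.
# Theoretical peaks only; real CXL throughput is lower due to protocol overhead.
GT_PER_S = 32                 # PCIe 5.0 raw rate per lane
ENCODING = 128 / 130          # 128b/130b line encoding
LANES = 8                     # E3.S x8 link

per_lane_GBps = GT_PER_S * ENCODING / 8          # bits -> bytes
link_GBps = per_lane_GBps * LANES                # per direction

ddr5_5600_channel_GBps = 5600 * 8 / 1000         # 64-bit channel, MT/s -> GB/s

print(f"PCIe 5.0 x{LANES}: ~{link_GBps:.1f} GB/s each direction "
      f"(~{2 * link_GBps:.0f} GB/s aggregate)")
print(f"DDR5-5600 single channel: ~{ddr5_5600_channel_GBps:.1f} GB/s")
```

At roughly 31.5 GB/s per direction (about 63 GB/s aggregate), a single x8 module sits in the same ballpark as one DDR5 channel.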

Memory Pooling
CXL 2.0 enables multiple hosts to access a shared memory pool, optimizing resource allocation and improving overall system efficiency. Through CXL's memory pooling technology, computing components such as the CPUs and accelerators of multiple servers in the same shelf can share memory resources, reducing redundancy and addressing the problem of low memory utilization.

Hot-Plug and Scalability
CXL memory modules can be added to or removed from the system without shutting down the server, allowing for on-the-fly memory expansion. For data centers, this translates into the ability to scale memory resources as needed, ensuring optimal performance without disruption.
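On Linux, hot-added capacity generally appears as offline memory blocks that must be onlined before the kernel will use them. A minimal sketch of that step (assumes root and the standard memory-hotplug sysfs interface; many distributions already do this automatically via udev rules):

```python
# Online any hot-plugged memory blocks that the kernel has not enabled yet.
# Sketch only: requires root and memory-hotplug support; block paths follow
# the standard /sys/devices/system/memory layout.
from pathlib import Path

MEM_ROOT = Path("/sys/devices/system/memory")

for block in sorted(MEM_ROOT.glob("memory[0-9]*")):
    state_file = block / "state"
    if state_file.read_text().strip() == "offline":
        # Writing "online" asks the kernel to add this block to the page allocator.
        state_file.write_text("online")
        print(f"onlined {block.name}")
```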

Key Features
  • EDSFF E3.S 2T form-factor
  • CXL 2.0, compatible with PCIe Gen5 speeds (32 GT/s)
  • Supports ECC error detection and correction
  • PCB: 30 μ'' gold fingers
  • Operating environment: 0 ~ 70°C (Tc)
  • Compliant with CXL 1.1 & CXL 2.0
Source: Advantech

9 Comments on Advantech Announces CXL 2.0 Memory to Boost Data Center Efficiency

#1
TumbleGeorge
Hmm, capacity like for the data centres? I see only 64GB on one of the slides.
#2
Nomad76
News Editor
TumbleGeorge said: Hmm, capacity like for the data centres? I see only 64GB on one of the slides.
For the moment, yes... a single 64GB model; however, I presume next year there will be other options.
#3
Nephilim666
That poor ddr5 is gonna be super starved for bandwidth.
#4
Wirko
Nephilim666 said: That poor ddr5 is gonna be super starved for bandwidth.
Not that bad. The connector seems to be 8-lane PCIe (look here), and that's 8 lanes in each direction. For applications that read and write heavily at the same time, that's quite an advantage, and single-channel DDR5-5600 couldn't keep up.
#5
Minus Infinity
Nephilim666 said: That poor ddr5 is gonna be super starved for bandwidth.
32 GT/s vs 9 GT/s max for desktop DDR5
#6
Lianna
One socket Turin:
Chips and Cheese recently tested 12-channel DDR5-6000 MT/s, reaching ~99% of the theoretical 576 GB/s.
Turin offers up to 3 TB in 1 DPC and 6 TB in 2 DPC (in up to 4400 MT/s, so ~422 GB/s) configuration.
128 lanes of PCIe 5.0 / CXL 2.0 offer theoretical ~504 GB/s.
This card offers 64 GB in x8 lanes, so in 128 lanes you get 1 TB. I can definitely see potential for 128 GB, or maybe even 256 GB versions of this card.

Depending on the price of this device and on RAM pricing (top-capacity modules are reportedly very costly), CXL memory may be a more cost-effective way to roughly double the bandwidth and add 1.5x..2x the capacity using, say, 96 lanes per socket (768 GB, ~378 GB/s), leaving the rest for SSDs etc.
It would be nice if someone gave e.g. Phoronix 12-24 such modules to test how it looks from the system perspective. Can it be configured as "far NUMA node memory" or similar - i.e. transparent memory extension? Or would the application have to be CXL-memory aware? What is the total system latency for this memory?
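The peak figures above are easy to re-derive if you want to play with other lane counts or capacities (theoretical maxima only, no overhead beyond PCIe line encoding):

```python
# Reproduce the theoretical peaks quoted above; lane and channel counts
# follow the assumptions in this comment.
def ddr5_GBps(channels, mts):
    return channels * mts * 8 / 1000          # 64-bit channels, MT/s -> GB/s

def pcie5_GBps(lanes):
    return lanes * 32 * (128 / 130) / 8       # per direction

print(f"12ch DDR5-6000: {ddr5_GBps(12, 6000):.0f} GB/s")    # ~576 GB/s
print(f"12ch DDR5-4400: {ddr5_GBps(12, 4400):.0f} GB/s")    # ~422 GB/s
print(f"128 lanes PCIe 5.0: {pcie5_GBps(128):.0f} GB/s")    # ~504 GB/s
print(f"96 lanes PCIe 5.0:  {pcie5_GBps(96):.0f} GB/s")     # ~378 GB/s
print(f"Capacity at 64 GB per x8 module, 96 lanes: {96 // 8 * 64} GB")  # 768 GB
```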
#7
Rielle
This new CXL 2.0 memory from Advantech seems like a big deal for data centers. I don’t know all the super technical details, but from what I’ve read, it looks like it could really help with efficiency and scaling up without needing to swap out tons of hardware. That would definitely save some money, especially with how fast data loads are growing. I’m interested to see how flexible it is with different memory types because that could be huge for keeping systems running smoothly.
#8
Wirko
Lianna said: One socket Turin:
Chips and Cheese recently tested 12-channel DDR5-6000 MT/s, reaching ~99% of the theoretical 576 GB/s.
Turin offers up to 3 TB in 1 DPC and 6 TB in 2 DPC (in up to 4400 MT/s, so ~422 GB/s) configuration.
128 lanes of PCIe 5.0 / CXL 2.0 offer theoretical ~504 GB/s.
This card offers 64 GB in x8 lanes, so in 128 lanes you get 1 TB. I can definitely see potential for 128 GB, or maybe even 256 GB versions of this card.
Actually 64 GB is a ridiculously low amount for what servers need. For 512 GB you have to spend 64 lanes, a lot of physical space, and pay for 8 controller chips (inside those memory modules) when you could spend 1/4 or 1/8 of everything. Bandwidth might then be insufficient but an 8-lane connection could at least be replaced by a 16-lane one, it's all part of the standard.
Lianna said: Depending on the price of this device and on RAM pricing (top-capacity modules are reportedly very costly), CXL memory may be a more cost-effective way to roughly double the bandwidth and add 1.5x..2x the capacity using, say, 96 lanes per socket (768 GB, ~378 GB/s), leaving the rest for SSDs etc.
I'm not sure what the usability of CXL memory is. It won't be more cost-effective because it needs additional controllers. One interesting feature is memory pooling, where multiple systems share memory on CXL modules, with coherence and all that. In HPC setups this could serve as a fast communication link between nodes, and in servers it would allow assigning memory to individual nodes dynamically. Special PCIe switches are necessary for that; I haven't seen any announced so far.
Lianna said: It would be nice if someone gave e.g. Phoronix 12-24 such modules to test how it looks from the system perspective. Can it be configured as "far NUMA node memory" or similar - i.e. transparent memory extension? Or would the application have to be CXL-memory aware? What is the total system latency for this memory?
One of the manufacturers stated that the added latency (compared to local RAM) is similar to the latency to reach the next NUMA node (maybe they meant a multi-processor system). I don't see a possibility of this being "transparent" to applications. The applications would have to be aware of the slow pool and the fast pool of memory and prioritise their use, which wouldn't be trivial because both are working memory - it's not like the fast pool is a cache for the slow pool.
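For what it's worth, one way an application can express that slow-pool/fast-pool awareness today is to treat the CXL memory as a specific NUMA node and place selected allocations on it explicitly, e.g. via libnuma. A rough sketch (the node number is purely illustrative; it would be whatever node the expander enumerates as):

```python
# Place a buffer explicitly on a chosen NUMA node (e.g. a CXL memory node)
# using libnuma. Sketch only: node 2 is an assumed/illustrative node id.
import ctypes

libnuma = ctypes.CDLL("libnuma.so.1")
libnuma.numa_alloc_onnode.restype = ctypes.c_void_p
libnuma.numa_alloc_onnode.argtypes = [ctypes.c_size_t, ctypes.c_int]
libnuma.numa_free.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

if libnuma.numa_available() < 0:
    raise RuntimeError("NUMA is not available on this system")

CXL_NODE = 2                      # illustrative: the CXL expander's node id
SIZE = 1 << 30                    # 1 GiB

buf = libnuma.numa_alloc_onnode(SIZE, CXL_NODE)
if not buf:
    raise MemoryError("allocation on the CXL node failed")

# ... use the buffer for capacity-hungry, latency-tolerant data ...

libnuma.numa_free(buf, SIZE)
```

The same placement can be done without code changes using numactl --membind, which is about as close to "transparent" as it gets right now.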
#9
Lianna
Wirko said: Actually 64 GB is a ridiculously low amount for what servers need. For 512 GB you have to spend 64 lanes, a lot of physical space, and pay for 8 controller chips (inside those memory modules) when you could spend 1/4 or 1/8 of everything. Bandwidth might then be insufficient but an 8-lane connection could at least be replaced by a 16-lane one, it's all part of the standard.
I can see it both ways: you can have roughly double the bandwidth, even if you don't need the capacity - and it is probably still cheaper than MRDIMMs or other specialty memory. The capacity is what drives the cost; memory in the server may be much more expensive than the 10k+ CPUs.
While I agree on a 2x 8-lane device being worse off than a 1x 16-lane one, maybe they know their target market and E3.S in the 8-lane variant is more common? Like, "in a common platform for 16 SSD slots per socket, use 12 slots per socket for memory expansion"?
Wirko said: I'm not sure what the usability of CXL memory is. It won't be more cost-effective because it needs additional controllers. One interesting feature is memory pooling, where multiple systems share memory on CXL modules, with coherence and all that. In HPC setups this could serve as a fast communication link between nodes, and in servers it would allow assigning memory to individual nodes dynamically. Special PCIe switches are necessary for that; I haven't seen any announced so far.
Less cost-effective because of additional controllers: probably yes, compared to standard, say, 768 GB per 12-channel socket, i.e. 64 GB *DIMMs. The price for bigger modules may be disproportionately higher, so the equation may change fast.
Wirko said: One of the manufacturers stated that the added latency (compared to local RAM) is similar to the latency to reach the next NUMA node (maybe they meant a multi-processor system). I don't see a possibility of this being "transparent" to applications. The applications would have to be aware of the slow pool and the fast pool of memory and prioritise their use, which wouldn't be trivial because both are working memory - it's not like the fast pool is a cache for the slow pool.
Applications "aware of the slow pool and the fast pool of memory": NUMA-aware? There is SNC3 vs HEX on Granite Rapids, NPS4 vs NPS1 on Zens, and HBM plus DDR5 on Sapphire Rapids Xeon Max. The last one is even available in HBM caching mode, so maybe the most relevant:
www.phoronix.com/review/xeon-max-hbm2e-amx