
Advantech Announces CXL 2.0 Memory to Boost Data Center Efficiency

Nomad76

News Editor, Staff member
Joined: May 21, 2024
Messages: 738 (3.35/day)
Advantech, a global leader in embedded computing, is excited to announce the release of the SQRAM CXL 2.0 Type 3 Memory Module. Compute Express Link (CXL) 2.0 is the next evolution in memory technology, providing memory expansion with a high-speed, low-latency interconnect designed to meet the demands of large AI training and HPC clusters. CXL 2.0 builds on the foundation of the original CXL specification, introducing advanced features such as memory sharing and expansion, enabling more efficient utilization of resources across heterogeneous computing environments.

Memory Expansion via E3.S 2T Form Factor
Traditional memory architectures are often limited by fixed allocations, which can result in underutilized resources and bottlenecks in data-intensive workloads. With the E3.S form factor, based on the EDSFF standard, the CXL 2.0 Memory Module overcomes these limitations, allowing for dynamic resource management. This not only improves performance but also reduces costs by maximizing the use of existing resources.



High-Speed Interconnect via PCIe 5.0 Interface
CXL memory modules operate over the PCIe 5.0 interface. This high-speed connection ensures that even as memory is expanded across different systems, data transfer remains rapid and efficient. The PCIe 5.0 interface provides up to 32 GT/s per lane, allowing the CXL memory module to deliver the bandwidth necessary for data-intensive applications. Adding capacity this way also adds memory bandwidth, improving performance without the need for more servers and the associated capital expense.
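As a rough illustration of what that adds up to (the x8 link width and 128b/130b encoding below are assumptions, not figures from Advantech), a quick sketch in Python:

```python
# Back-of-the-envelope PCIe 5.0 bandwidth per CXL module.
# The x8 link width and 128b/130b encoding are assumptions, not Advantech figures.

GTS_PER_LANE = 32                          # PCIe 5.0 signalling rate, GT/s per lane
LANE_GBPS = GTS_PER_LANE * 128 / 130 / 8   # ~3.94 GB/s usable per lane, per direction

def module_bandwidth(lanes=8):
    """Per-direction bandwidth of one CXL module on an assumed x8 Gen5 link."""
    return lanes * LANE_GBPS

for n in (1, 4, 8):
    print(f"{n} module(s): ~{n * module_bandwidth():.0f} GB/s per direction added")
# -> roughly 32, 126, and 252 GB/s of extra memory bandwidth
#    alongside the CPU's native DDR channels
```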

Memory Pooling
CXL 2.0 enables multiple hosts to access a shared memory pool, optimizing resource allocation and improving overall system efficiency. Through CXL's memory pooling technology, computing components such as the CPUs and accelerators of multiple servers on the same shelf can share memory resources, reducing redundancy and addressing the problem of low memory utilization.
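A toy sketch of why pooling helps (all host counts and capacities below are made up for illustration, not taken from the announcement):

```python
# Toy model of why pooling raises utilization: hosts with fixed local DRAM
# strand whatever they don't use, while a shared pool can hand spare capacity
# to whichever host needs it. All numbers are illustrative, not Advantech's.

hosts_demand_gb = [180, 40, 90, 300]    # hypothetical per-host working sets
local_gb = 256                          # fixed DRAM per host

stranded = sum(max(local_gb - d, 0) for d in hosts_demand_gb)
starved = sum(max(d - local_gb, 0) for d in hosts_demand_gb)
print(f"Fixed allocation: {stranded} GB sits idle while {starved} GB of demand is unmet")

pool_gb = len(hosts_demand_gb) * local_gb   # same total capacity, but pooled
print(f"Pooled: demand {sum(hosts_demand_gb)} GB fits in {pool_gb} GB with "
      f"{pool_gb - sum(hosts_demand_gb)} GB to spare")
```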

Hot-Plug and Scalability
CXL memory modules can be added to or removed from the system without shutting down the server, allowing for on-the-fly memory expansion. For data centers, this translates into the ability to scale memory resources as needed, ensuring optimal performance without disruption.
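On the software side, hot-added memory typically surfaces on Linux as offline memory blocks that then get brought online; a minimal sketch of that step, assuming the standard sysfs memory-hotplug interface and root privileges (many distributions handle this automatically via udev rules):

```python
# Minimal sketch of onlining hot-added memory on Linux. Assumes the standard
# /sys/devices/system/memory hotplug interface and root privileges; many
# distributions do this automatically through udev rules instead.

from pathlib import Path

for block in sorted(Path("/sys/devices/system/memory").glob("memory*")):
    state = block / "state"
    if state.read_text().strip() == "offline":
        print(f"onlining {block.name}")
        state.write_text("online")   # "online_movable" keeps the block removable later
```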

Key Features
  • EDSFF E3.S 2T form-factor
  • CXL 2.0 is compatible with PCIe Gen5 speeds running at 32 GT/s
  • Supports ECC error detection and correction
  • PCB: 30μ'' gold finger
  • Operating Environment: 0 ~ 70°C (Tc)
  • Compliant with CXL 1.1 & CXL 2.0

View at TechPowerUp Main Site | Source
 
Joined: Sep 1, 2020
Messages: 2,407 (1.53/day)
Location: Bulgaria
Hmm, capacity like that for data centres? I see only 64 GB on one of the slides.
 
Joined: Jan 3, 2021
Messages: 3,620 (2.49/day)
Location: Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
That poor ddr5 is gonna be super starved for bandwidth.
Not that bad. The connector seems to be 8-lane PCIe (look here), and that's 8 lanes in each direction. For applications that read and write heavily at the same time, that's quite an advantage, and single-channel DDR5-5600 couldn't keep up.
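Rough numbers, assuming 128b/130b encoding and ignoring protocol overhead on both sides:

```python
# Compare one x8 PCIe 5.0 / CXL link with a single DDR5-5600 channel.
# Assumes 128b/130b encoding and ignores CXL and DDR protocol overheads.

pcie_x8_per_dir = 8 * 32 * (128 / 130) / 8     # GB/s, each direction
ddr5_5600_chan  = 5600 * 8 / 1000              # GB/s, one 64-bit channel, total

print(f"PCIe 5.0 x8: ~{pcie_x8_per_dir:.1f} GB/s each way "
      f"(~{2 * pcie_x8_per_dir:.1f} GB/s when reading and writing at once)")
print(f"DDR5-5600, one channel: ~{ddr5_5600_chan:.1f} GB/s total, shared by reads and writes")
```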
 
Joined: Dec 7, 2020
Messages: 62 (0.04/day)
One socket Turin:
Chips and Cheese recently tested 12-channel DDR5-6000 MT/s, reaching ~99% of the theoretical 576 GB/s.
Turin offers up to 3 TB in 1 DPC and 6 TB in 2 DPC configurations (the latter at up to 4400 MT/s, so ~422 GB/s).
128 lanes of PCIe 5.0 / CXL 2.0 offer theoretical ~504 GB/s.
This card offers 64 GB in x8 lanes, so in 128 lanes you get 1 TB. I can definitely see potential for 128 GB, or maybe even 256 GB versions of this card.
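Those figures check out with simple arithmetic (DDR bandwidth as channels x MT/s x 8 bytes, PCIe 5.0 at 32 GT/s with 128b/130b encoding):

```python
# Reproduce the figures above from first principles.

def ddr_gbps(channels, mts):
    return channels * mts * 8 / 1000            # 8 bytes per 64-bit channel

def pcie5_gbps(lanes):
    return lanes * 32 * (128 / 130) / 8         # per direction, 128b/130b encoding

print(f"12ch DDR5-6000: {ddr_gbps(12, 6000):.0f} GB/s")              # 576
print(f"12ch DDR5-4400: {ddr_gbps(12, 4400):.0f} GB/s")              # ~422
print(f"128 PCIe 5.0 lanes: {pcie5_gbps(128):.0f} GB/s")             # ~504
print(f"128 lanes of x8 modules: {128 // 8 * 64} GB")                # 1024 GB
print(f"96 lanes: {pcie5_gbps(96):.0f} GB/s, {96 // 8 * 64} GB")     # ~378 GB/s, 768 GB
```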

Depending on the price of this device and on RAM pricing (top-capacity modules are reportedly very costly), CXL memory may be a more cost-effective way to roughly double the bandwidth and grow the capacity 1.5x to 2x using, say, 96 lanes per socket (768 GB, ~378 GB/s), leaving the rest for SSDs etc.
It would be nice if someone gave, e.g., Phoronix 12-24 such modules to test how it all looks from the system perspective. Can it be configured as "far NUMA node memory" or similar, i.e. as a transparent memory extension? Or would the application have to be CXL-memory aware? What is the total system latency for this memory?
 

Rielle

New Member
Joined: Sep 17, 2024
Messages: 3 (0.03/day)
This new CXL 2.0 memory from Advantech seems like a big deal for data centers. I don’t know all the super technical details, but from what I’ve read, it looks like it could really help with efficiency and scaling up without needing to swap out tons of hardware. That would definitely save some money, especially with how fast data loads are growing. I’m interested to see how flexible it is with different memory types because that could be huge for keeping systems running smoothly.
 
Joined: Jan 3, 2021
Messages: 3,620 (2.49/day)
Location: Slovenia
One socket Turin:
Chips and Cheese recently tested 12-channel DDR5-6000 MT/s, reaching ~99% of the theoretical 576 GB/s.
Turin offers up to 3 TB in 1 DPC and 6 TB in 2 DPC configurations (the latter at up to 4400 MT/s, so ~422 GB/s).
128 lanes of PCIe 5.0 / CXL 2.0 offer theoretical ~504 GB/s.
This card offers 64 GB in x8 lanes, so in 128 lanes you get 1 TB. I can definitely see potential for 128 GB, or maybe even 256 GB versions of this card.
Actually, 64 GB is a ridiculously low amount for what servers need. For 512 GB you have to spend 64 lanes, a lot of physical space, and pay for 8 controller chips (inside those memory modules), when you could spend a quarter or an eighth of all that. Bandwidth might then be insufficient, but an 8-lane connection could at least be replaced by a 16-lane one; it's all part of the standard.
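The trade-off in numbers (only the 64 GB x8 module is announced; the larger and x16 variants below are hypothetical):

```python
# Lanes and controllers needed to reach 512 GB at various module sizes.
# Only the 64 GB x8 module is announced; the larger and x16 variants are hypothetical.

target_gb = 512
for module_gb, lanes_per_module in [(64, 8), (128, 8), (256, 16)]:
    modules = target_gb // module_gb
    print(f"{modules} x {module_gb} GB: {modules * lanes_per_module} lanes, "
          f"{modules} controller chips")
# -> 8 x 64 GB: 64 lanes / 8 controllers; 4 x 128 GB: 32 / 4; 2 x 256 GB (x16): 32 / 2
```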
Depending on the price of this device and on RAM pricing (top-capacity modules are reportedly very costly), CXL memory may be a more cost-effective way to roughly double the bandwidth and grow the capacity 1.5x to 2x using, say, 96 lanes per socket (768 GB, ~378 GB/s), leaving the rest for SSDs etc.
I'm not sure what the usability of CXL memory is. It won't be more cost-effective because it needs additional controllers. One interesting feature is memory pooling, where multiple systems share memory on CXL modules, with coherence and all that. In HPC setups this could serve as a fast communication link between nodes, and in servers it would allow assigning memory to individual nodes dynamically. Special PCIe switches are necessary for that; I haven't seen any announced so far.
It would be nice if someone gave, e.g., Phoronix 12-24 such modules to test how it all looks from the system perspective. Can it be configured as "far NUMA node memory" or similar, i.e. as a transparent memory extension? Or would the application have to be CXL-memory aware? What is the total system latency for this memory?
One of the manufacturers stated that the added latency (compared to local RAM) is similar to the latency of reaching the next NUMA node (maybe they meant a multi-processor system). I don't see a possibility of this being "transparent" to applications. The applications would have to be aware of the slow pool and the fast pool of memory and prioritise their use, which wouldn't be trivial because both are working memory - it's not like the fast pool is a cache for the slow pool.
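For what it's worth, Linux can already expose such devices as CPU-less "far" NUMA nodes; a small sketch, assuming the standard sysfs NUMA interface, of how an application (or admin) could spot the far node by its size, distance, and lack of CPUs:

```python
# List NUMA nodes with their capacity, distance to node 0, and CPUs.
# Assumes the standard Linux /sys/devices/system/node interface; a CPU-less,
# high-distance node is roughly how CXL expander memory would appear.

from pathlib import Path

for node in sorted(Path("/sys/devices/system/node").glob("node[0-9]*")):
    total_kb = int((node / "meminfo").read_text().split("MemTotal:")[1].split("kB")[0])
    dist_to_node0 = (node / "distance").read_text().split()[0]
    cpus = (node / "cpulist").read_text().strip()
    print(f"{node.name}: {total_kb // (1024 * 1024)} GiB, "
          f"distance to node0 = {dist_to_node0}, cpus = {cpus or 'none'}")
```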
 
Joined: Dec 7, 2020
Messages: 62 (0.04/day)
Actually, 64 GB is a ridiculously low amount for what servers need. For 512 GB you have to spend 64 lanes, a lot of physical space, and pay for 8 controller chips (inside those memory modules), when you could spend a quarter or an eighth of all that. Bandwidth might then be insufficient, but an 8-lane connection could at least be replaced by a 16-lane one; it's all part of the standard.
I can see it both ways: you can have roughly double the bandwidth, even if you don't need the capacity - and it is probably still cheaper than MRDIMMs or other specialty memory. The capacity is what drives the cost; memory in the server may be much more expensive than the 10k+ CPUs.
While I agree that two 8-lane devices are worse off than one 16-lane device, maybe they know their target market and the 8-lane E3.S variant is more common? Like, "in a common platform with 16 SSD slots per socket, use 12 slots per socket for memory expansion"?
I'm not sure what the usability of CXL memory is. It won't be more cost-effective because it needs additional controllers. One interesting feature is memory pooling, where multiple systems share memory on CXL modules, with coherence and all that. In HPC setups this could serve as a fast communication link between nodes, and in servers it would allow assigning memory to individual nodes dynamically. Special PCIe switches are necessary for that; I haven't seen any announced so far.
Less cost-effective because of additional controllers: probably yes, compared to a standard configuration of, say, 768 GB per 12-channel socket, i.e. 64 GB *DIMMs. The price of bigger modules may be disproportionately higher, though, so the equation may change fast.
One of the manufacturers stated that the added latency (compared to local RAM) is similar to the latency of reaching the next NUMA node (maybe they meant a multi-processor system). I don't see a possibility of this being "transparent" to applications. The applications would have to be aware of the slow pool and the fast pool of memory and prioritise their use, which wouldn't be trivial because both are working memory - it's not like the fast pool is a cache for the slow pool.
Applications "aware of the slow pool and the fast pool of memory": NUMA-aware? There is SNC3 vs HEX on Granite Rapids, NPS4 vs NPS1 on Zens, and HBM plus DDR5 on Sapphire Rapids Xeon Max. The last one is even available in HBM caching mode, so maybe the most relevant:
https://www.phoronix.com/review/xeon-max-hbm2e-amx
 