Tuesday, July 2nd 2024
Panmnesia Uses CXL Protocol to Expand GPU Memory with Add-in DRAM Card or Even SSD
South Korean startup Panmnesia has unveiled an interesting solution to address the memory limitations of modern GPUs. The company has developed a low-latency Compute Express Link (CXL) IP that could help expand GPU memory with an external add-in card. Current GPU-accelerated applications in AI and HPC are constrained by the fixed amount of memory built into GPUs. With data sizes growing roughly 3x per year, GPU clusters must keep growing just to fit the application in local memory, which keeps latency low and token generation fast. Panmnesia's proposed fix leverages the CXL protocol to expand GPU memory capacity using PCIe-connected DRAM or even SSDs. The company has overcome significant technical hurdles, including the absence of CXL logic fabric in GPUs and the limitations of existing unified virtual memory (UVM) systems.
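For context, the UVM baseline that Panmnesia benchmarks against is what CUDA already offers today: a managed allocation can be larger than the GPU's physical memory, with pages migrated over PCIe on demand, and that page-fault round trip is where the latency penalty comes from. The sketch below is a minimal illustration of that oversubscription pattern, not Panmnesia's code; the buffer size and kernel are arbitrary assumptions for illustration.

// Minimal sketch (not Panmnesia's code): CUDA Unified Virtual Memory (UVM)
// lets a kernel touch a buffer larger than the GPU's physical memory and
// migrates pages on demand. This is the baseline that CXL-attached memory
// aims to beat on latency.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void touch(float *buf, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n)
        buf[i] += 1.0f;   // each access may fault and migrate a page from host memory
}

int main()
{
    // Deliberately oversubscribe: e.g. 64 GiB of managed memory on a GPU with less VRAM.
    const size_t n = (size_t)16 << 30;   // 16 Gi floats = 64 GiB
    float *buf = nullptr;
    if (cudaMallocManaged(&buf, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    touch<<<(unsigned)((n + 255) / 256), 256>>>(buf, n);
    cudaDeviceSynchronize();             // page faults and migrations happen here

    printf("done: %f\n", buf[0]);
    cudaFree(buf);
    return 0;
}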
At the heart of Panmnesia's solution is a CXL 3.1-compliant root complex with multiple root ports and a host bridge featuring a host-managed device memory (HDM) decoder. This system effectively tricks the GPU's memory subsystem into treating PCIe-connected memory as native system memory. Extensive testing has demonstrated impressive results. Panmnesia's CXL solution, CXL-Opt, achieved double-digit nanosecond round-trip latency, significantly outperforming both UVM and earlier CXL prototypes. In GPU kernel execution tests, CXL-Opt showed execution times up to 3.22 times faster than UVM. Older CXL memory extenders recorded around 250 nanoseconds of round-trip latency, whereas CXL-Opt can potentially achieve less than 80 nanoseconds. As with CXL in general, the usual problem is that memory pools add latency and performance degrades, and these CXL extenders also add to the cost model. However, Panmnesia's CXL-Opt could find a use case, and we are waiting to see whether anyone adopts it in their infrastructure. Below are some benchmarks by Panmnesia, as well as the architecture of CXL-Opt.
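To make the HDM decoder idea concrete, the host-side sketch below models, in very simplified form, what such a decoder conceptually does: it claims a window of the host physical address space and routes accesses that fall inside it to a CXL root port, behind which the add-in DRAM or SSD sits. The structure names, address layout, and window size are our own illustrative assumptions, not Panmnesia's implementation or the CXL 3.1 register definitions.

// Illustrative host-side model only (not Panmnesia's RTL and not the actual
// CXL register layout): an HDM decoder conceptually claims a window of the
// host physical address space and routes loads/stores that fall inside it
// to a CXL root port.
#include <cstdint>
#include <cstdio>
#include <vector>

struct HdmDecoder {
    uint64_t base;      // start of the host-physical window
    uint64_t size;      // window length in bytes
    int      rootPort;  // CXL root port the window routes to
};

// Return the root port that should service an access, or -1 for native memory.
static int hdmRoute(const std::vector<HdmDecoder> &decoders, uint64_t addr)
{
    for (const auto &d : decoders)
        if (addr >= d.base && addr - d.base < d.size)
            return d.rootPort;
    return -1;
}

int main()
{
    // Hypothetical layout: 512 GiB of CXL-attached memory mapped above 4 TiB.
    std::vector<HdmDecoder> decoders = {
        { 0x40000000000ULL, 512ULL << 30, 0 },
    };

    uint64_t addr = 0x40000001000ULL;   // falls inside the CXL window
    printf("0x%llx -> root port %d\n",
           (unsigned long long)addr, hdmRoute(decoders, addr));
    return 0;
}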
Sources:
Panmnesia, via Tom's Hardware
14 Comments on Panmnesia Uses CXL Protocol to Expand GPU Memory with Add-in DRAM Card or Even SSD
Of course, the faster the memory interface, the better.
It must be over two decades since I socketed additional RAM into my GPU. Not sure if it was a Matrox or an ATI card.
But the idea of an L4-esque RAM pool for GPUs? Killing the premium margin on pro GPU sales? It will not happen at large scale. They will not allow it.
Also, do any modern GPU+CPU architectures exist that can actually share memory between nodes, the way a multi-socket CPU system does?
Sad, since this would almost be a good excuse all on its own for PCIe Gen 6 in the consumer market.
Besides adding resources to GPUs and CPUs, being able to address relatively large amounts of previous-gen "surplus" RAM as NVMe-like (RAM-drive) storage/cache would be useful, both in the enthusiast-consumer world and in industry.
Not to mention, if Intel hasn't completely abandoned Optane, they could easily reinvigorate interest.
Offering Intel-licensed PMem cards (optionally utilizing the once platform-proprietary P-DIMMs) over CXL would greatly broaden the potential market, especially with the newfound interest in "AI-ing everything" :laugh:
GPU makers should add DDR5 memory slots to consumer GPUs so we can expand memory and still get good latency compared to any motherboard-based solution.
1. The likes of Nvidia will never allow it, and they have an iron grip on these AIBs.
2. Such an option would deprive them of higher revenue/profit margins, since it would let you buy a lower-end model and increase the RAM.
Nvidia could of course stop gimping their GPUs and pretending L2 cache is the answer.
There were some niche use cases for it, and there were also attempts by some hardcore enthusiasts to access it and install games onto it.
It would be interesting if AMD could bring it back for newer datacenter accelerators, as well as for top-level gaming cards, making full use of PCIe 4.0 or even PCIe 5.0 bandwidth to either use the SSDs as extra storage or to speed up memory use internally somehow.