Panmnesia Uses CXL Protocol to Expand GPU Memory with Add-in DRAM Card or Even SSD

AleksandarK

News Editor
Staff member
Joined
Aug 19, 2017
Messages
2,334 (0.93/day)
South Korean startup Panmnesia has unveiled an interesting solution to address the memory limitations of modern GPUs. The company has developed a low-latency Compute Express Link (CXL) IP that could help expand GPU memory with an external add-in card. Current GPU-accelerated applications in AI and HPC are constrained by the fixed amount of memory built into GPUs. With data sizes growing by 3x yearly, GPU networks must keep getting larger just to fit the application in local memory, which benefits latency and token generation. Panmnesia's proposed fix leverages the CXL protocol to expand GPU memory capacity using PCIe-connected DRAM or even SSDs. The company has overcome significant technical hurdles, including the absence of CXL logic fabric in GPUs and the limitations of existing unified virtual memory (UVM) systems.

At the heart of Panmnesia's solution is a CXL 3.1-compliant root complex with multiple root ports and a host bridge featuring a host-managed device memory (HDM) decoder. This system effectively tricks the GPU's memory subsystem into treating PCIe-connected memory as native system memory. According to Panmnesia's testing, its CXL solution, CXL-Opt, achieved double-digit nanosecond round-trip latency, significantly outperforming both UVM and earlier CXL prototypes. In GPU kernel execution tests, CXL-Opt showed execution times up to 3.22 times faster than UVM. Older CXL memory extenders recorded around 250 nanoseconds of round-trip latency, while CXL-Opt could potentially achieve less than 80 nanoseconds. The usual problem with CXL is that memory pools add latency and performance degrades, while CXL extenders also tend to add cost. However, Panmnesia's CXL-Opt could find a use case, and we are waiting to see if anyone adopts it in their infrastructure.
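
As a rough illustration of what an HDM decoder does (a minimal sketch, not Panmnesia's implementation): it holds address windows and steers any access that falls inside a window to the root port behind which the memory expander sits. The class names, window sizes, and port numbers below are assumptions made up for the example, and real decoders also support interleaving, which is ignored here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

GIB = 1 << 30


@dataclass(frozen=True)
class HdmRange:
    """One hypothetical decoder entry: an address window backed by CXL memory."""
    base: int       # first address of the window
    size: int       # window length in bytes
    root_port: int  # made-up index of the root port that serves this window

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


class HdmDecoder:
    """Toy decoder: maps an address to a root port, or None for local memory."""

    def __init__(self, ranges: Iterable[HdmRange]):
        self.ranges = list(ranges)

    def route(self, addr: int) -> Optional[int]:
        for r in self.ranges:
            if r.contains(addr):
                return r.root_port
        return None  # not a CXL window, so the access stays in local memory


# Assumed layout: 16 GiB of local memory, then two 64 GiB expander windows.
decoder = HdmDecoder([
    HdmRange(base=16 * GIB, size=64 * GIB, root_port=0),
    HdmRange(base=80 * GIB, size=64 * GIB, root_port=1),
])

for addr in (1 * GIB, 20 * GIB, 100 * GIB):
    print(f"address {addr // GIB:>3} GiB -> root port {decoder.route(addr)}")
```

The point of the sketch is only that the requester sees one flat address space; whether an access lands in local memory or on an expander is decided by the window lookup.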



Below are some benchmarks by Panmnesia, as well as the architecture of the CXL-Opt.



 

alphaLONE

New Member
Joined
Jan 4, 2023
Messages
15 (0.03/day)
that's at most what, 128GB/s on 16x Gen 5 PCIe? really not much for a big GPU, that's even less than what the RX 6500 XT has.
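
For reference, the arithmetic behind that comparison can be sketched as follows. This assumes the published figures of 32 GT/s per lane with 128b/130b encoding for PCIe 5.0, and an 18 Gbps, 64-bit GDDR6 interface for the RX 6500 XT; the function names are made up for the example, and protocol overhead beyond line encoding is ignored.

```python
def pcie_gbytes_per_s(gt_per_s: float, lanes: int, encoding: float = 128 / 130) -> float:
    """Raw link bandwidth in GB/s for one direction, after line encoding only."""
    return gt_per_s * lanes * encoding / 8


def gddr_gbytes_per_s(gbit_per_pin: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s for a GDDR interface."""
    return gbit_per_pin * bus_width_bits / 8


pcie5_x16_one_way = pcie_gbytes_per_s(32, 16)  # about 63 GB/s per direction
pcie5_x16_both = 2 * pcie5_x16_one_way         # about 126 GB/s, both directions summed
rx6500xt = gddr_gbytes_per_s(18, 64)           # 144 GB/s

print(f"PCIe 5.0 x16, one direction:   {pcie5_x16_one_way:.1f} GB/s")
print(f"PCIe 5.0 x16, both directions: {pcie5_x16_both:.1f} GB/s")
print(f"RX 6500 XT VRAM:               {rx6500xt:.1f} GB/s")
```

Note that the 128 GB/s figure counts both directions together; a read-heavy workload would see roughly half of that.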
 
Joined
Jan 2, 2019
Messages
76 (0.04/day)
that's at most what, 128GB/s on 16x Gen 5 PCIe? really not much for a big GPU, that's even less than what the RX 6500 XT has.

That is not a huge problem. When it comes to big data processing in HPC, AI, etc., if a GPU cluster doesn't support a Unified Memory Architecture (UMA), i.e. the CPUs and GPUs do not share system RAM, developers try to move as big a chunk of data as possible into GPU memory and then run processing that can take a long time (seconds, minutes, etc.). That means memory bandwidth is, to some degree, less important. What really matters is processing as big a chunk of data as possible!

Of course, faster memory interfaces are always better.
 
Joined
Jul 8, 2023
Messages
36 (0.10/day)
I think the Phison AI100E / aiDAPTIV+ is more practical for most people, hope to see coverage / testing on that
 
Joined
Nov 18, 2010
Messages
7,264 (1.46/day)
Location
Rīga, Latvia
System Name HELLSTAR
Processor AMD RYZEN 9 5950X
Motherboard ASUS Strix X570-E
Cooling 2x 360 + 280 rads. 3x Gentle Typhoons, 3x Phanteks T30, 2x TT T140 . EK-Quantum Momentum Monoblock.
Memory 4x8GB G.SKILL Trident Z RGB F4-4133C19D-16GTZR 14-16-12-30-44
Video Card(s) Sapphire Pulse RX 7900XTX. Water block. Crossflashed.
Storage Optane 900P[Fedora] + WD BLACK SN850X 4TB + 750 EVO 500GB + 1TB 980PRO+SN560 1TB(W11)
Display(s) Philips PHL BDM3270 + Acer XV242Y
Case Lian Li O11 Dynamic EVO
Audio Device(s) SMSL RAW-MDA1 DAC
Power Supply Fractal Design Newton R3 1000W
Mouse Razer Basilisk
Keyboard Razer BlackWidow V3 - Yellow Switch
Software FEDORA 40
Everything goes in circles.

It must be over two decades since I socketed additional RAM into my GPU. Not sure if it was Matrox or ATI.

But the idea of an L4-esque RAM pool for the GPU? Killing the premium margin on pro GPUs? It will not happen on a large scale. They will not allow it.
 
Joined
Jan 3, 2021
Messages
2,908 (2.27/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
But that name ... If someone had asked me yesterday what "panmnesia" means, I'd answer that it's a situation where everyone forgets everything. (Or should that mean everyone except AI?)

That is not a huge problem. When it comes to big data processing in HPC, AI, etc., if a GPU cluster doesn't support a Unified Memory Architecture (UMA), i.e. the CPUs and GPUs do not share system RAM, developers try to move as big a chunk of data as possible into GPU memory and then run processing that can take a long time (seconds, minutes, etc.). That means memory bandwidth is, to some degree, less important. What really matters is processing as big a chunk of data as possible!

Of course, faster memory interfaces are always better.
But this is exactly that, if I understand its purpose well. It's low-latency memory that's shared between nodes, and it becomes part of each GPU's memory space.

Also, do any modern GPU+CPU architectures exist that can actually share memory between nodes, the way a multi-socket CPU system does?
 
Joined
Jan 29, 2012
Messages
6,640 (1.46/day)
Location
Florida
System Name natr0n-PC
Processor Ryzen 5950x-5600x | 9600k
Motherboard B450 AORUS M | Z390 UD
Cooling EK AIO 360 - 6 fan action | AIO
Memory Patriot - Viper Steel DDR4 (B-Die)(4x8GB) | Samsung DDR4 (4x8GB)
Video Card(s) EVGA 3070ti FTW | Sapphire PULSE RX 590
Storage Various
Display(s) Pixio PX279 Prime
Case Thermaltake Level 20 VT | Black bench
Audio Device(s) LOXJIE D10 + Kinter Amp + 6 Bookshelf Speakers Sony+JVC+Sony
Power Supply Super Flower Leadex III ARGB 80+ Gold 650W | EVGA 700 Gold
Software XP/7/8.1/10
Benchmark Scores http://valid.x86.fr/79kuh6
now I can fix this pos 8gb 3070ti and say eat sh!t jensen
 
Joined
Apr 18, 2019
Messages
2,201 (1.15/day)
Location
Olympia, WA
System Name Sleepy Painter
Processor AMD Ryzen 5 3600
Motherboard Asus TuF Gaming X570-PLUS/WIFI
Cooling FSP Windale 6 - Passive
Memory 2x16GB F4-3600C16-16GVKC @ 16-19-21-36-58-1T
Video Card(s) MSI RX580 8GB
Storage 2x Samsung PM963 960GB nVME RAID0, Crucial BX500 1TB SATA, WD Blue 3D 2TB SATA
Display(s) Microboard 32" Curved 1080P 144hz VA w/ Freesync
Case NZXT Gamma Classic Black
Audio Device(s) Asus Xonar D1
Power Supply Rosewill 1KW on 240V@60hz
Mouse Logitech MX518 Legend
Keyboard Red Dragon K552
Software Windows 10 Enterprise 2019 LTSC 1809 17763.1757
Oh sweet summer children... this will never come to the consumer sector. :roll:
Sad, since this would almost be a good excuse all its own for Gen6-> PCI-E in the Consumer Market.

Besides adding resources to GPUs and CPUs, being able to address relatively large amounts of prev.-gen. 'surplus' RAM as NVMe-like (RAMdrive) storage/cache would be useful. [Both in the Enthusiast-Consumer world, and Industry]

NtM, if Intel hasn't completely abandoned Optane; they could easily reinvigorate interest.
Offering Intel-licensed Pmem cards (optionally utilizing once platform-proprietary P-DIMMs) over CXL would greatly broaden the potential market. Esp. w/ the newfound interest in "AI-ing every-thing" :laugh:
 
Joined
May 3, 2018
Messages
2,525 (1.12/day)
Will never be supported for desktop dGPUs so forget it, and it's also not coming any time soon.

GPU makers should add DDR5 memory slots to consumer GPUs so we can expand memory and still get good latency compared to any on-motherboard solution.
 
Joined
Jun 3, 2008
Messages
544 (0.09/day)
Location
Pacific Coast
System Name Z77 Rev. 1
Processor Intel Core i7 3770K
Motherboard ASRock Z77 Extreme4
Cooling Water Cooling
Memory 2x G.Skill F3-2400C10D-16GTX
Video Card(s) EVGA GTX 1080
Storage Samsung 850 Pro
Display(s) Samsung 28" UE590 UHD
Case Silverstone TJ07
Audio Device(s) Onboard
Power Supply Seasonic PRIME 600W Titanium
Mouse EVGA TORQ X10
Keyboard Leopold Tenkeyless
Software Windows 10 Pro 64-bit
Benchmark Scores 3DMark Time Spy: 7695
That is not a huge problem. When it comes to big data processing in HPC, AI, etc., if a GPU cluster doesn't support a Unified Memory Architecture (UMA), i.e. the CPUs and GPUs do not share system RAM, developers try to move as big a chunk of data as possible into GPU memory and then run processing that can take a long time (seconds, minutes, etc.). That means memory bandwidth is, to some degree, less important. What really matters is processing as big a chunk of data as possible!
There's a problem with your explanation. You say it's not a big problem to move one big chunk slowly once, because then the data is on the GPU to be processed there. This is different. This is one big chunk next to the GPU, which will then be processed in many small chunks over the slow bus. It's effectively moving the data around on the slow bus constantly, because this is a product designed to be used with GPUs that don't have enough onboard VRAM.
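
The difference between the two scenarios can be put into a back-of-the-envelope model. This is only a sketch under loud assumptions: the workload is purely bandwidth-bound, caching and compute time are ignored, and the data size, pass count, and bandwidth figures are hypothetical numbers picked for illustration.

```python
def copy_once_then_compute(data_gb: float, passes: int,
                           bus_gbps: float, vram_gbps: float) -> float:
    """Seconds spent moving data if it is copied over the bus once,
    then read from on-board VRAM on every pass."""
    return data_gb / bus_gbps + passes * data_gb / vram_gbps


def stream_every_pass(data_gb: float, passes: int, bus_gbps: float) -> float:
    """Seconds spent moving data if it stays in the expander,
    so every pass re-reads it over the bus."""
    return passes * data_gb / bus_gbps


# Hypothetical numbers: 64 GB working set, 10 passes over it,
# 64 GB/s bus, 1000 GB/s on-board VRAM.
DATA_GB, PASSES, BUS_GBPS, VRAM_GBPS = 64, 10, 64, 1000

resident = copy_once_then_compute(DATA_GB, PASSES, BUS_GBPS, VRAM_GBPS)
streamed = stream_every_pass(DATA_GB, PASSES, BUS_GBPS)

print(f"copy once, then compute from VRAM: {resident:.2f} s")
print(f"re-read over the bus every pass:   {streamed:.2f} s")
```

The first case only works if the working set actually fits in VRAM, which is exactly the situation the expander is meant to get around; the model just shows why bus bandwidth matters more once the data has to be re-read repeatedly.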
 
Joined
Mar 28, 2020
Messages
1,675 (1.07/day)
Will never be supported for desktop dGPUs so forget it, and it's also not coming any time soon.

GPU makers should add DDR5 memory slots to consumer GPUs so we can expand memory and still get good latency compared to any on-motherboard solution.
I don't think this will ever happen because:
1. The likes of Nvidia will never allow it, and they have an iron grip on these AIBs.
2. Such an option would deprive them of higher revenue/profit margins, since it would let you buy a lower-end model and increase the RAM.
 
Joined
May 3, 2018
Messages
2,525 (1.12/day)
I don't think this will ever happen because:
1. The likes of Nvidia will never allow it, and they have an iron grip on these AIBs.
2. Such an option would deprive them of higher revenue/profit margins, since it would let you buy a lower-end model and increase the RAM.
Oh indeed, but I can dream, and it would be a simple option for consumer GPUs. This CXL stuff is for workstation+ class GPUs.

Nvidia could of course stop gimping their GPUs and pretending L2 cache is the answer.
 
Joined
Feb 18, 2023
Messages
233 (0.46/day)
Everything goes in circles.

It must be over two decades since I socketed additional RAM into my GPU. Not sure if it was Matrox or ATI.

But the idea of an L4-esque RAM pool for the GPU? Killing the premium margin on pro GPUs? It will not happen on a large scale. They will not allow it.
In my case that was in 1997, when I added RAM to my ATI GPU.
 
Joined
Jul 7, 2019
Messages
877 (0.48/day)
Everything goes in circles.

It must be over two decades since I socketed additional RAM into my GPU. Not sure if it was Matrox or ATI.

But the idea of an L4-esque RAM pool for the GPU? Killing the premium margin on pro GPUs? It will not happen on a large scale. They will not allow it.

For a short period of time, AMD also experimented with their Radeon Pro SSG cards, which included a user-upgradable NVMe drive and provided up to 2TB worth of video card memory.

There were some niche use-cases for it, and there were also attempts by some hardcore enthusiasts to try and access it to install games onto.

Would be interesting if AMD could bring it back for newer datacenter Accelerators as well as even for top-level gaming cards, making full use of PCIe 4.0 bandwidth or even PCIe 5.0 bandwidth to either use the SSDs as extra storage or internally to speed up memory use somehow.
 