Tuesday, April 9th 2019

Intel Reveals the "What" and "Why" of CXL Interconnect, its Answer to NVLink

CXL, short for Compute Express Link, is an ambitious new interconnect technology for removable high-bandwidth devices, such as GPU-based compute accelerators, in a data-center environment. It is designed to overcome many of the technical limitations of PCI-Express, not the least of which is bandwidth. Intel sensed that its upcoming family of scalable compute accelerators under the Xe brand needs a specialized interconnect, which it wants to push as the next industry standard. The development of CXL is also triggered by compute accelerator majors NVIDIA and AMD already having similar interconnects of their own, NVLink and Infinity Fabric, respectively. At a dedicated event dubbed "Interconnect Day 2019," Intel put out a technical presentation that spelled out the nuts and bolts of CXL.

Intel began by describing why the industry needs CXL, and why PCI-Express (PCIe) doesn't suit its use-case. For a client-segment device, PCIe is perfect, since client machines don't have too many devices or too much memory, and their applications neither have a very large memory footprint nor scale across multiple machines. PCIe fails big in the data-center, when dealing with multiple bandwidth-hungry devices and vast shared memory pools. Its biggest shortcomings are isolated memory pools for each device and inefficient access mechanisms. Resource-sharing is almost impossible. Sharing operands and data between multiple devices, such as two GPU accelerators working on a problem, is very inefficient. And lastly, there's latency, lots of it. Latency is the biggest enemy of shared memory pools that span multiple physical machines. CXL is designed to overcome many of these problems without discarding the best part about PCIe: the simplicity and adaptability of its physical layer.
CXL uses the PCIe physical layer, and has a raw on-paper bandwidth of 32 Gbps per lane, per direction, which aligns with the PCIe gen 5.0 standard. The link layer is where all the secret sauce is. Intel worked on new handshake, auto-negotiation, and transaction protocols that replace those of PCIe, designed to overcome the shortcomings listed above. With PCIe gen 5.0 already standardized by the PCI-SIG, Intel could contribute CXL IP back to the SIG for PCIe gen 6.0. In other words, Intel admits that CXL may not outlive PCIe, but until the PCI-SIG can standardize gen 6.0 (around 2021-22, if not later), CXL is the need of the hour.
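For a sense of scale, here is a quick back-of-the-envelope throughput calculation. The 32 Gbps-per-lane figure is from the presentation; the x16 link width and PCIe 5.0-style 128b/130b line coding are assumptions for illustration only:

# Rough per-direction throughput for a CXL link on the PCIe 5.0 PHY.
# 32 Gbps per lane is from the article; x16 width and 128b/130b line
# coding are assumptions about a typical PCIe 5.0-style link.
RAW_GBPS_PER_LANE = 32           # per lane, per direction
LANES = 16                       # assumed x16 slot
ENCODING_EFFICIENCY = 128 / 130  # assumed 128b/130b line coding

raw_gbps = RAW_GBPS_PER_LANE * LANES
effective_gbps = raw_gbps * ENCODING_EFFICIENCY

print(f"Raw:       {raw_gbps} Gbps per direction")
print(f"Effective: {effective_gbps:.1f} Gbps (~{effective_gbps / 8:.1f} GB/s) per direction")
# Raw:       512 Gbps per direction
# Effective: 504.1 Gbps (~63.0 GB/s) per direction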
The CXL transaction layer consists of three multiplexed sub-protocols that run simultaneously on a single link. They are: CXL.io, CXL.cache, and CXL.memory. CXL.io deals with device discovery, link negotiation, interrupts, register access, etc., which are basically the tasks that get a machine to work with a device. CXL.cache deals with a device's access to a local processor's memory. CXL.memory deals with the processor's access to non-local memory (memory controlled by another processor or another machine).
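To make the multiplexing idea concrete, here is a toy sketch of three traffic classes sharing one logical link. Every name and field below is hypothetical; the real CXL flit format is far more involved than this:

# Purely illustrative: three sub-protocols interleaved over one link.
from enum import Enum
from dataclasses import dataclass

class SubProtocol(Enum):
    IO = "CXL.io"          # discovery, link negotiation, interrupts, register access
    CACHE = "CXL.cache"    # device access to the local processor's memory
    MEMORY = "CXL.memory"  # processor access to non-local memory

@dataclass
class Transaction:
    protocol: SubProtocol
    payload: bytes

def mux(transactions):
    """Interleave transactions from all three sub-protocols onto one link."""
    for t in transactions:
        yield (t.protocol.value, t.payload)

link_traffic = [
    Transaction(SubProtocol.IO, b"config-read"),
    Transaction(SubProtocol.CACHE, b"cacheline-fetch"),
    Transaction(SubProtocol.MEMORY, b"mem-write"),
]
for tagged in mux(link_traffic):
    print(tagged)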

Intel listed out use-cases for CXL, beginning with accelerators with memory, such as graphics cards, GPU compute accelerators, and high-density compute cards. All three CXL transaction-layer protocols are relevant to such devices. Next up are FPGAs and NICs; CXL.io and CXL.cache are relevant here, since network stacks are processed by processors local to the NIC. Lastly, there are the all-important memory buffers. You can imagine these devices as "NAS, but with DRAM sticks." Future data-centers will consist of vast memory pools shared between thousands of physical machines and accelerators; CXL.io and CXL.memory are relevant to these. Much of what makes the CXL link layer faster than PCIe is its optimized stack, which reduces processing load on the CPU. The CXL stack is built from the ground up with low latency as a design goal.
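Summarizing the device classes above and the sub-protocols each is expected to exercise (the grouping follows the presentation as described here; the snippet itself is just an illustrative way to lay it out):

# Device classes and the CXL sub-protocols they are expected to use.
CXL_USE_CASES = {
    "Accelerators with memory (GPUs, compute cards)": ["CXL.io", "CXL.cache", "CXL.memory"],
    "FPGAs and NICs":                                 ["CXL.io", "CXL.cache"],
    "Memory buffers / expanders":                     ["CXL.io", "CXL.memory"],
}

for device, protocols in CXL_USE_CASES.items():
    print(f"{device}: {', '.join(protocols)}")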
Source: Serve the Home

37 Comments on Intel Reveals the "What" and "Why" of CXL Interconnect, its Answer to NVLink

#26
SRB151
SoNic67: There is some level of anti-Intel obsession here. Like Intel owes something to anybody, meanwhile nVidia and AMD proprietary solutions are looked as "meah, nothing to see, look away". Yes CCIX is AMD's baby, and others are "contributors".
CXL, besides Intel, has already gained a lot of support from other big names interested in computing, so put that in perspective:
www.computeexpresslink.org/members

ARM, Google, Cisco, Facebook, Alibaba, Dell, HP, Huawei, Lenovo, Microsoft, Microchip... they are all into giving Intel free money???
A standard is as strong as the money behind it and the adoption by industry. Better standard (by all measures) will win.
I think you missed your own point. Intel locks down their tech all the time. Thunderbolt being the latest. They have a proprietary 200g network adapter as well as a long history of this. While AMD is the opposite. Freesync, opencl support, opengl, Tressfx, etc. I don't care who makes it, as long as it is truly open. AMD has a good record of going open. Intel, not so much. I do admire your faith that this will definitely and without question be the first time without strings attached. I just don't share it.
#27
bug
SRB151: I think you missed your own point. Intel locks down their tech all the time. Thunderbolt being the latest. They have a proprietary 200g network adapter as well as a long history of this. While AMD is the opposite. Freesync, opencl support, opengl, Tressfx, etc. I don't care who makes it, as long as it is truly open. AMD has a good record of going open. Intel, not so much. I do admire your faith that this will definitely and without question be the first time without strings attached. I just don't share it.
You have a really black or white view of things there.
Intel had an open source video driver for Linux long before AMD. Also: en.wikipedia.org/wiki/Thunderbolt_(interface)#Royalty_situation
AMD has to go the open route. They're the underdog, they can't sell closed solutions. If things changed, I'm pretty sure they'd reconsider their approach.
#28
SoNic67
So Intel should be nationalized, and then the government should provide all those standards for free to everyone else in the world. Like how they freely use GPS.
Got it.
#29
Caring1
SoNic67: Better standard (by all measures) will win.
Betamax says hello.
#30
R0H1T
Patriot: This is not for your desktop Steeevo, this is for servers where the bandwidth isn't as much for single device performance but for device to device performance. X8 may be fine for a single gpu to not lose performance, but not if it wants to work with 15 others and compete against nvlink. This is also intel railroading and not joining the other consortiums... which are already open standards Now... not to be opened on 2nd gen. This is a desperate lock-in attempt for their cascade lake failings.
I don't believe PCIe 4.0 will have such a bottleneck even for huge server farms ~ Why AMD EPYC Rome 2P Will Have 128-160 PCIe Gen4 Lanes

Having said that, each use(r) case is different, so while some enterprises may need the extra lanes - they should have plenty to spare with PCIe 4.0, perhaps with the exception of (extreme) edge cases.

Some key points wrt competing solutions ~
www.openfabrics.org/images/eventpresos/2017presentations/213_CCIXGen-Z_BBenton.pdf
www.csm.ornl.gov/workshops/openshmem2017/presentations/Benton%20-%20OpenCAPI,%20Gen-Z,%20CCIX-%20Technology%20Overview,%20Trends,%20and%20Alignments.pdf
#31
SRB151
bug: You have a really black or white view of things there.
Intel had an open source video driver for Linux long before AMD. Also: en.wikipedia.org/wiki/Thunderbolt_(interface)#Royalty_situation
AMD has to go the open route. They're the underdog, they can't sell closed solutions. If things changed, I'm pretty sure they'd reconsider their approach.
LOL, so, open solutions are only for losers who have no choice but to market that way? What does that say about thunderbolt? Do you think they dropped the royalties because it was such a rousing success? It's not about black, white, blue, green, or whatever. Although I'll give you, Nvidia makes Intel look like choir boys when it comes to this. I really could care less about who comes up with what, as I said. And true innovation should be rewarded, but warming over similar tech to others in order to lock players out of one market or another is not right, and just not something I support. Optane is an original idea and more power to them for leveraging it, CXL is not.
#32
Prima.Vera
All big talk and nothing concrete. To date, there isn't even 1 mobo with PCIe 4.0 out there, not to mention CPUs that support it (yet)
#33
bug
SRB151: LOL, so, open solutions are only for losers who have no choice but to market that way? What does that say about thunderbolt? Do you think they dropped the royalties because it was such a rousing success? It's not about black, white, blue, green, or whatever. Although I'll give you, Nvidia makes Intel look like choir boys when it comes to this. I really could care less about who comes up with what, as I said. And true innovation should be rewarded, but warming over similar tech to others in order to lock players out of one market or another is not right, and just not something I support. Optane is an original idea and more power to them for leveraging it, CXL is not.
Do not put words in my mouth. Open solutions are generally better, but they are contingent on existing expertise and participants agreeing with each other. Public companies, on the other hand, are primarily accountable to their shareholders and have to think about profit first.
#34
SoNic67
When open solutions build their first CPU, then good for them.
Until then, if Intel adds support for something in their CPUs, it will become a de facto standard.

And yes, building for profit works; it provides money for future research and development.

Open solutions don't work by themselves; they are supported by the evil non-open products. Nobody likes to work for free, even the kids in their parents' bedrooms want money for new phones, movie tickets with their dates, gas money...
#35
Patriot
R0H1T: I don't believe PCIe 4.0 will have such a bottleneck even for huge server farms ~ Why AMD EPYC Rome 2P Will Have 128-160 PCIe Gen4 Lanes

Having said that, each use(r) case is different, so while some enterprises may need the extra lanes - they should have plenty to spare with PCIe 4.0, perhaps with the exception of (extreme) edge cases.

Some key points wrt competing solutions ~
www.openfabrics.org/images/eventpresos/2017presentations/213_CCIXGen-Z_BBenton.pdf
www.csm.ornl.gov/workshops/openshmem2017/presentations/Benton - OpenCAPI, Gen-Z, CCIX- Technology Overview, Trends, and Alignments.pdf
Intel is skipping pcie 4 and going straight to 5 with this custom "optional" alternative proprietary protocol. So while AMD's Pcie 4 with CCIX and 128+++ pcie lanes** is enough for accelerators Intel has max 80 lanes of pcie 3.0 per cascade lake 2p or 4p... They need more and can catch up/pass AMD with 80 lanes of pcie 5.0.

**(160 lanes requires cutting cpu interconnects from 4 to 3. While flexibility is nice, this is by far not optimal for compute intensive setups.) Naples suffered due to interconnect saturation with nvme devices, and while the bandwidth has doubled, the core count has as well. Time will tell if the ram speed bump and I/O die bring enough of an improvement to offset the loss of an interconnect.

AMD also has a 4-gpu infinity fabric ringbus that takes the load off the cpu. Infinity fabric is very similar to ccix in being an alternative lower-latency protocol over pcie.

Also, whoever mentioned no pcie 4 boards being on the market is only half correct: no x86 boards, but powerpc has had them for a while, and I think a few arm boards.
2nd fun fact, powerpc chips have nvlink interconnects on die, so rather than connecting to nvlink gpus through a pcie switch... they are part of the mesh.
#36
jabbadap
Prima.Vera: All big talk and nothing concrete. To date, there isn't even 1 mobo with PCIe 4.0 out there, not to mention CPUs that support it (yet)
heh maybe I should not say it, but pcie is not tied to x86, thus there is pcie 4.0 on a couple of different arches.
#37
R0H1T
Patriot: Intel is skipping pcie 4 and going straight to 5 with this custom "optional" alternative proprietary protocol.
They have PCIe 4.0 lined up, however Intel's road-maps change so often & they have so many overlapping ones that it's virtually impossible to say what they'll "release" next & if it'll just be a paper launch.

Patriot: powerpc chips have nvlink interconnects on die
Yes & NVLink was designed with IBM; it's a GPU-GPU & GPU-CPU interconnect, which is why I said it's nothing like CXL, it's more akin to IF.