Tuesday, March 22nd 2022
NVIDIA Unveils Grace CPU Superchip with 144 Cores and 1 TB/s Bandwidth
NVIDIA has today announced its Grace CPU Superchip, a monstrous design focused on heavy HPC and AI processing workloads. Previously, team green teased an in-house developed CPU intended for servers, creating an entirely new segment for the company. Today, we got a more detailed look at the plan with the Grace CPU Superchip. The Superchip package combines two Grace processors, each containing 72 cores. These cores are based on the Arm v9 instruction set architecture, and the two CPUs total 144 cores in the Superchip module. They are surrounded by a yet-undisclosed amount of LPDDR5X memory with ECC, delivering 1 TB/s of total bandwidth.
NVIDIA Grace CPU Superchip uses the NVLink-C2C cache-coherent interconnect, which delivers 900 GB/s of bandwidth, seven times more than the PCIe 5.0 protocol. The company targets a two-fold performance-per-Watt improvement over today's CPUs, aiming to bring efficiency and performance together. We have some preliminary benchmark information provided by NVIDIA. In the SPECrate2017_int_base integer benchmark, the Grace CPU Superchip scores over 740 points, though this figure is based on simulation for now. The performance target is not finalized yet, teasing a higher number in the future. The company expects to ship the Grace CPU Superchip in the first half of 2023, with an already supported ecosystem of software, including the NVIDIA RTX, HPC, NVIDIA AI, and NVIDIA Omniverse software stacks and platforms.
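As a rough sanity check on the "seven times more than PCIe 5.0" figure, here is a minimal sketch assuming a PCIe 5.0 x16 link at 32 GT/s per lane with 128b/130b encoding (roughly 63 GB/s per direction); these PCIe figures are rule-of-thumb assumptions, not numbers from NVIDIA's announcement:

```python
# Back-of-the-envelope check of the "7x PCIe 5.0" claim for NVLink-C2C.
# The PCIe figures below are common rule-of-thumb values, not from the announcement.

PCIE5_GT_PER_LANE = 32           # GT/s per lane for PCIe 5.0
LANES = 16                       # a full x16 link
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line coding overhead

# Per-direction throughput in GB/s (1 GT/s carries ~1 Gbit/s of raw signalling).
pcie5_per_direction = PCIE5_GT_PER_LANE * LANES * ENCODING_EFFICIENCY / 8
pcie5_bidirectional = 2 * pcie5_per_direction

NVLINK_C2C_GBPS = 900  # GB/s, as quoted for the Grace CPU Superchip

print(f"PCIe 5.0 x16, per direction: {pcie5_per_direction:.0f} GB/s")   # ~63 GB/s
print(f"PCIe 5.0 x16, bidirectional: {pcie5_bidirectional:.0f} GB/s")   # ~126 GB/s
print(f"NVLink-C2C vs PCIe 5.0 x16:  {NVLINK_C2C_GBPS / pcie5_bidirectional:.1f}x")  # ~7.1x
```

With those assumptions, the ratio works out to roughly 7x, matching NVIDIA's claim against a full bidirectional x16 link.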
25 Comments on NVIDIA Unveils Grace CPU Superchip with 144 Cores and 1 TB/s Bandwidth
I am not sure how to feel about this... On one hand it's quite an achievement, but on the other it's a bit monopolistic.
The head start of CUDA looks almost insurmountable at this point.
Edit: You also mentioned Xilinx as an equivalent of Mellanox, but this is not the case. Xilinx's top offering is an FPGA-based 2x100GbE network card, while NVIDIA has announced a 64 x 800GbE port switch and has been selling 400G adapters since 2021.
Is anyone familiar with this benchmark tool? Is 740 good?
The next super miner :clap:
So, 2x the score of 2x64 cores of EPYC?
Scratch that: 2x64 threads of EPYC. The 75F3 is a 32c/64t chip.
Makes me wonder why they're only quoting integer though - are they thinking that all fp compute will be offloaded to the GPUs?
As for the relation of this to the ARM acquisition attempt: this is just more proof that the world dodged a bullet on that one. It is a clear indication that if Nvidia controlled ARM and their licences, they would be strongly incentivized to use these in anticompetitive ways to bolster their own performance and competitiveness. The only reason x86 works decently is that the two major players are completely bound up in cross-licensing deals, meaning neither can strong-arm the other or limit their access to anything without hurting themselves too. Nothing similar exists for ARM.
Edit: derp, see above.
Edit2: Here you go:
Still good, though I have to wonder how much of this is due to the package being designed specifically for the power delivery and cooling of SXM5. EPYC tops out at 280W no matter what; I'd expect this to go higher.
It also looks different from the H100 modules and has very special-looking VRMs on the Hopper side as well (which are probably expensive and different from the H100's).
That combo package is really interesting. The trace density between those two packages must be absolutely insane. Also a bit weird, given that mezzanine cards are typically meant for grids of 2x4 cards or more - maybe this is for bespoke installations where it's of specific value to have very low latency between CPU and GPU?
Hmmmm... a thought: do these cards cover two SXM5 ports/sockets/whatever they're called? That would kind of make sense, and the PCB-to-GPU scale would seem to indicate that.
I don't think so. Looks like they made two mezzanine module types and a dedicated high-density board. Also, the Grace elements are not going to be launched together with the H100 - they are aimed at H1 2023. Which makes sense: they are introducing a whole new CPU platform, so keeping the GPUs the same while the CPU changes makes the transition easier. We still don't know what is going to power the H100 boards in DGX boxes - is it some Intel Xeon with PCIe 5? Is it Zen 4?
Or maybe POWER10 with built-in NVLinks? Scratch that, the whitepaper says:

Another point is money. A good way to look at it is like buying the land around a new subway station. If you are investing billions of dollars in the subway station, you can capture that added value in the surrounding properties. In the same way, if you are investing billions of dollars in making ARM CPUs, you are enhancing ARM the company itself and can capture that value and make money off it. Nvidia is now spending huge amounts towards making ARM the company stronger but not getting all the value of their investment. Someone else owns those surrounding properties. That's not ideal. In fact, now that ARM is not part of Nvidia, their best play is to actually offer attractive pay to ARM employees and poach them for their own CPU design efforts. Can't buy the company? Buy all the employees instead.
Imagine if Microsoft was a software behemoth without an OS. If they decide to invest billions in making Mac OS software, they might want to own a piece of Apple. In the same way Sony has bought a large stake in Epic for Unreal engine.
So: why does buying the land around a new public transit hub make sense? Because of the perceived demand and increased attractiveness of the area, which lets you hike up prices. Buying up the land means forcing out competitors, allowing you to set the rents and prices. This also rests on an (uninterrogated, in this case) assumption that hiking up prices because you can is unproblematic, which... Put it this way: if that transit hub got built and nobody started pushing up prices, they would stay the same. There is no inherent causal relation here - that's base-level predatory capitalism. The only way this logic applies to ARM is if you are arguing that Nvidia was planning to increase licensing prices. Which, again, would be anticompetitive when they make ARM chips themselves and own the licence, as they would be disadvantaging competitors for their own benefit. I mean, the naivete behind arguing that this isn't anticompetitive is downright staggering.
As for what you're saying about Nvidia "investing in ARM", that is pure nonsense. If they bought ARM and wanted to invest in their R&D, they would need to hire more engineers - more money by itself doesn't create innovation. Where would those come from? Outside of both companies, obviously. And that same source is available to Nvidia now. Heck, Apple, Qualcomm, Mediatek, AMD, Ampere, Amazon and a bunch of others have engineers with experience making high performance ARM designs. And those companies also got their engineers from somewhere.
And Nvidia making fast ARM designs mainly helps Nvidia. Sure, ARM is strengthened by there being more good designs on their licences, but there's no lack of those. Nvidia would just be another one on the list.
Team blue is investing heavily in RISC-V, from manufacturing RISC-V chips to contributing its own IP.
RISC-V still has the advantage that it's royalty-free.
Moving forward, how much software gets adopted for each ISA will determine which one dominates.
The only reason ARM is big right now is smartphones.
What you are describing is literally a dystopia, and you're nodding in agreement as if that is the way to do business. A winner-takes-all mentality that is fundamental to many, if not most, issues we have today, casually omitting that for every winner, someone else has to lose. It's not a world you'll like living in, that's guaranteed.
Looking a bit closer at AT's EPYC testing, they're actually reporting less than 280W real-world power draw for those EPYCs under load too - 265W package power for the EPYC 7763 on a Gigabyte production motherboard (as opposed to early tests run on an updated older-generation motherboard that had elevated idle and I/O power for some reason). They list 1.037 SPECintRate/W, which leaves the Grace at just a 42% advantage against a year-old platform.
Now, as I said, it's quite possible that Nvidia knows Sapphire Rapids will kick butt, but gobble down power. Let's say it comes close, maybe 700 SPECintRate for a 2S system? It would need to consume 945W for that to be half the efficiency of 740 at 500W. Is that even remotely plausible? I have no idea. It would certainly only be feasible with water cooling, but that's true for Grace as well, as the renders show.
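A quick sketch of that efficiency arithmetic, using the figures quoted in this thread (the 500W Grace module power and the hypothetical 700-point 2S Sapphire Rapids system are assumptions from the discussion, not published numbers):

```python
# Reproduce the perf/W comparison discussed above.
# Inputs are NVIDIA's simulated score plus this thread's assumptions.

grace_score = 740           # SPECrate2017_int_base (simulated, per NVIDIA)
grace_power_w = 500         # assumed module power for the Grace Superchip

epyc_perf_per_watt = 1.037  # 2S EPYC 7763 figure reported by AnandTech

grace_perf_per_watt = grace_score / grace_power_w
print(f"Grace perf/W: {grace_perf_per_watt:.2f}")  # ~1.48
advantage = (grace_perf_per_watt / epyc_perf_per_watt - 1) * 100
print(f"Advantage vs EPYC 7763: {advantage:.0f}%")  # roughly the ~42% mentioned above

# Hypothetical 2S Sapphire Rapids system scoring 700 points: how much power
# would it have to draw to land at half of Grace's efficiency?
spr_score = 700
spr_power_for_half_efficiency = spr_score / (grace_perf_per_watt / 2)
print(f"SPR power at half of Grace's efficiency: {spr_power_for_half_efficiency:.0f} W")  # ~945 W
```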
This is still really impressive for a first-generation product, but ... yeah, it's not blowing anything out of the water. Looking at the video again, it seems that (thanks to those massive interconnects of theirs) they're actually separating out the CPUs to a separate board, so those Grace superchips or Grace-Hopper superchips live entirely separately from the SXM grid where the GPUs live. Seems like that carrier board is some entirely other form factor. It's really impressive that they have the bandwidth and latency to do so, and it lets them make some really flexible configurations.
About that three-chip board I thought might be for automotive, they also say that "this may just be a 2023-era supercomputer building block that we are looking at," speculating that it looks designed for a specific chassis rather than being one of the aesthetically pleasing, symmetrical concept boards typically presented for things like this. That would definitely be an interesting use of this - though it also raises the question of who this design is for.