
NVIDIA CG100 "Grace" Server Processor Benchmarked by Academics

T0@st

News Editor
Joined
Mar 7, 2023
Messages
2,077 (3.32/day)
Location
South East, UK
The Barcelona Supercomputing Center (BSC) and the State University of New York (Stony Brook and Buffalo campuses) have pitted NVIDIA's relatively new CG100 "Grace" Superchip against several rival products in a "wide variety of HPC and AI benchmarks." Team Green marketing material has focused mainly on the overall GH200 "Grace Hopper" package—so it is interesting to see technical institutes concentrate on the company's "first true" server processor (ARM-based), rather than the ever popular GPU aspect. The Next Platform's article summarized the chip's internal makeup: "(NVIDIA's) Grace CPU has a relatively high core count and a relatively low thermal footprint, and it has banks of low-power DDR5 (LPDDR5) memory—the kind used in laptops but gussied up with error correction to be server class—of sufficient capacity to be useful for HPC systems, which typically have 256 GB or 512 GB per node these days and sometimes less."

Benchmark results were revealed at last week's HPC Asia 2024 conference (in Nagoya, Japan)—BSC and the State University of New York also uploaded their findings to the ACM Digital Library (link #1 & #2). BSC's MareNostrum 5 system contains an experimental cluster portion—consisting of NVIDIA Grace-Grace and Grace-Hopper superchips. We have heard plenty about the latter (in press releases), but the former is a novel concept—as outlined by The Next Platform: "Put two Grace CPUs together into a Grace-Grace superchip, a tightly coupled package using NVLink chip-to-chip interconnects that provide memory coherence across the LPDDR5 memory banks and that consumes only around 500 watts, and it gets plenty interesting for the HPC crowd. That yields a total of 144 Arm Neoverse "Demeter" V2 cores with the Armv9 architecture, and 1 TB of physical memory with 1.1 TB/sec of peak theoretical bandwidth. For some reason, probably relating to yield on the LPDDR5 memory, only 960 GB of that memory capacity and only 1 TB/sec of that memory bandwidth is actually available."
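As a rough back-of-envelope, dividing the quoted usable figures across all 144 cores gives each core's slice of memory and bandwidth. A quick sketch, using only the numbers quoted above:

```python
# Back-of-envelope per-core resources for a Grace-Grace superchip,
# using the usable figures quoted above (960 GB, ~1 TB/s, 144 cores).
cores = 144                # 2 x 72 Neoverse V2 cores
usable_capacity_gb = 960   # usable LPDDR5 capacity
usable_bw_gbs = 1000       # usable bandwidth, ~1 TB/s

print(f"{usable_capacity_gb / cores:.2f} GB per core")   # 6.67 GB per core
print(f"{usable_bw_gbs / cores:.2f} GB/s per core")      # 6.94 GB/s per core
```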

BSC's older MareNostrum 4 supercomputer is based on "nodes comprised of a pair of 24-core Skylake-X Xeon SP-8160 Platinum processors running at 2.1 GHz." The almost seven-year-old Team Blue-based system was bested by the NVIDIA-fortified MareNostrum 5—the latter's worst results were still 67% faster, while its best indicated a 4.49x performance advantage. The upstate New York institute fielded a wider range of rival solutions against its own NVIDIA setup—in "Grace-Grace" (CPU-CPU pair) and "Grace-Hopper" (CPU-GPU pair) configurations. The competition included: Intel Sapphire Rapids and Ice Lake, AMD Milan, plus the ARM-based Amazon Graviton 3 and Fujitsu A64FX processors. Tom's Hardware checked SUNY's comparison data: "The Grace Superchip easily beat the Graviton 3, the A64FX, an 80-core Ice Lake setup, and even a 128-core configuration of Milan in all benchmarks. However, the Sapphire Rapids server with two 48-core Xeon Max 9468s stopped Grace's winning streak."

They continued: "Against Sapphire Rapids in HBM mode, Grace only won in three of the eight tests—though it was able to outperform in five tests when in DDR5 mode. It's a surprisingly mixed bag for Nvidia considering that Grace has 50% more cores and uses TSMC's more advanced 4 nm node instead of Intel's aging Intel 7 (formerly 10 nm) process. It's not entirely out of left field, though: Sapphire Rapids also beat AMD's EPYC Genoa chips for a spot in a MI300X-powered Azure instance, indicating that, despite Sapphire Rapids' shortcomings, it still has plenty of potency for HPC... On the other hand, NVIDIA might have a crushing victory in efficiency. The Grace Superchip is rated for 500 watts, while the Xeon Max 9468 is rated for 350 watts, which means two would have a TDP of 700 watts. The paper doesn't detail power consumption on either chip, but if we assume each chip was running at its TDP, then the comparison becomes very favorable for NVIDIA."
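To make the efficiency arithmetic explicit, here is a minimal sketch of the perf-per-watt reasoning under the quote's assumption that each chip draws its full TDP; the benchmark score is a placeholder, not a figure from the paper:

```python
# Hypothetical perf-per-watt comparison assuming each chip runs at its TDP.
# The score value is a placeholder, not data from the paper.
def perf_per_watt(score: float, watts: float) -> float:
    """Benchmark score per watt of assumed power draw."""
    return score / watts

grace_watts = 500            # Grace Superchip rating
sapphire_watts = 2 * 350     # two Xeon Max 9468s

score = 100.0  # same placeholder score for both systems
ratio = perf_per_watt(score, grace_watts) / perf_per_watt(score, sapphire_watts)
print(f"Grace advantage at equal performance: {ratio:.2f}x")  # 1.40x
```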

The Next Platform believes that Team Green's CG100 server processor is truly bolstered by its onboard neighbor: "any CPU paired with the same Hopper GPU would probably do as well. On the CPU-only Grace-Grace unit, the Gromacs performance is almost as potent as a pair of 'Sapphire Rapids' Xeon Max Series CPUs. It is noteworthy that the HBM memory on this chip doesn't help that much for Gromacs. Hmmmm. Anyway, that is some food for thought about the Grace CPU and HPC workloads."

View at TechPowerUp Main Site | Source
 
Joined
Nov 26, 2021
Messages
1,645 (1.51/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
I wonder how it would compare to Bergamo, which has the advantage of AVX-512 and higher memory bandwidth over Milan.
 
Joined
Jan 2, 2019
Messages
123 (0.06/day)
Attention! There are a lot of issues and inconsistencies in the article and results.

- It is from academia, and in some cases academic researchers are better at writing publications than at delivering high-quality HPC production code
Note: Show me a piece of code and I'll tell you if it is implemented by a PhD computer scientist or by a highly experienced software engineer

- In HPC we do not measure Peak Processing Power (PPP) in FLOPs per clock! It is always measured in floating-point operations per second (FLOPS). Take a look at the www.top500.org numbers and supercomputer specs and you'll see that the core clocks of CPUs and GPUs are always different.
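For illustration, peak throughput follows from combining FLOPs per clock with clock rate and core count; the per-cycle figure in this sketch is an assumption, not a vendor spec:

```python
# Peak FLOPS = cores x clock (GHz) x FLOPs per core per cycle (result in GFLOPS).
# The flops_per_cycle value is illustrative (e.g. a core with four 128-bit
# FMA pipes sustains 16 FP64 FLOPs/cycle); check vendor docs for real chips.
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    return cores * clock_ghz * flops_per_cycle

# Two hypothetical chips: comparing FLOPs per clock alone would mislead,
# since clock rate and core count change the ranking.
print(peak_gflops(cores=144, clock_ghz=3.1, flops_per_cycle=16))  # 7142.4 GFLOPS
print(peak_gflops(cores=96,  clock_ghz=3.6, flops_per_cycle=16))  # 5529.6 GFLOPS
```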

- Is it right to compare the performance of a 48-core processor against a 144-core processor with different core clock frequencies without normalizing the results?
Note: It is absolutely useless without normalizing the results! If I normalize the result of the 48-core processor against the 144-core processor (multiply by 3), then the ARM processor is faster!
Is it right to compare the fuel efficiency of hybrid cars of different sizes and masses?
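A minimal sketch of that normalization, with invented scores (whether per-core results are the fairest basis for comparison is of course its own debate):

```python
# Normalize raw benchmark scores by core count before comparing chips
# with very different core counts. Scores here are invented for illustration.
def per_core(raw_score: float, cores: int) -> float:
    return raw_score / cores

chip_48 = per_core(raw_score=300.0, cores=48)    # hypothetical 48-core result
chip_144 = per_core(raw_score=700.0, cores=144)  # hypothetical 144-core result
print(f"{chip_48:.2f} vs {chip_144:.2f}")  # 6.25 vs 4.86: the raw "loser" wins per core
```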

>>...HBM memory on this chip doesn't help that much for Gromacs...
- This is because processing in Gromacs is CPU-bound rather than RAM-bound.
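One way to sanity-check that claim is the roofline model: a kernel is compute-bound when its arithmetic intensity exceeds the machine's FLOPS-to-bandwidth balance. A sketch with illustrative numbers:

```python
# Roofline-style check: compute-bound if arithmetic intensity (FLOPs per
# byte of memory traffic) exceeds machine balance (peak FLOPS / bandwidth).
# All numbers are illustrative assumptions, not measurements.
def is_compute_bound(flops_per_byte: float, peak_gflops: float, bw_gbs: float) -> bool:
    machine_balance = peak_gflops / bw_gbs  # FLOPs/byte the memory system can feed
    return flops_per_byte > machine_balance

# ~7000 GFLOPS peak and ~1000 GB/s give a balance of 7 FLOPs/byte; a kernel
# at 10 FLOPs/byte is limited by compute, so faster memory (HBM) helps little.
print(is_compute_bound(flops_per_byte=10.0, peak_gflops=7000.0, bw_gbs=1000.0))  # True
```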
 
Joined
Sep 1, 2020
Messages
2,343 (1.52/day)
Location
Bulgaria
Off/How many academics does it take to change a light bulb?/end off
 
Joined
Oct 6, 2021
Messages
1,605 (1.40/day)
What an unflattering comparison the university has presented... They depict the EPYC 7763 (Zen 3) as having 128 cores, when in reality it only has 64 physical cores. They've mixed desktop and server components, among other discrepancies. It would be more interesting to see the MI300 vs the H100 (both paired with Genoa) vs the NVIDIA "Super" chip.

At this point I can only say that something smells bad. lol
 
Joined
Nov 26, 2021
Messages
1,645 (1.51/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
What an unflattering comparison the university has presented... They depict the EPYC 7763 (Zen 3) as having 128 cores, when in reality it only has 64 physical cores. They've mixed desktop and server components, among other discrepancies. It would be more interesting to see the MI300 vs the H100 (both paired with Genoa) vs the NVIDIA "Super" chip.

At this point I can only say that something smells bad. lol
They may have been using a dual-socket system. Still, to be fair, they should have included Zen 4-based SKUs like the EPYC 9754 (128 cores) or EPYC 9654 (96 cores).
 
Joined
Nov 6, 2016
Messages
1,751 (0.60/day)
Location
NH, USA
System Name Lightbringer
Processor Ryzen 7 2700X
Motherboard Asus ROG Strix X470-F Gaming
Cooling Enermax Liqmax Iii 360mm AIO
Memory G.Skill Trident Z RGB 32GB (8GBx4) 3200Mhz CL 14
Video Card(s) Sapphire RX 5700XT Nitro+
Storage Hp EX950 2TB NVMe M.2, HP EX950 1TB NVMe M.2, Samsung 860 EVO 2TB
Display(s) LG 34BK95U-W 34" 5120 x 2160
Case Lian Li PC-O11 Dynamic (White)
Power Supply BeQuiet Straight Power 11 850w Gold Rated PSU
Mouse Glorious Model O (Matte White)
Keyboard Royal Kludge RK71
Software Windows 10
I wonder how it would compare to Bergamo, which has the advantage of AVX-512 and higher memory bandwidth over Milan.
Yeah, or I don't know, how about we compare it to the NEW EPYC chips instead of the Zen 3 ones? Here's what I'm interested in: Phoronix compared the Xeon Max with HBM against Genoa-X (the large-cache variants), and Genoa-X easily beat the Xeons with HBM... so if the Xeons with HBM beat the Nvidia CPU, and EPYC Genoa-X with extra cache beat the Xeons with HBM, does that mean EPYC Genoa-X will beat the Nvidia CPU?

That's why I was so disappointed they tested with Zen 3 EPYC
 
Joined
Jan 2, 2019
Messages
123 (0.06/day)
At this point I can only say that something smells bad. lol

I don't think you're the only one who thinks so!

Intel, AMD and ARM are very concerned that NVIDIA has stepped into the CPU-server market with a new generation system (CPU+GPU). It is possible that the work was financially supported by one of these companies, of course not directly.

Another thing is that all these companies are absolutely jealous of NVIDIA's current revenues and hardware advances. All of them could only dream about hardware orders similar to Meta's order from NVIDIA, that is, 350,000 NVIDIA H100 GPUs worth some 10.5 billion US dollars! It is possible that the publication is an attempt to harm NVIDIA's reputation, something like, "...look, our 3rd Gen CPUs are better than the latest most advanced system from NVIDIA..." in order to boost the number of orders for older Intel Xeon and AMD EPYC CPUs.

That is why Microsoft and OpenAI are talking about investing billions of dollars in new chip-making factories. Once again, all of them are simply jealous and dream about NVIDIA's revenues.

Also, take a look at an article on www.hpcwire.com:

 
Joined
Oct 6, 2021
Messages
1,605 (1.40/day)
I don't think you're the only one who thinks so!

Intel, AMD and ARM are very concerned that NVIDIA has stepped into the CPU-server market with a new generation system (CPU+GPU). It is possible that the work was financially supported by one of these companies, of course not directly.

Another thing is that all these companies are absolutely jealous of NVIDIA's current revenues and hardware advances. All of them could only dream about hardware orders similar to Meta's order from NVIDIA, that is, 350,000 NVIDIA H100 GPUs worth some 10.5 billion US dollars! It is possible that the publication is an attempt to harm NVIDIA's reputation, something like, "...look, our 3rd Gen CPUs are better than the latest most advanced system from NVIDIA..." in order to boost the number of orders for older Intel Xeon and AMD EPYC CPUs.

That is why Microsoft and OpenAI are talking about investing billions of dollars in new chip-making factories. Once again, all of them are simply jealous and dream about NVIDIA's revenues.

Also, take a look at an article on www.hpcwire.com:

The tests show the Nvidia chip ahead in some cases. This test does not favor AMD; I think it was just done poorly.
 