
NVIDIA CG100 "Grace" Server Processor Benchmarked by Academics

T0@st

News Editor
Joined
Mar 7, 2023
Messages
2,077 (3.32/day)
Location
South East, UK
The Barcelona Supercomputing Center (BSC) and the State University of New York (Stony Brook and Buffalo campuses) have pitted NVIDIA's relatively new CG100 "Grace" Superchip against several rival products in a "wide variety of HPC and AI benchmarks." Team Green marketing material has focused mainly on the overall GH200 "Grace Hopper" package—so it is interesting to see technical institutes concentrate on the company's "first true" server processor (ARM-based), rather than the ever popular GPU aspect. The Next Platform's article summarized the chip's internal makeup: "(NVIDIA's) Grace CPU has a relatively high core count and a relatively low thermal footprint, and it has banks of low-power DDR5 (LPDDR5) memory—the kind used in laptops but gussied up with error correction to be server class—of sufficient capacity to be useful for HPC systems, which typically have 256 GB or 512 GB per node these days and sometimes less."

Benchmark results were revealed at last week's HPC Asia 2024 conference (in Nagoya, Japan)—BSC and the State University of New York also uploaded their findings to the ACM Digital Library (link #1 & #2). BSC's MareNostrum 5 system contains an experimental cluster portion—consisting of NVIDIA Grace-Grace and Grace-Hopper superchips. We have heard plenty about the latter (in press releases), but the former is a novel concept—as outlined by The Next Platform: "Put two Grace CPUs together into a Grace-Grace superchip, a tightly coupled package using NVLink chip-to-chip interconnects that provide memory coherence across the LPDDR5 memory banks and that consumes only around 500 watts, and it gets plenty interesting for the HPC crowd. That yields a total of 144 Arm Neoverse "Demeter" V2 cores with the Armv9 architecture, and 1 TB of physical memory with 1.1 TB/sec of peak theoretical bandwidth. For some reason, probably relating to yield on the LPDDR5 memory, only 960 GB of that memory capacity and only 1 TB/sec of that memory bandwidth is actually available."
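As a rough back-of-envelope, dividing the quoted usable figures across all 144 cores gives each core's slice of memory and bandwidth. A quick sketch, using only the numbers quoted above:

```python
# Back-of-envelope per-core resources for a Grace-Grace superchip,
# using the usable figures quoted above (960 GB, ~1 TB/s, 144 cores).
cores = 144                # 2 x 72 Neoverse V2 cores
usable_capacity_gb = 960   # usable LPDDR5 capacity
usable_bw_gbs = 1000       # usable bandwidth, ~1 TB/s

print(f"{usable_capacity_gb / cores:.2f} GB per core")   # 6.67 GB per core
print(f"{usable_bw_gbs / cores:.2f} GB/s per core")      # 6.94 GB/s per core
```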

BSC's older MareNostrum 4 supercomputer is based on "nodes comprised of a pair of 24-core Skylake-X Xeon SP-8160 Platinum processors running at 2.1 GHz." The almost seven-year-old Team Blue-based system was bested by the NVIDIA-fortified MareNostrum 5—the latter's worst results were still 67% faster, while its best indicated a 4.49x performance advantage. The upstate New York institute fielded a wider range of rival solutions against its own NVIDIA setup—in "Grace-Grace" (CPU-CPU pair) and "Grace-Hopper" (CPU-GPU pair) configurations. The competition included: Intel Sapphire Rapids and Ice Lake, AMD Milan, plus the ARM-based Amazon Graviton 3 and Fujitsu A64FX processors. Tom's Hardware checked SUNY's comparison data: "The Grace Superchip easily beat the Graviton 3, the A64FX, an 80-core Ice Lake setup, and even a 128-core configuration of Milan in all benchmarks. However, the Sapphire Rapids server with two 48-core Xeon Max 9468s stopped Grace's winning streak."

They continued: "Against Sapphire Rapids in HBM mode, Grace only won in three of the eight tests—though it was able to outperform in five tests when in DDR5 mode. It's a surprisingly mixed bag for Nvidia considering that Grace has 50% more cores and uses TSMC's more advanced 4 nm node instead of Intel's aging Intel 7 (formerly 10 nm) process. It's not entirely out of left field, though: Sapphire Rapids also beat AMD's EPYC Genoa chips for a spot in a MI300X-powered Azure instance, indicating that, despite Sapphire Rapids' shortcomings, it still has plenty of potency for HPC... On the other hand, NVIDIA might have a crushing victory in efficiency. The Grace Superchip is rated for 500 watts, while the Xeon Max 9468 is rated for 350 watts, which means two would have a TDP of 700 watts. The paper doesn't detail power consumption on either chip, but if we assume each chip was running at its TDP, then the comparison becomes very favorable for NVIDIA."
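To make the efficiency arithmetic explicit, here is a minimal sketch of the perf-per-watt reasoning under the quote's assumption that each chip draws its full TDP; the benchmark score is a placeholder, not a figure from the paper:

```python
# Hypothetical perf-per-watt comparison assuming each chip runs at its TDP.
# The score value is a placeholder, not data from the paper.
def perf_per_watt(score: float, watts: float) -> float:
    """Benchmark score per watt of assumed power draw."""
    return score / watts

grace_watts = 500            # Grace Superchip rating
sapphire_watts = 2 * 350     # two Xeon Max 9468s

score = 100.0  # same placeholder score for both systems
ratio = perf_per_watt(score, grace_watts) / perf_per_watt(score, sapphire_watts)
print(f"Grace advantage at equal performance: {ratio:.2f}x")  # 1.40x
```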

The Next Platform believes that Team Green's CG100 server processor is truly bolstered by its onboard neighbor: "any CPU paired with the same Hopper GPU would probably do as well. On the CPU-only Grace-Grace unit, the Gromacs performance is almost as potent as a pair of 'Sapphire Rapids' Xeon Max Series CPUs. It is noteworthy that the HBM memory on this chip doesn't help that much for Gromacs. Hmmmm. Anyway, that is some food for thought about the Grace CPU and HPC workloads."

View at TechPowerUp Main Site | Source
 
Joined
Nov 26, 2021
Messages
1,645 (1.51/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
I wonder how it would compare to Bergamo, which has the advantage of AVX-512 and higher memory bandwidth over Milan.
 
Joined
Jan 2, 2019
Messages
123 (0.06/day)
Attention! There are a lot of issues and inconsistencies in the article and results.

- It is from academia, and in some cases academic researchers are better at writing publications than at delivering high-quality HPC production code
Note: Show me a piece of code and I'll tell you if it is implemented by a PhD computer scientist or by a highly experienced software engineer

- In HPC we do not measure Peak Processing Power (PPP) in FLOPs per clock! It is always measured in floating-point operations per second (FLOPS). Take a look at the www.top500.org numbers and supercomputer specs and you'll see that the core clocks of CPUs and GPUs are always different.
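For illustration, peak throughput follows from combining FLOPs per clock with clock rate and core count; the per-cycle figure in this sketch is an assumption, not a vendor spec:

```python
# Peak FLOPS = cores x clock (GHz) x FLOPs per core per cycle (result in GFLOPS).
# The flops_per_cycle value is illustrative (e.g. a core with four 128-bit
# FMA pipes sustains 16 FP64 FLOPs/cycle); check vendor docs for real chips.
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    return cores * clock_ghz * flops_per_cycle

# Two hypothetical chips: comparing FLOPs per clock alone would mislead,
# since clock rate and core count change the ranking.
print(peak_gflops(cores=144, clock_ghz=3.1, flops_per_cycle=16))  # 7142.4 GFLOPS
print(peak_gflops(cores=96,  clock_ghz=3.6, flops_per_cycle=16))  # 5529.6 GFLOPS
```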

- Is it right to compare the performance of a 48-core processor against a 144-core processor with different core clock frequencies without normalizing the results?
Note: It is absolutely useless without normalizing the results! If I normalize the result of the 48-core processor against the 144-core processor (multiply by 3), then the ARM processor is faster!
Is it right to compare the fuel efficiency of hybrid cars of different sizes and masses?
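A minimal sketch of that normalization, with invented scores (whether per-core results are the fairest basis for comparison is of course its own debate):

```python
# Normalize raw benchmark scores by core count before comparing chips
# with very different core counts. Scores here are invented for illustration.
def per_core(raw_score: float, cores: int) -> float:
    return raw_score / cores

chip_48 = per_core(raw_score=300.0, cores=48)    # hypothetical 48-core result
chip_144 = per_core(raw_score=700.0, cores=144)  # hypothetical 144-core result
print(f"{chip_48:.2f} vs {chip_144:.2f}")  # 6.25 vs 4.86: the raw "loser" wins per core
```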

>>...HBM memory on this chip doesn't help that much for Gromacs...
- This is because processing in Gromacs is CPU-bound rather than RAM-bound.
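One way to sanity-check that claim is the roofline model: a kernel is compute-bound when its arithmetic intensity exceeds the machine's FLOPS-to-bandwidth balance. A sketch with illustrative numbers:

```python
# Roofline-style check: compute-bound if arithmetic intensity (FLOPs per
# byte of memory traffic) exceeds machine balance (peak FLOPS / bandwidth).
# All numbers are illustrative assumptions, not measurements.
def is_compute_bound(flops_per_byte: float, peak_gflops: float, bw_gbs: float) -> bool:
    machine_balance = peak_gflops / bw_gbs  # FLOPs/byte the memory system can feed
    return flops_per_byte > machine_balance

# ~7000 GFLOPS peak and ~1000 GB/s give a balance of 7 FLOPs/byte; a kernel
# at 10 FLOPs/byte is limited by compute, so faster memory (HBM) helps little.
print(is_compute_bound(flops_per_byte=10.0, peak_gflops=7000.0, bw_gbs=1000.0))  # True
```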
 
Joined
Sep 1, 2020
Messages
2,343 (1.52/day)
Location
Bulgaria
Off/How many academics does it take to change a light bulb?/end off
 
Joined
Oct 6, 2021
Messages
1,605 (1.40/day)
What an unflattering comparison the university has presented... They depict the EPYC 7763 (Zen 3) as having 128 cores, when in reality it only has 64 physical cores. They've mixed desktop and server components, among other discrepancies. It would be more interesting to see the MI300 vs the H100 (both paired with Genoa) vs the NVIDIA "Super" chip.

At this point I can only say that something smells bad. lol
 
Joined
Nov 26, 2021
Messages
1,645 (1.51/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
What an unflattering comparison the university has presented... They depict the EPYC 7763 (Zen 3) as having 128 cores, when in reality it only has 64 physical cores. They've mixed desktop and server components, among other discrepancies. It would be more interesting to see the MI300 vs the H100 (both paired with Genoa) vs the NVIDIA "Super" chip.

At this point I can only say that something smells bad. lol
They may have been using a dual-socket system. Still, to be fair, they should have included Zen 4-based SKUs like the EPYC 9754 (128 cores) or EPYC 9654 (96 cores).
 
Joined
Nov 6, 2016
Messages
1,751 (0.60/day)
Location
NH, USA
System Name Lightbringer
Processor Ryzen 7 2700X
Motherboard Asus ROG Strix X470-F Gaming
Cooling Enermax Liqmax Iii 360mm AIO
Memory G.Skill Trident Z RGB 32GB (8GBx4) 3200Mhz CL 14
Video Card(s) Sapphire RX 5700XT Nitro+
Storage Hp EX950 2TB NVMe M.2, HP EX950 1TB NVMe M.2, Samsung 860 EVO 2TB
Display(s) LG 34BK95U-W 34" 5120 x 2160
Case Lian Li PC-O11 Dynamic (White)
Power Supply BeQuiet Straight Power 11 850w Gold Rated PSU
Mouse Glorious Model O (Matte White)
Keyboard Royal Kludge RK71
Software Windows 10
I wonder how it would compare to Bergamo, which has the advantage of AVX-512 and higher memory bandwidth over Milan.
Yeah, or I don't know, how about we compare it to the NEW EPYC chips instead of the Zen 3 ones? Here's what I'm interested in: Phoronix compared the Xeon Max with HBM against Genoa-X (the large-cache variants), and Genoa-X easily beat the Xeons with HBM... so if the Xeons with HBM beat the Nvidia CPU, and EPYC Genoa-X with extra cache beat the Xeons with HBM, does that mean EPYC Genoa-X will beat the Nvidia CPU?

That's why I was so disappointed they tested with Zen 3 EPYC
 
Joined
Jan 2, 2019
Messages
123 (0.06/day)
At this point I can only say that something smells bad. lol

I don't think you're the only one who thinks so!

Intel, AMD and ARM are very concerned that NVIDIA has stepped into the CPU-server market with a new generation system (CPU+GPU). It is possible that the work was financially supported by one of these companies, of course not directly.

Another thing is that all these companies are absolutely jealous of NVIDIA's current revenues and hardware advances. All of them could only dream about hardware orders similar to Meta's order from NVIDIA, that is, 350,000 NVIDIA H100 GPUs worth some 10.5 billion US dollars! It is possible that the publication is an attempt to harm NVIDIA's reputation, something like, "...look, our 3rd Gen CPUs are better than the latest most advanced system from NVIDIA..." in order to boost the number of orders for older Intel Xeon and AMD EPYC CPUs.

That is why Microsoft and OpenAI are talking about investing billions of dollars in new chip-making factories. Once again, all of them are simply jealous and dream about NVIDIA's revenues.

Also, take a look at an article on www.hpcwire.com:

 
Joined
Oct 6, 2021
Messages
1,605 (1.40/day)
I don't think you're the only one who thinks so!

Intel, AMD and ARM are very concerned that NVIDIA has stepped into the CPU-server market with a new generation system (CPU+GPU). It is possible that the work was financially supported by one of these companies, of course not directly.

Another thing is that all these companies are absolutely jealous of NVIDIA's current revenues and hardware advances. All of them could only dream about hardware orders similar to Meta's order from NVIDIA, that is, 350,000 NVIDIA H100 GPUs worth some 10.5 billion US dollars! It is possible that the publication is an attempt to harm NVIDIA's reputation, something like, "...look, our 3rd Gen CPUs are better than the latest most advanced system from NVIDIA..." in order to boost the number of orders for older Intel Xeon and AMD EPYC CPUs.

That is why Microsoft and OpenAI are talking about investing billions of dollars in new chip-making factories. Once again, all of them are simply jealous and dream about NVIDIA's revenues.

Also, take a look at an article on www.hpcwire.com:

The tests show the Nvidia chip ahead in some cases. This test does not favor AMD; I think it was just done poorly.
 