NVIDIA Announces Grace CPU for Giant AI and High Performance Computing Workloads

AleksandarK · Apr 12, 2021

NVIDIA today announced its first data center CPU, an Arm-based processor that will deliver 10x the performance of today's fastest servers on the most complex AI and high performance computing workloads.

The result of more than 10,000 engineering years of work, the NVIDIA Grace CPU is designed to address the computing requirements for the world's most advanced applications—including natural language processing, recommender systems and AI supercomputing—that analyze enormous datasets requiring both ultra-fast compute performance and massive memory. It combines energy-efficient Arm CPU cores with an innovative low-power memory subsystem to deliver high performance with great efficiency.

"Leading-edge AI and data science are pushing today's computer architecture beyond its limits - processing unthinkable amounts of data," said Jensen Huang, founder and CEO of NVIDIA. "Using licensed Arm IP, NVIDIA has designed Grace as a CPU specifically for giant-scale AI and HPC. Coupled with the GPU and DPU, Grace gives us the third foundational technology for computing, and the ability to re-architect the data center to advance AI. NVIDIA is now a three-chip company."

Grace is a highly specialized processor targeting workloads such as training next-generation NLP models that have more than 1 trillion parameters. When tightly coupled with NVIDIA GPUs, a Grace CPU-based system will deliver 10x faster performance than today's state-of-the-art NVIDIA DGX-based systems, which run on x86 CPUs.

While the vast majority of data centers are expected to be served by existing CPUs, Grace—named for Grace Hopper, the U.S. computer-programming pioneer—will serve a niche segment of computing.

The Swiss National Supercomputing Centre (CSCS) and the U.S. Department of Energy's Los Alamos National Laboratory are the first to announce plans to build Grace-powered supercomputers in support of national scientific research efforts.

NVIDIA is introducing Grace as the volume of data and size of AI models are growing exponentially. Today's largest AI models include billions of parameters and are doubling every two-and-a-half months. Training them requires a new CPU that can be tightly coupled with a GPU to eliminate system bottlenecks.
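To put the quoted doubling rate in perspective, here is a minimal Python sketch of what "doubling every two-and-a-half months" would imply if it held steadily. The 2.5-month period comes from the announcement; the 1-billion-parameter starting point and the function names are illustrative assumptions, not NVIDIA figures.

```python
import math

# Doubling period quoted in the announcement (months).
DOUBLING_PERIOD_MONTHS = 2.5


def growth_factor(months: float, doubling_period: float = DOUBLING_PERIOD_MONTHS) -> float:
    """Multiplicative growth after `months`, assuming steady exponential doubling."""
    return 2.0 ** (months / doubling_period)


def months_to_reach(start: float, target: float,
                    doubling_period: float = DOUBLING_PERIOD_MONTHS) -> float:
    """Months needed to grow from `start` to `target` parameters at the same rate."""
    return doubling_period * math.log2(target / start)


# Hypothetical starting point: a 1-billion-parameter model (assumption).
START_PARAMS = 1e9
# Target taken from the announcement's "more than 1 trillion parameters".
TARGET_PARAMS = 1e12

print(f"Growth over 12 months: about {growth_factor(12):.1f}x")
print(f"Months from 1e9 to 1e12 parameters: about {months_to_reach(START_PARAMS, TARGET_PARAMS):.1f}")
```

Under these assumptions, model size would grow by a factor of roughly 28 per year, and a thousandfold increase would take about two years. Real growth is unlikely to be this smooth, so the numbers are only a rough guide.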

NVIDIA built Grace by leveraging the incredible flexibility of Arm's data center architecture. By introducing a new server-class CPU, NVIDIA is advancing the goal of technology diversity in AI and HPC communities, where choice is key to delivering the innovation needed to solve the world's most pressing problems.

"As the world's most widely licensed processor architecture, Arm drives innovation in incredible new ways every day," said Arm CEO Simon Segars. "NVIDIA's introduction of the Grace data center CPU illustrates clearly how Arm's licensing model enables an important invention, one that will further support the incredible work of AI researchers and scientists everywhere."

Grace's First Adopters Push Limits of Science and AI
CSCS and Los Alamos National Laboratory both plan to bring Grace-powered supercomputers, built by Hewlett Packard Enterprise, online in 2023.

"NVIDIA's novel Grace CPU allows us to converge AI technologies and classic supercomputing for solving some of the hardest problems in computational science," said CSCS Director Prof. Thomas Schulthess. "We are excited to make the new NVIDIA CPU available for our users in Switzerland and globally for processing and analyzing massive and complex scientific datasets."

"With an innovative balance of memory bandwidth and capacity, this next-generation system will shape our institution's computing strategy," said Thom Mason, director of the Los Alamos National Laboratory. "Thanks to NVIDIA's new Grace CPU, we'll be able to deliver advanced scientific research using high-fidelity 3D simulations and analytics with datasets that are larger than previously possible."

Delivering Breakthrough Performance
Underlying Grace's performance is fourth-generation NVIDIA NVLink interconnect technology, which provides a record 900 GB/s connection between Grace and NVIDIA GPUs to enable 30x higher aggregate bandwidth compared to today's leading servers.
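As a rough illustration of these figures, the Python sketch below derives the baseline bandwidth implied by reading the 30x claim directly against the 900 GB/s link, and compares idealized transfer times. The 1 TB dataset size and the function names are hypothetical, and the calculation assumes peak bandwidth is sustained with no protocol overhead, which real systems will not achieve.

```python
# Figures from the announcement.
NVLINK_GB_PER_S = 900.0   # Grace-to-GPU NVLink bandwidth
CLAIMED_SPEEDUP = 30.0    # "30x higher aggregate bandwidth"


def implied_baseline_gb_per_s(link: float = NVLINK_GB_PER_S,
                              speedup: float = CLAIMED_SPEEDUP) -> float:
    """Baseline bandwidth implied if the 30x claim applies to the 900 GB/s figure."""
    return link / speedup


def transfer_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Idealized transfer time: size divided by peak bandwidth, no overhead."""
    return size_gb / bandwidth_gb_per_s


# Hypothetical dataset size (assumption, not from the announcement).
DATASET_GB = 1000.0

baseline = implied_baseline_gb_per_s()
print(f"Implied baseline: {baseline:.0f} GB/s")
print(f"1 TB over NVLink:   {transfer_seconds(DATASET_GB, NVLINK_GB_PER_S):.2f} s")
print(f"1 TB over baseline: {transfer_seconds(DATASET_GB, baseline):.2f} s")
```

Read this way, the claim implies a baseline of about 30 GB/s, so moving 1 TB would take roughly 1.1 seconds over the new link versus roughly 33 seconds over the baseline.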

Grace will also utilize an innovative LPDDR5x memory subsystem that will deliver twice the bandwidth and 10x better energy efficiency compared with DDR4 memory. In addition, the new architecture provides unified cache coherence with a single memory address space, combining system and HBM GPU memory to simplify programmability.

Grace will be supported by the NVIDIA HPC software development kit and the full suite of CUDA and CUDA-X libraries, which accelerate more than 2,000 GPU applications, speeding discoveries for scientists and researchers working on the world's most important challenges.

Availability is expected in the beginning of 2023.


TheLostSwede · Apr 12, 2021

Looks more like a SoM (System on Module) to me than a CPU, but ok Nvidia, you go ahead and call it a CPU.
Bigger is better, right?

Wirko · Apr 12, 2021

Those cube-like modules for voltage conversion look quite interesting. Probably inductors and capacitors below and mosfets on top of them.

TheoneandonlyMrK · Apr 12, 2021

10x, hmmm. I do hate marketeers. It'd be nice to see some actual tech specs and performance evidence, but 2023 is some way off, and it's reasonable to expect just a steady drip of info until then.
The picture doesn't even look like what they're describing, a server CPU. It looks like the latest Jetson, no, Orin (the latest model, released today too, is called errrr, Atlan) self-driving module, not a CPU. That one is listed on another site I won't name and also uses Grace cores with a GPU attached on a board just like that one.

Nephilim666 · Apr 12, 2021

I guess this is the "Hopper" everyone was expecting to be the next GPU Arch after Ampere.

ur6beersaway · Apr 12, 2021

"more than 10,000 engineering years of work"... what the hell is 1 engineering year equal to? :confused:

Caring1 · Apr 13, 2021

ur6beersaway said:
" more than 10,000 engineering years of work"... what the hell is 1 engineering year equal to?

1 engineer working for 1 year. :slap:

DeathtoGnomes · Apr 13, 2021

ur6beersaway said:
" more than 10,000 engineering years of work"... what the hell is 1 engineering year equal to?

It means it was all researched by the time Nvidia bought bragging rights.

watzupken · Apr 13, 2021

Caring1 said:
1 engineer working for 1 year.

That doesn't say much in my opinion. Considering we don't know how much time each engineer spent working on this each day, to me this is nothing more than a meaningless marketing metric.

I feel this product may face strong headwinds because most big companies are deploying their own custom Arm SoCs that supposedly suit their workloads and are likely also cheaper. So unless Nvidia's acquisition of Arm goes through and they start gimping other users of Arm SoCs, which I believe will be the case sooner or later, I am not sure how well they will sell this.

TheoneandonlyMrK · Apr 13, 2021

Looks, sorry, sounds like the A64FX processor Fujitsu already made.

64K · Apr 13, 2021

ur6beersaway said:
" more than 10,000 engineering years of work"... what the hell is 1 engineering year equal to?

Assuming a 40-hour work week, 2 weeks of vacation, and some holidays off, an engineer-year would maybe be around 1,900 hours. 10,000 years would be around 19 million engineering hours. That seems high to me, but I really don't know.
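The estimate above can be reproduced with a few lines of Python. The 40-hour week and 2 weeks of vacation are from the post; the count of 10 public holidays and the 8-hour day are assumptions standing in for "some holidays off".

```python
HOURS_PER_WEEK = 40      # from the post
WEEKS_PER_YEAR = 52
VACATION_WEEKS = 2       # from the post
HOLIDAYS_PER_YEAR = 10   # assumption: "some holidays off"
HOURS_PER_DAY = 8        # assumption: 40 hours spread over 5 days


def hours_per_engineer_year(holidays: int = HOLIDAYS_PER_YEAR) -> int:
    """Working hours for one engineer in one year under the assumptions above."""
    working_weeks = WEEKS_PER_YEAR - VACATION_WEEKS
    return working_weeks * HOURS_PER_WEEK - holidays * HOURS_PER_DAY


def total_engineering_hours(engineer_years: int) -> int:
    """Total hours represented by a given number of engineer-years."""
    return engineer_years * hours_per_engineer_year()


print(f"Hours per engineer-year: {hours_per_engineer_year():,}")
print(f"Hours in 10,000 engineer-years: {total_engineering_hours(10_000):,}")
```

With these assumptions an engineer-year comes to 1,920 hours and 10,000 engineer-years to 19.2 million hours, in line with the rough figures in the post.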

Caring1 · Apr 13, 2021

64K said:
Assuming a 40 hour work week and 2 weeks vacation and some holidays off maybe an engineer year would be around 1,900 hours. 10,000 years would be around 19 million engineering hours. That seems high to me but I really don't know.

No different to crunching, the computer accelerates the rate of work based off an average work unit.

Vya Domus · Apr 15, 2021

TheoneandonlyMrK said:
Looks, sorry, sounds like the A64FX processor Fujitsu already made.

From what I can gather it's nothing like that. This is essentially a low-power CPU with a high-bandwidth link between it and the GPU to alleviate the bottleneck caused by moving data back and forth between the two. In other words, this isn't really intended to do any heavy computing, which is probably why they gave zero estimates on performance.

Hargema · Apr 15, 2021

This company is imho irrelevant and straight up trash in the dump until they provide enough graphics cards for the market they're supposed to handle.
When the average consumer can't get a 1650 Super for less than 300€ in 2021 it's pitiful and shows that the corporation doesn't give a damn.
