Tuesday, October 15th 2024

NVIDIA Contributes Blackwell Platform Design to Open Hardware Ecosystem, Accelerating AI Infrastructure Innovation

To drive the development of open, efficient and scalable data center technologies, NVIDIA today announced that it has contributed foundational elements of its NVIDIA Blackwell accelerated computing platform design to the Open Compute Project (OCP) and broadened NVIDIA Spectrum-X support for OCP standards.

At this year's OCP Global Summit, NVIDIA will be sharing key portions of the NVIDIA GB200 NVL72 system electro-mechanical design with the OCP community — including the rack architecture, compute and switch tray mechanicals, liquid-cooling and thermal environment specifications, and NVIDIA NVLink cable cartridge volumetrics — to support higher compute density and networking bandwidth.

NVIDIA has already made several official contributions to OCP across multiple hardware generations, including its NVIDIA HGX H100 baseboard design specification, to help provide the ecosystem with a wider choice of offerings from the world's computer makers and expand the adoption of AI.

In addition, expanded NVIDIA Spectrum-X Ethernet networking platform alignment with OCP Community-developed specifications enables companies to unlock the performance potential of AI factories deploying OCP-recognized equipment while preserving their investments and maintaining software consistency.

"Building on a decade of collaboration with OCP, NVIDIA is working alongside industry leaders to shape specifications and designs that can be widely adopted across the entire data center," said Jensen Huang, founder and CEO of NVIDIA. "By advancing open standards, we're helping organizations worldwide take advantage of the full potential of accelerated computing and create the AI factories of the future."

Accelerated Computing Platform for the Next Industrial Revolution
NVIDIA's accelerated computing platform was designed to power a new era of AI.

GB200 NVL72 is based on the NVIDIA MGX modular architecture, which enables computer makers to quickly and cost-effectively build a vast array of data center infrastructure designs.

The liquid-cooled system connects 36 NVIDIA Grace CPUs and 72 NVIDIA Blackwell GPUs in a rack-scale design. With a 72-GPU NVIDIA NVLink domain, it acts as a single, massive GPU and delivers 30x faster real-time trillion-parameter large language model inference than the NVIDIA H100 Tensor Core GPU.

The NVIDIA Spectrum-X Ethernet networking platform, which now includes the next-generation NVIDIA ConnectX-8 SuperNIC, supports OCP's Switch Abstraction Interface (SAI) and Software for Open Networking in the Cloud (SONiC) standards. This allows customers to use Spectrum-X's adaptive routing and telemetry-based congestion control to accelerate Ethernet performance for scale-out AI infrastructure.

ConnectX-8 SuperNICs feature accelerated networking at speeds of up to 800Gb/s and programmable packet processing engines optimized for massive-scale AI workloads. ConnectX-8 SuperNICs for OCP 3.0 will be available in 2025, equipping organizations to build highly flexible networks.

Critical Infrastructure for Data Centers
As the world transitions from general-purpose to accelerated and AI computing, data center infrastructure is becoming increasingly complex. To simplify the development process, NVIDIA is working closely with 40+ global electronics makers that provide key components to create AI factories.

Additionally, a broad array of partners are innovating and building on top of the Blackwell platform, including Meta, which plans to contribute its Catalina AI rack architecture based on GB200 NVL72 to OCP. This provides computer makers with flexible options to build high compute density systems and meet the growing performance and energy efficiency needs of data centers.

"NVIDIA has been a significant contributor to open computing standards for years, including their high-performance computing platform that has been the foundation of our Grand Teton server for the past two years," said Yee Jiun Song, vice president of engineering at Meta. "As we progress to meet the increasing computational demands of large-scale artificial intelligence, NVIDIA's latest contributions in rack design and modular architecture will help speed up the development and implementation of AI infrastructure across the industry."

Learn more about NVIDIA's contributions to the Open Compute Project at the 2024 OCP Global Summit, taking place at the San Jose Convention Center from Oct. 15-17.