News Posts matching #Hopper


NVIDIA Turbocharges Generative AI Training in MLPerf Benchmarks

NVIDIA's AI platform raised the bar for AI training and high performance computing in the latest MLPerf industry benchmarks. Among many new records and milestones, one in generative AI stands out: NVIDIA Eos - an AI supercomputer powered by a whopping 10,752 NVIDIA H100 Tensor Core GPUs and NVIDIA Quantum-2 InfiniBand networking - completed a training benchmark based on a GPT-3 model with 175 billion parameters trained on one billion tokens in just 3.9 minutes. That's a nearly 3x gain from 10.9 minutes, the record NVIDIA set when the test was introduced less than six months ago.

The benchmark uses a portion of the full GPT-3 data set behind the popular ChatGPT service that, by extrapolation, Eos could now train in just eight days, 73x faster than a prior state-of-the-art system using 512 A100 GPUs. The acceleration in training time reduces costs, saves energy and speeds time-to-market. It's heavy lifting that makes large language models widely available so every business can adopt them with tools like NVIDIA NeMo, a framework for customizing LLMs. In a new generative AI test this round, 1,024 NVIDIA Hopper architecture GPUs completed a training benchmark based on the Stable Diffusion text-to-image model in 2.5 minutes, setting a high bar on this new workload. By adopting these two tests, MLPerf reinforces its leadership as the industry standard for measuring AI performance, since generative AI is the most transformative technology of our time.
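The headline figures above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the 3,584-GPU size of the earlier 10.9-minute run is an assumption taken from the June MLPerf item further down this page rather than from this article, and the variable names are ours.

```python
# Back-of-the-envelope check of the MLPerf figures quoted above.
# Assumption: the earlier 10.9-minute record used 3,584 H100 GPUs
# (taken from the June MLPerf item on this page, not from this article).
prev_minutes, prev_gpus = 10.9, 3_584
eos_minutes, eos_gpus = 3.9, 10_752

speedup = prev_minutes / eos_minutes        # time-to-train improvement
gpu_ratio = eos_gpus / prev_gpus            # how much more hardware was used
scaling_efficiency = speedup / gpu_ratio    # 1.0 would be perfect linear scaling

# "Eight days, 73x faster" implies the 512-GPU A100 baseline took about this long.
baseline_days = 8 * 73

print(f"speedup: {speedup:.2f}x")
print(f"GPU ratio: {gpu_ratio:.2f}x")
print(f"scaling efficiency: {scaling_efficiency:.0%}")
print(f"implied A100 baseline: {baseline_days} days")
```

Under that assumption, tripling the GPU count bought a roughly 2.8x shorter run, which is where the "nearly 3x" wording comes from.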

GIGABYTE Announces New Direct Liquid Cooling (DLC) Multi-Node Servers Ahead of SC23

Giga Computing, a subsidiary of GIGABYTE Technology and an industry leader in high-performance servers, server motherboards, and workstations, today announced direct liquid cooling (DLC) multi-node servers for the NVIDIA Grace CPU and NVIDIA Grace Hopper Superchip. Also announced were a DLC-ready Intel-based server for the NVIDIA HGX H100 8-GPU platform and a high-density server for AMD EPYC 9004 processors. For the ultimate in efficiency, there is also a new 12U single-phase immersion tank. All of these products will be on show at GIGABYTE booth #355 at SC23.

The newly announced high-density CPU servers include the Intel Xeon-based H263-S63-LAN1 and the AMD EPYC-based H273-Z80-LAN1. These 2U 4-node servers employ DLC for all eight CPUs, so CPU performance can reach its full potential despite the dense packaging. In August, GIGABYTE announced new servers for the NVIDIA HGX H100 GPU, and it now adds a DLC version to the G593 series, the G593-SD0-LAX1, for the NVIDIA HGX H100 8-GPU platform.

NVIDIA to Start Selling Arm-based CPUs to PC Clients by 2025

According to sources close to Reuters, NVIDIA is reportedly developing its custom CPUs based on Arm instruction set architecture (ISA), specifically tailored for the client ecosystem, also known as PC. NVIDIA has already developed an Arm-based CPU codenamed Grace, which is designed to handle server and HPC workloads in combination with the company's Hopper GPU. However, as we learn today, NVIDIA also wants to provide CPUs for PC users and to power Microsoft's Windows operating system. The push for more vendors of Arm-based CPUs is also supported by Microsoft, which is losing PC market share to Apple and its M-series of processors.

Custom PC processors built on the Arm ISA would leave decades of x86-based applications either obsolete or in need of recompilation. Apple allows users to run x86 applications through an x86-to-Arm translation layer, and Microsoft offers the same for Windows-on-Arm devices. It remains to be seen how NVIDIA's processors, which are expected to arrive in 2025, would compete in the wider PC processor market. Still, the company could make some compelling solutions given its incredible silicon engineering history and performant Arm designs like Grace. With the upcoming Arm-based processors hitting the market, we expect the Windows-on-Arm ecosystem to thrive and get massive investment from independent software vendors.

U.S. Restricts Exports of NVIDIA GeForce RTX 4090 to China

The GeForce RTX 4090 gaming graphics card, both as an NVIDIA first-party Founders Edition and as custom designs by AIC partners, undergoes assembly in China. A new U.S. Government trade regulation restricts NVIDIA from selling it in the Chinese domestic market. The enthusiast-segment graphics card joins several other high-performance AI processors, such as the "Hopper" H800 and "Ampere" A800. If you recall, the H800 and A800 are special China-specific variants of the H100 and A100, respectively, which come with performance reductions at the hardware level to fly below the AI processor performance limits set by the U.S. Government. The only reasons we can think of for these chips being on the list are that end-users in China have figured out ways around these performance limiters, or that they are buying at greater scale to achieve the desired performance. The fresh trade embargo released on October 17 covers the A100, A800, H100, H800, L40, L40S, and RTX 4090.

Samsung Notes: HBM4 Memory is Coming in 2025 with New Assembly and Bonding Technology

According to an editorial blog post published on the Samsung blog by SangJoon Hwang, Executive Vice President and Head of the DRAM Product & Technology Team at Samsung Electronics, High-Bandwidth Memory 4 (HBM4) is coming in 2025. In the recent timeline of HBM development, HBM first appeared in 2015 with the AMD Radeon R9 Fury X. The second-generation HBM2 appeared with the NVIDIA Tesla P100 in 2016, and the third-generation HBM3 saw the light of day with the NVIDIA Hopper GH100 GPU in 2022. Currently, Samsung has developed 9.8 Gbps HBM3E memory, which will start sampling to customers soon.

However, Samsung is more ambitious with development timelines this time, and the company expects to announce HBM4 in 2025, possibly with commercial products in the same calendar year. Interestingly, the HBM4 memory will have some technology optimized for high thermal properties, such as non-conductive film (NCF) assembly and hybrid copper bonding (HCB). The NCF is a polymer layer that enhances the stability of micro bumps and TSVs in the chip, so memory solder bump dies are protected from shock. Hybrid copper bonding is an advanced semiconductor packaging method that creates direct copper-to-copper connections between semiconductor components, enabling high-density, 3D-like packaging. It offers high I/O density, enhanced bandwidth, and improved power efficiency. It uses a copper layer as a conductor and oxide insulator instead of regular micro bumps to increase the connection density needed for HBM-like structures.

Q2 Revenue for Top 10 Global IC Houses Surges by 12.5% as Q3 on Pace to Set New Record

Fueled by an AI-driven inventory stocking frenzy across the supply chain, TrendForce reveals that Q2 revenue for the top 10 global IC design powerhouses soared to US $38.1 billion, marking a 12.5% quarterly increase. In this rising tide, NVIDIA seized the crown, officially dethroning Qualcomm as the world's premier IC design house, while the remainder of the leaderboard remained stable.

AI charges ahead, buoying IC design performance amid a seasonal stocking slump
NVIDIA is reaping the rewards of a global transformation. Bolstered by the global demand from CSPs, internet behemoths, and enterprises diving into generative AI and large language models, NVIDIA's data center revenue skyrocketed by a whopping 105%. A deluge of shipments, including the likes of their advanced Hopper and Ampere architecture HGX systems and the high-performing InfiniBand, played a pivotal role. Beyond that, both gaming and professional visualization sectors thrived under the allure of fresh product launches. Clocking a Q2 revenue of US$11.33 billion (a 68.3% surge), NVIDIA has vaulted over both Qualcomm and Broadcom to seize the IC design throne.
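The growth percentages above imply prior-quarter figures that the excerpt does not state. A rough sketch of that arithmetic follows; every input is a rounded number from the article, so the derived values are estimates, and the variable names are ours.

```python
# Implied prior-quarter figures from the growth rates quoted above.
# Inputs are the article's rounded numbers; outputs are estimates only.
top10_q2 = 38.1        # US$ billion, top-10 IC design revenue in Q2
top10_growth = 0.125   # 12.5% quarter-over-quarter
nvidia_q2 = 11.33      # US$ billion
nvidia_growth = 0.683  # 68.3% quarter-over-quarter

top10_q1 = top10_q2 / (1 + top10_growth)
nvidia_q1 = nvidia_q2 / (1 + nvidia_growth)
nvidia_share = nvidia_q2 / top10_q2

# Change in the combined revenue of the other nine companies.
others_change = (top10_q2 - nvidia_q2) - (top10_q1 - nvidia_q1)

print(f"top-10 Q1 (implied): {top10_q1:.2f} B")
print(f"NVIDIA Q1 (implied): {nvidia_q1:.2f} B")
print(f"NVIDIA share of top-10 Q2: {nvidia_share:.1%}")
print(f"other nine, Q1 to Q2: {others_change:+.2f} B")
```

On these rounded inputs NVIDIA alone accounts for more than the whole quarterly increase, which fits the article's framing of AI demand offsetting a slump elsewhere.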

Tata Partners With NVIDIA to Build Large-Scale AI Infrastructure

NVIDIA today announced an extensive collaboration with Tata Group to deliver AI computing infrastructure and platforms for developing AI solutions. The collaboration will bring state-of-the-art AI capabilities within reach of thousands of organizations, businesses and AI researchers, and hundreds of startups in India. The companies will work together to build an AI supercomputer powered by the next-generation NVIDIA GH200 Grace Hopper Superchip to achieve performance that is best in class.

"The global generative AI race is in full steam," said Jensen Huang, founder and CEO of NVIDIA. "Data centers worldwide are shifting to GPU computing to build energy-efficient infrastructure to support the exponential demand for generative AI.

Google Cloud and NVIDIA Expand Partnership to Advance AI Computing, Software and Services

Google Cloud Next—Google Cloud and NVIDIA today announced new AI infrastructure and software for customers to build and deploy massive models for generative AI and speed data science workloads.

In a fireside chat at Google Cloud Next, Google Cloud CEO Thomas Kurian and NVIDIA founder and CEO Jensen Huang discussed how the partnership is bringing end-to-end machine learning services to some of the largest AI customers in the world—including by making it easy to run AI supercomputers with Google Cloud offerings built on NVIDIA technologies. The new hardware and software integrations utilize the same NVIDIA technologies employed over the past two years by Google DeepMind and Google research teams.

NVIDIA Unveils Next-Generation GH200 Grace Hopper Superchip Platform With HBM3e

NVIDIA today announced the next-generation NVIDIA GH200 Grace Hopper platform - based on a new Grace Hopper Superchip with the world's first HBM3e processor - built for the era of accelerated computing and generative AI. Created to handle the world's most complex generative AI workloads, spanning large language models, recommender systems and vector databases, the new platform will be available in a wide range of configurations. The dual configuration - which delivers up to 3.5x more memory capacity and 3x more bandwidth than the current generation offering - comprises a single server with 144 Arm Neoverse cores, eight petaflops of AI performance and 282 GB of the latest HBM3e memory technology.

"To meet surging demand for generative AI, data centers require accelerated computing platforms with specialized needs," said Jensen Huang, founder and CEO of NVIDIA. "The new GH200 Grace Hopper Superchip platform delivers this with exceptional memory technology and bandwidth to improve throughput, the ability to connect GPUs to aggregate performance without compromise, and a server design that can be easily deployed across the entire data center."

NVIDIA Proposes that AI Will Accelerate Climate Research Innovation

AI and accelerated computing will help climate researchers achieve the miracles they need for breakthroughs in climate research, NVIDIA founder and CEO Jensen Huang said during a keynote Monday at the Berlin Summit for the Earth Virtualization Engines initiative. "Richard Feynman once said that 'what I can't create, I don't understand,' and that's the reason why climate modeling is so important," Huang told 180 attendees at the Harnack House in Berlin, a storied gathering place for the region's scientific and research community. "And so the work that you do is vitally important to policymakers, to researchers, to the industry," he added.

To advance this work, the Berlin Summit brings together participants from around the globe to harness AI and high-performance computing for climate prediction. In his talk, Huang outlined three miracles that will have to happen for climate researchers to achieve their goals, and touched on NVIDIA's own efforts to collaborate with climate researchers and policymakers with its Earth-2 efforts. The first miracle required will be to simulate the climate fast enough, and with a high enough resolution - on the order of just a couple of square kilometers.

NVIDIA Ada Lovelace Successor Set for 2025

According to the NVIDIA roadmap that was spotted in the recently published MLCommons training results, the Ada Lovelace successor is set to come in 2025. The roadmap also reveals the schedule for the Hopper Next GPU and Grace Next CPU, as well as the BlueField-4 DPU.

While the roadmap does not provide a lot of details, it does give us a general idea of when to expect NVIDIA's next GeForce architecture. Since NVIDIA usually launches a new GeForce architecture every two years or so, the latest schedule might sound like a small delay, at least if it plans to launch the Ada Lovelace Next in early 2025 and not later. NVIDIA Pascal was launched in May 2016, Turing in September 2018, Ampere in May 2020, and Ada Lovelace in October 2022.
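The "every two years or so" cadence can be checked against the launch dates listed above. This is a minimal sketch: the day of the month is a placeholder because only months are given, and reading "early 2025" as January 2025 is our assumption.

```python
from datetime import date

# Launch months quoted in the article; the day of month is a placeholder.
launches = [
    ("Pascal", date(2016, 5, 1)),
    ("Turing", date(2018, 9, 1)),
    ("Ampere", date(2020, 5, 1)),
    ("Ada Lovelace", date(2022, 10, 1)),
]

def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)

# Gap in months between each architecture and the next one.
gaps = [months_between(a[1], b[1]) for a, b in zip(launches, launches[1:])]
average_gap = sum(gaps) / len(gaps)

# Assumption: "early 2025" is taken to mean January 2025.
gap_to_early_2025 = months_between(launches[-1][1], date(2025, 1, 1))

print(gaps)
print(round(average_gap, 1))
print(gap_to_early_2025)
```

The historical gaps come out at 28, 20 and 29 months, so a January 2025 launch (27 months after Ada Lovelace) would sit inside the usual range, while anything later in 2025 would stretch it.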

NVIDIA H100 GPUs Set Standard for Generative AI in Debut MLPerf Benchmark

In a new industry-standard benchmark, a cluster of 3,584 H100 GPUs at cloud service provider CoreWeave trained a massive GPT-3-based model in just 11 minutes. Leading users and industry-standard benchmarks agree: NVIDIA H100 Tensor Core GPUs deliver the best AI performance, especially on the large language models (LLMs) powering generative AI.

H100 GPUs set new records on all eight tests in the latest MLPerf training benchmarks released today, excelling on a new MLPerf test for generative AI. That excellence is delivered both per-accelerator and at-scale in massive servers. For example, on a commercially available cluster of 3,584 H100 GPUs co-developed by startup Inflection AI and operated by CoreWeave, a cloud service provider specializing in GPU-accelerated workloads, the system completed the massive GPT-3-based training benchmark in less than eleven minutes.

NVIDIA H100 Hopper GPU Tested for Gaming, Slower Than Integrated GPU

NVIDIA's H100 Hopper GPU is a device designed purely for AI and other compute workloads, with the least amount of consideration for gaming workloads that involve graphics processing. However, it is still interesting to see how this 30,000 USD GPU fares in comparison to gaming GPUs and whether it is even possible to run games on it. It turns out that it is technically feasible but does not make much sense, as the Chinese YouTube channel Geekerwan notes. Based on the GH100 GPU SKU with 14,592 CUDA cores, the H100 PCIe version tested here can achieve 204.9 TeraFLOPS at FP16, 51.22 TeraFLOPS at FP32, and 25.61 TeraFLOPS at FP64, with its natural strength lying in accelerating AI workloads.

However, how does it fare in gaming benchmarks? Not very well, as the testing shows. It scored 2681 points in 3DMark Time Spy, which is lower than AMD's integrated Radeon 680M, which managed to score 2710 points. Interestingly, the GH100 has only 24 ROPs (render output units), while the gaming-oriented GA102 (highest-end gaming GPU SKU) has 112 ROPs. This is self-explanatory and provides a clear picture as to why the H100 GPU is used for computing only. Since it doesn't have any display outputs, the system needed another regular GPU to provide the picture, while the computation happened on the H100 GPU.
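The throughput figures quoted for the H100 PCIe card are internally consistent, which a short calculation shows. The sketch assumes the usual convention that peak FP32 throughput counts one fused multiply-add (two FLOPs) per CUDA core per clock; the implied clock is derived, not a quoted specification.

```python
# Consistency check on the H100 PCIe figures quoted above.
cuda_cores = 14_592
fp16_tflops, fp32_tflops, fp64_tflops = 204.9, 51.22, 25.61

# Assumption: peak FP32 = 2 FLOPs (one FMA) per CUDA core per clock.
implied_clock_ghz = fp32_tflops * 1e12 / (2 * cuda_cores) / 1e9

fp16_to_fp32 = fp16_tflops / fp32_tflops
fp32_to_fp64 = fp32_tflops / fp64_tflops

# 3DMark Time Spy scores from the article.
h100_score, radeon_680m_score = 2681, 2710
igpu_lead = (radeon_680m_score - h100_score) / h100_score

print(f"implied clock: {implied_clock_ghz:.3f} GHz")
print(f"FP16:FP32 = {fp16_to_fp32:.2f}, FP32:FP64 = {fp32_to_fp64:.2f}")
print(f"Radeon 680M lead in Time Spy: {igpu_lead:.1%}")
```

The compute rates follow a clean 4:2:1 pattern at roughly 1.755 GHz, while the graphics score trails an integrated GPU by about one percent, which illustrates how little of the chip is devoted to rendering.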

NVIDIA Grace Drives Wave of New Energy-Efficient Arm Supercomputers

NVIDIA today announced a supercomputer built on the NVIDIA Grace CPU Superchip, adding to a wave of new energy-efficient supercomputers based on the Arm Neoverse platform. The Isambard 3 supercomputer to be based at the Bristol & Bath Science Park, in the U.K., will feature 384 Arm-based NVIDIA Grace CPU Superchips to power medical and scientific research, and is expected to deliver 6x the performance and energy efficiency of Isambard 2, placing it among Europe's most energy-efficient systems.

It will achieve about 2.7 petaflops of FP64 peak performance and consume less than 270 kilowatts of power, ranking it among the world's three greenest non-accelerated supercomputers. The project is being led by the University of Bristol, as part of the research consortium the GW4 Alliance, together with the universities of Bath, Cardiff and Exeter.
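The efficiency claim can be put into numbers using only the two figures above. Note the caveats: this uses peak rather than measured performance, and the power figure is an upper bound ("less than 270 kilowatts"), so the result is a rough lower bound on peak efficiency and not a Green500-style measurement.

```python
# Rough efficiency estimate for Isambard 3 from the figures quoted above.
peak_flops = 2.7e15      # about 2.7 petaflops FP64 peak
max_power_watts = 270e3  # "less than 270 kilowatts", so an upper bound
superchips = 384

# Peak efficiency is at least this value, since the power is an upper bound.
gflops_per_watt = peak_flops / max_power_watts / 1e9
tflops_per_superchip = peak_flops / superchips / 1e12
watts_per_superchip = max_power_watts / superchips  # whole-system power share

print(f"at least {gflops_per_watt:.1f} GFLOPS/W (peak)")
print(f"{tflops_per_superchip:.2f} TFLOPS peak per Superchip")
print(f"under {watts_per_superchip:.0f} W of system power per Superchip")
```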

TechPowerUp GPU-Z v2.53.0 Released

Today, we are releasing the latest version of TechPowerUp GPU-Z, the popular graphics sub-system information and diagnostic utility. Version 2.53.0 adds support for a large number of new and rare GPUs. The NVIDIA GPUs for which support is added include the GeForce RTX 4070, RTX 4090, RTX 4080, RTX 4060, and RTX 4050; pro-vis RTX 6000 Ada, RTX 3060 Laptop GPU (based on GA104), RTX 3050 Laptop GPU 6 GB, RTX 2050; and compute accelerators Hopper H100 PCIe AIC, and a rare engineering sample of the RTX 2080 Ti. From the AMD camp, support is added for Radeon RX 7600S, Radeon Pro W6900X, Pro V620, and the iGPU of Ryzen "Mendocino" laptop processors. From the Intel side, we've added support for Intel "Raptor Lake-HX," "Alder Lake-N," "Alder Lake-U," and the UHD P750 iGPU found with certain "Rocket Lake" processors.

DOWNLOAD: TechPowerUp GPU-Z v2.53.0

Chinese GPU Maker Biren Technology Loses its Co-Founder, Only Months After Revealing New GPUs

Golf Jiao, a co-founder and general manager of Biren Technology, left the company late last month, according to insider sources in China. No official statement has been issued by the executive team at Biren Tech, and Jiao has not provided any details regarding his departure from the fabless semiconductor design company. The Shanghai-based firm is a relatively new startup - it was founded in 2019 by several former NVIDIA, Qualcomm and Alibaba veterans. Biren Tech received $726.6 million in funding for its debut range of general-purpose graphics processing units (GPGPUs), also defined as high-performance computing graphics processing units (HPC GPUs).

The company revealed its ambitions to take on NVIDIA's Ampere A100 and Hopper H100 compute platforms, and last August announced two HPC GPUs in the form of the BR100 and BR104. The specifications and performance charts demonstrated impressive figures, but Biren Tech had to roll back its numbers when it was hit by U.S. Government-enforced sanctions in October 2022. The fabless company had contracted with TSMC to produce its Biren range, and the new set of rules resulted in shipments from the Taiwanese foundry being halted. Biren Tech cut its workforce by a third soon after losing its supply chain with TSMC, and the engineering team had to reassess how the BR100 and BR104 would perform on a process node larger than the original 7 nm design. It was decided that a downgrade in transfer rates would appease the legal teams and get newly redesigned Biren silicon back onto the assembly line.

NVIDIA Hopper GPUs Expand Reach as Demand for AI Grows

NVIDIA and key partners today announced the availability of new products and services featuring the NVIDIA H100 Tensor Core GPU—the world's most powerful GPU for AI—to address rapidly growing demand for generative AI training and inference. Oracle Cloud Infrastructure (OCI) announced the limited availability of new OCI Compute bare-metal GPU instances featuring H100 GPUs. Additionally, Amazon Web Services announced its forthcoming EC2 UltraClusters of Amazon EC2 P5 instances, which can scale in size up to 20,000 interconnected H100 GPUs. This follows Microsoft Azure's private preview announcement last week for its H100 virtual machine, ND H100 v5.

Additionally, Meta has now deployed its H100-powered Grand Teton AI supercomputer internally for its AI production and research teams. NVIDIA founder and CEO Jensen Huang announced during his GTC keynote today that NVIDIA DGX H100 AI supercomputers are in full production and will be coming soon to enterprises worldwide.

NVIDIA, ASML, TSMC and Synopsys Set Foundation for Next-Generation Chip Manufacturing

NVIDIA today announced a breakthrough that brings accelerated computing to the field of computational lithography, enabling semiconductor leaders like ASML, TSMC and Synopsys to accelerate the design and manufacturing of next-generation chips, just as current production processes are nearing the limits of what physics makes possible.

The new NVIDIA cuLitho software library for computational lithography is being integrated by TSMC, the world's leading foundry, as well as electronic design automation leader Synopsys into their software, manufacturing processes and systems for the latest-generation NVIDIA Hopper architecture GPUs. Equipment maker ASML is working closely with NVIDIA on GPUs and cuLitho, and is planning to integrate support for GPUs into all of its computational lithography software products.

NVIDIA Announces New System for Accelerated Quantum-Classical Computing

NVIDIA today announced a new system built with Quantum Machines that provides a revolutionary new architecture for researchers working in high-performance and low-latency quantum-classical computing. The world's first GPU-accelerated quantum computing system, the NVIDIA DGX Quantum brings together the world's most powerful accelerated computing platform - enabled by the NVIDIA Grace Hopper Superchip and CUDA Quantum open-source programming model - with the world's most advanced quantum control platform, OPX, by Quantum Machines.

The combination allows researchers to build extraordinarily powerful applications that combine quantum computing with state-of-the-art classical computing, enabling calibration, control, quantum error correction and hybrid algorithms. "Quantum-accelerated supercomputing has the potential to reshape science and industry with capabilities that can serve humanity in enormous ways," said Tim Costa, director of HPC and quantum at NVIDIA. "NVIDIA DGX Quantum will enable researchers to push the boundaries of quantum-classical computing."

Microsoft Azure Announces New Scalable Generative AI VMs Featuring NVIDIA H100

Microsoft Azure announced its new ND H100 v5 virtual machine, which pairs Intel's Sapphire Rapids Xeon Scalable processors with NVIDIA's Hopper H100 GPUs and NVIDIA's Quantum-2 CX7 interconnect. Inside each physical machine sit eight H100s (presumably the SXM5 variant packing a whopping 132 SMs and 528 4th generation tensor cores) interconnected by NVLink 4.0, which ties them all together with 3.6 TB/s bisectional bandwidth. Outside each local machine is a network of thousands more H100s connected together with 400 Gb/s Quantum-2 CX7 InfiniBand, which Microsoft says allows 3.2 Tb/s per VM for on-demand scaling to accelerate the largest AI training workloads.

Generative AI solutions like ChatGPT have accelerated demand for multi-ExaOP cloud services that can handle the large training sets and utilize the latest development tools. Azure's new ND H100 v5 VMs offer that capability to organizations of any size, whether you're a smaller startup or a larger company looking to implement large-scale AI training deployments. While Microsoft is not making any direct claims for performance, NVIDIA has advertised H100 as running up to 30x faster than the preceding Ampere architecture that is currently offered with the ND A100 v4 VMs.
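The per-VM figures quoted above line up once the units are kept straight. The sketch below assumes each of the eight GPUs has its own 400 gigabit-per-second InfiniBand link, which is what makes the 3.2 Tb/s per-VM figure add up; the variable names are ours.

```python
# Unit check on the ND H100 v5 figures quoted above.
gpus_per_vm = 8
link_gbit_per_s = 400   # assumption: one 400 Gb/s (gigabit) link per GPU

vm_tbit_per_s = gpus_per_vm * link_gbit_per_s / 1000  # terabits per second
vm_gbyte_per_s = gpus_per_vm * link_gbit_per_s / 8    # gigabytes per second

nvlink_gbyte_per_s = 3600  # 3.6 TB/s NVLink figure inside the machine
nvlink_vs_infiniband = nvlink_gbyte_per_s / vm_gbyte_per_s

# SM and tensor core counts quoted for the SXM5 variant.
sms, tensor_cores = 132, 528
tensor_cores_per_sm = tensor_cores / sms

print(f"{vm_tbit_per_s} Tb/s per VM = {vm_gbyte_per_s:.0f} GB/s")
print(f"NVLink vs InfiniBand per VM: {nvlink_vs_infiniband:.0f}x")
print(f"tensor cores per SM: {tensor_cores_per_sm:.0f}")
```

In byte terms the VM's external network is about 400 GB/s in total, roughly a ninth of the NVLink bandwidth between the eight local GPUs.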

NVIDIA Could Launch Hopper H100 PCIe GPU with 120 GB Memory

NVIDIA's high-performance computing hardware stack is now equipped with the top-of-the-line Hopper H100 GPU. It features 16,896 or 14,592 CUDA cores, depending on whether it comes in the SXM5 or PCIe variant, with the former being more powerful. Both variants come with a 5120-bit interface, with the SXM5 version using HBM3 memory running at 3.0 Gbps and the PCIe version using HBM2E memory running at 2.0 Gbps. Both versions have the same capacity, capped at 80 GB. However, that could soon change, with the latest rumor suggesting that NVIDIA could be preparing a PCIe version of the Hopper H100 GPU with 120 GB of an unknown type of memory installed.

According to the Chinese website "s-ss.cc" the 120 GB variant of the H100 PCIe card will feature an entire GH100 chip with everything unlocked. As the site suggests, this version will improve memory capacity and performance over the regular H100 PCIe SKU. With HPC workloads increasing in size and complexity, more significant memory allocation is needed for better performance. With the recent advances in Large Language Models (LLMs), AI workloads use trillions of parameters for training, most of which is done on GPUs like the NVIDIA H100.

NVIDIA Ada's 4th Gen Tensor Core, 3rd Gen RT Core, and Latest CUDA Core at a Glance

Yesterday, NVIDIA launched its GeForce RTX 40-series, based on the "Ada" graphics architecture. We're yet to receive a technical briefing about the architecture itself, and the various hardware components that make up the silicon; but NVIDIA on its website gave us a first look at what's in store with the key number-crunching components of "Ada," namely the Ada CUDA core, 4th generation Tensor core, and 3rd generation RT core. Besides generational IPC and clock speed improvements, the latest CUDA core benefits from SER (shader execution reordering), an SM or GPC-level feature that reorders execution waves/threads to optimally load each CUDA core and improve parallelism.

Despite using specialized hardware such as the RT cores, the ray tracing pipeline still relies on CUDA cores and the CPU for a handful of tasks, and here NVIDIA claims that SER contributes to a 3X ray tracing performance uplift (the performance contribution of CUDA cores). With traditional raster graphics, SER contributes a meaty 25% performance uplift. With Ada, NVIDIA is introducing its 4th generation of Tensor core (after Volta, Turing, and Ampere). The Tensor cores deployed on Ada are functionally identical to the ones on the Hopper H100 Tensor Core HPC processor, featuring the new FP8 Transformer Engine, which delivers up to 5X the AI inference performance over the previous generation Ampere Tensor Core (which itself delivered a similar leap by leveraging sparsity).

NVIDIA Rush-Orders A100 and H100 AI-GPUs with TSMC Before US Sanctions Hit

Early this month, the US Government banned American companies from exporting AI-acceleration GPUs to China and Russia, but these restrictions don't take effect before March 2023. This gives NVIDIA time to take rush orders from Chinese companies for its AI accelerators before the sanctions hit. The company has placed "rush orders" for a large quantity of A100 "Ampere" and H100 "Hopper" chips with TSMC, so they can be delivered to firms in China before March 2023, according to a report by Chinese business news publication UDN. The rush orders for high-margin products such as AI GPUs could come as a shot in the arm for NVIDIA, which is facing a sudden loss in gaming GPU revenues, as those chips are no longer in demand from crypto-currency miners.

NVIDIA Hopper Features "SM-to-SM" Comms Within GPC That Minimize Cache Roundtrips and Boost Multi-Instance Performance

NVIDIA in its HotChips 34 presentation revealed a defining feature of its "Hopper" compute architecture that works to increase parallelism and help the H100 processor better perform in a multi-instance environment. The hardware component hierarchy of "Hopper" is typical of NVIDIA architectures, with GPCs, SMs, and CUDA cores forming a hierarchy. The company is introducing a new component it calls "SM to SM Network." This is a high-bandwidth communications fabric inside the Graphics Processing Cluster (GPC) that facilitates direct communication among the SMs without making round-trips to the cache or memory hierarchy, and it plays a significant role in NVIDIA's overarching claim of "6x throughput gain over the A100."

Direct SM-to-SM communication not only improves latency but also unburdens the L2 cache, letting NVIDIA's memory-management free up the cache of "cooler" (infrequently accessed) data. CUDA sees every GPU as a "grid," every GPC as a "Cluster," every SM as a "thread block," and every lane of SIMD units as a "lane." Each lane has 64 KB of shared memory, which adds up to 256 KB of shared local storage per SM, as there are four lanes. The GPCs interface with 50 MB of L2 cache, which is the last-level on-die cache before the 80 GB of HBM3 that serves as main memory.

NVIDIA Grace CPU Specs Remind Us Why Intel Never Shared x86 with the Green Team

NVIDIA designed the Grace CPU, a processor in the classical sense, to replace the Intel Xeon or AMD EPYC processors it was having to cram into its pre-built HPC compute servers for serial-processing roles, mainly because those half-a-dozen GPU HPC processors need to be interconnected by a CPU. The company studied the CPU-level limitations and bottlenecks not just with I/O, but also the machine-architecture, and realized its compute servers need a CPU purpose-built for the role, with an architecture that's heavily optimized for NVIDIA's APIs. Thus, the NVIDIA Grace CPU was born.

This is NVIDIA's first outing with a CPU with a processing footprint rivaling server processors from Intel and AMD. Built on the TSMC N4 (4 nm EUV) silicon fabrication process, it is a monolithic chip that's deployed on a single board NVIDIA calls a "Superchip," either alongside an H100 HPC processor or paired with a second Grace. A board with a Grace and an H100 makes up a "Grace Hopper" Superchip. A board with two Grace CPUs makes a Grace CPU Superchip. Each Grace CPU contains a 900 GB/s switching fabric, a coherent interface, which has seven times the bandwidth of PCI-Express 5.0 x16. This is key to connecting the companion H100 processor, or neighboring Superchips on the node, with coherent memory access.
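The "seven times PCI-Express 5.0 x16" comparison can be reproduced from the PCIe 5.0 signalling rate. The sketch assumes the 900 GB/s figure is a total across both directions and is therefore compared with bidirectional PCIe bandwidth; the article does not spell this out.

```python
# Reproducing the "seven times PCIe 5.0 x16" comparison quoted above.
gt_per_s_per_lane = 32         # PCIe 5.0 signalling rate per lane
lanes = 16
encoding_efficiency = 128 / 130  # 128b/130b line encoding

# Usable bandwidth in GB/s, per direction and for both directions combined.
pcie_per_direction = gt_per_s_per_lane * lanes * encoding_efficiency / 8
pcie_bidirectional = 2 * pcie_per_direction

# Assumption: the 900 GB/s fabric figure is a bidirectional total.
grace_fabric = 900
ratio = grace_fabric / pcie_bidirectional

print(f"PCIe 5.0 x16: {pcie_per_direction:.1f} GB/s per direction")
print(f"PCIe 5.0 x16: {pcie_bidirectional:.1f} GB/s bidirectional")
print(f"Grace fabric vs PCIe 5.0 x16: {ratio:.1f}x")
```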
May 15th, 2024 14:17 EDT