News Posts matching #H100


Supermicro Adds New 8U Universal GPU Server for AI Training, NVIDIA Omniverse, and Meta

Super Micro Computer, Inc. (SMCI), a global leader in enterprise computing, storage, networking solutions, and green computing technology, is announcing its most advanced GPU server, incorporating eight NVIDIA H100 Tensor Core GPUs. Thanks to its advanced airflow design, the new high-end GPU system tolerates increased inlet temperatures, reducing a data center's overall Power Usage Effectiveness (PUE) while maintaining the highest performance profile. With this new Universal GPU server, Supermicro is expanding what is already the industry's largest GPU server lineup: the company now offers three distinct Universal GPU systems, the 4U, 5U, and new 8U 8-GPU servers. The Universal GPU platforms support both current and future Intel and AMD CPUs with TDPs of 350 W, 400 W, and higher.
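For context, PUE is simply the ratio of total facility power to the power drawn by the IT equipment alone, with 1.0 as the ideal; letting servers tolerate warmer inlet air cuts the energy spent on cooling, which is the main overhead term. Here is a minimal sketch of the arithmetic, with hypothetical power figures of our own choosing rather than Supermicro data:

```cpp
// pue.cpp -- illustrative PUE arithmetic; all power figures below are
// hypothetical examples, not Supermicro or data-center measurements.
#include <cstdio>

// PUE = total facility power / IT equipment power (ideal = 1.0).
double pue(double it_kw, double cooling_kw, double other_kw) {
    return (it_kw + cooling_kw + other_kw) / it_kw;
}

int main() {
    // A hypothetical 1 MW IT load with aggressive chilled-air cooling...
    double baseline   = pue(1000.0, 450.0, 50.0);  // PUE = 1.50
    // ...versus the same load at higher allowed inlet temperatures,
    // which reduces the energy spent on refrigeration.
    double warm_inlet = pue(1000.0, 250.0, 50.0);  // PUE = 1.30
    printf("baseline PUE: %.2f, higher-inlet PUE: %.2f\n", baseline, warm_inlet);
}
```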

"Supermicro is leading the industry with an extremely flexible and high-performance GPU server, which features the powerful NVIDIA A100 and H100 GPU," said Charles Liang, president, and CEO, of Supermicro. "This new server will support the next generation of CPUs and GPUs and is designed with maximum cooling capacity using the same chassis. We constantly look for innovative ways to deliver total IT Solutions to our growing customer base."

U.S. Government Restricts Export of AI Compute GPUs to China and Russia (Affects NVIDIA, AMD, and Others)

The U.S. Government has imposed restrictions on the export of AI compute GPUs to China and Russia, requiring government authorization in the form of a waiver or a license. This impacts sales of products such as the NVIDIA A100 and H100; the AMD Instinct MI100 and MI200; and the upcoming Intel "Ponte Vecchio," among others. The restrictions came to light when NVIDIA disclosed on Wednesday that it had received a Government notification about licensing requirements for the export of its AI compute GPUs to Russia and China.

The notification doesn't specify the A100 and H100 by name, but defines AI inference performance thresholds that trigger the licensing requirements. The restrictions don't single out NVIDIA, and so competing products such as the AMD MI200 and the upcoming Intel Xe-HPC "Ponte Vecchio" would fall within them. For NVIDIA, this impacts $400 million in TAM, unless the Government licenses specific Russian and Chinese customers to purchase these GPUs from NVIDIA. Such trade restrictions usually come with riders to prevent resale or transshipment by companies outside the restricted region (e.g., a distributor in a third, waived country importing these chips in bulk and reselling them to the restricted countries).

NVIDIA Hopper Features "SM-to-SM" Comms Within GPC That Minimize Cache Roundtrips and Boost Multi-Instance Performance

NVIDIA in its Hot Chips 34 presentation revealed a defining feature of its "Hopper" compute architecture that works to increase parallelism and help the H100 processor perform better in a multi-instance environment. The hardware component hierarchy of "Hopper" is typical of NVIDIA architectures, with CUDA cores grouped into SMs, and SMs grouped into GPCs. The company is introducing a new component it calls the "SM-to-SM Network": a high-bandwidth communications fabric inside the Graphics Processing Cluster (GPC) that facilitates direct communication among the SMs without round-trips to the cache or memory hierarchy, and it plays a significant role in NVIDIA's overarching claim of a "6x throughput gain over the A100."

Direct SM-to-SM communication not only reduces latency, but also unburdens the L2 cache, letting NVIDIA's memory management evict "cooler" (infrequently accessed) data from the cache. CUDA sees every GPU as a "grid," every GPC as a "Cluster," every SM as a "thread block," and every lane of SIMD units as a "lane." Each lane has 64 KB of shared memory, which adds up to 256 KB of shared local storage per SM, as there are four lanes. The GPCs interface with 50 MB of L2 cache, the last-level on-die cache before the 80 GB of HBM3 that serves as main memory.
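On the software side, this capability surfaces in CUDA as thread block clusters with distributed shared memory: thread blocks scheduled into the same cluster (and thus the same GPC) can read each other's shared memory directly over the SM-to-SM network instead of bouncing through L2. Below is a minimal sketch using the cooperative-groups cluster API, which requires an sm_90 (Hopper) GPU and CUDA 11.8 or newer; the cluster size of two is purely for illustration:

```cuda
// cluster_smem.cu -- sketch of Hopper distributed shared memory.
// Compile with: nvcc -arch=sm_90 cluster_smem.cu
#include <cooperative_groups.h>
#include <cstdio>
namespace cg = cooperative_groups;

// Launch thread blocks in clusters of two; blocks in a cluster share a GPC.
__global__ void __cluster_dims__(2, 1, 1) exchange(int *out) {
    cg::cluster_group cluster = cg::this_cluster();
    __shared__ int token;
    if (threadIdx.x == 0) token = (int)blockIdx.x;
    cluster.sync();  // make each block's shared memory visible cluster-wide

    // Read the neighboring block's shared memory directly over the
    // SM-to-SM network -- no round-trip through L2 or HBM3.
    unsigned int peer = cluster.block_rank() ^ 1u;
    int *peer_token = cluster.map_shared_rank(&token, peer);
    if (threadIdx.x == 0) out[blockIdx.x] = *peer_token;
    cluster.sync();  // keep shared memory alive until the peer is done reading
}

int main() {
    int *out;
    cudaMallocManaged(&out, 2 * sizeof(int));
    exchange<<<2, 32>>>(out);
    cudaDeviceSynchronize();
    printf("block 0 read %d, block 1 read %d\n", out[0], out[1]);  // 1, 0
    cudaFree(out);
}
```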

NVIDIA Grace CPU Specs Remind Us Why Intel Never Shared x86 with the Green Team

NVIDIA designed the Grace CPU, a processor in the classical sense, to replace the Intel Xeon or AMD EPYC processors it was having to cram into its pre-built HPC compute servers for serial-processing roles, mainly because the half-a-dozen GPU HPC processors in those machines need a CPU to interconnect them. The company studied the CPU-level limitations and bottlenecks, not just with I/O but also with the machine architecture, and realized its compute servers need a CPU purpose-built for the role, with an architecture heavily optimized for NVIDIA's APIs. Thus, the NVIDIA Grace CPU was born.

This is NVIDIA's first outing with a CPU whose processing footprint rivals server processors from Intel and AMD. Built on the TSMC N4 (4 nm EUV) silicon fabrication process, it is a monolithic chip deployed on a single board NVIDIA calls a "Superchip," either paired with an H100 HPC processor or with a second Grace CPU. A board with a Grace and an H100 makes up a "Grace Hopper" Superchip, while a board with two Grace CPUs makes a Grace CPU Superchip. Each Grace CPU contains a 900 GB/s coherent switching fabric with seven times the bandwidth of PCI-Express 5.0 x16. This is key to connecting the companion H100 processor, or neighboring Superchips on the node, with coherent memory access.
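The "seven times" figure is easy to verify from PCIe 5.0's raw signaling rate of 32 GT/s per lane with 128b/130b encoding:

```cpp
// nvlink_vs_pcie.cpp -- back-of-the-envelope check of the "7x" claim.
#include <cstdio>

int main() {
    // PCIe 5.0: 32 GT/s per lane with 128b/130b encoding (~98.5% efficient).
    double lane_gbps = 32.0 * 128.0 / 130.0;    // ~31.5 Gb/s usable per lane
    double x16_dir   = lane_gbps * 16.0 / 8.0;  // ~63 GB/s per direction
    double x16_bidir = x16_dir * 2.0;           // ~126 GB/s bidirectional
    printf("PCIe 5.0 x16: ~%.0f GB/s bidirectional\n", x16_bidir);
    printf("NVLink-C2C (900 GB/s) advantage: %.1fx\n", 900.0 / x16_bidir);  // ~7.1x
}
```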

Intel Claims "Ponte Vecchio" Will Trade Blows with NVIDIA Hopper in Most Compute Workloads

With NVIDIA and AMD launching their next-generation HPC compute architectures, "Hopper" and CDNA2, it began to seem like Intel's ambitious "Ponte Vecchio" accelerator, based on the Xe-HPC architecture, had missed the time-to-market bus. Intel doesn't think so, and in its Hot Chips 34 presentation disclosed some of the first detailed performance claims that, at least on paper, put the "Hopper" H100 accelerator's published compute performance numbers to shame. We already had some idea of how Ponte Vecchio would perform this spring, at Intel's ISC'22 presentation, but the company hadn't finalized the product's power and thermal characteristics, which are determined by its clock speed and boosting behavior. Team blue claims to have cleared the final development hurdles, and is ready with some big numbers.

Intel claims that in classic FP32 (single-precision) and FP64 (double-precision) floating-point tests, its silicon is highly competitive with the H100 "Hopper": the company claims 52 TFLOP/s FP32 for "Ponte Vecchio," compared to 60 TFLOP/s for the H100, and a significantly higher 52 TFLOP/s FP64, compared to 30 TFLOP/s for the H100. This has to do with the SIMD units of the Xe-HPC architecture all being natively capable of double-precision floating-point operations, whereas NVIDIA's streaming multiprocessors pair their FP32 CUDA cores with half as many dedicated FP64 units.
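Those vector throughput numbers follow from a simple formula: cores × 2 FLOPs per fused multiply-add × clock speed. Here is a quick reconstruction of NVIDIA's side of the comparison, assuming the H100 SXM5 configuration of 16,896 FP32 CUDA cores and a boost clock in the neighborhood of 1.78 GHz (the clock is our approximation, not an official figure):

```cpp
// peak_flops.cpp -- deriving H100 vector peak throughput from first principles.
#include <cstdio>

int main() {
    double fp32_cores  = 16896.0;  // H100 SXM5: 132 SMs x 128 FP32 cores
    double boost_ghz   = 1.78;     // assumed boost clock, not an official figure
    double fp32_tflops = fp32_cores * 2.0 * boost_ghz / 1000.0;  // FMA = 2 FLOPs
    double fp64_tflops = fp32_tflops / 2.0;  // Hopper: 64 FP64 cores per SM
    printf("FP32: ~%.0f TFLOP/s, FP64: ~%.0f TFLOP/s\n", fp32_tflops, fp64_tflops);
}
```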

Tachyum Submits Bid for 20-Exaflop Supercomputer to U.S. Department of Energy Advanced Computing Ecosystems

Tachyum today announced that it has responded to a U.S. Department of Energy Request for Information soliciting Advanced Computing Ecosystems for DOE national laboratories engaged in scientific and national security research. Tachyum has submitted a proposal to create a 20-exaflop supercomputer based on Tachyum's Prodigy, the world's first universal processor.

The DOE's request calls for computing systems that are five to 10 times faster than those currently available and/or that can perform more complex applications in "data science, artificial intelligence, edge deployments at facilities, and science ecosystem problems, in addition to the traditional modeling and simulation applications."

NVIDIA PrefixRL Model Designs 25% Smaller Circuits, Making GPUs More Efficient

When designing integrated circuits, engineers aim to produce an efficient design that is easier to manufacture. If they manage to keep the circuit size down, the cost of manufacturing that circuit also goes down. NVIDIA has described on its technical blog a technique in which the company uses an artificial intelligence model called PrefixRL. Using deep reinforcement learning, NVIDIA uses the PrefixRL model to outperform traditional EDA (Electronic Design Automation) tools from major vendors such as Cadence, Synopsys, or Siemens/Mentor. EDA vendors usually apply their own in-house AI solutions to silicon placement and routing (PnR); however, NVIDIA's PrefixRL solution seems to be doing wonders in the company's workflow.

The goal of PrefixRL is a deep reinforcement learning model that keeps latency on par with the EDA PnR result while achieving a smaller die area. According to the technical blog, the latest Hopper H100 GPU architecture uses 13,000 instances of arithmetic circuits designed by the PrefixRL AI model. NVIDIA produced a model that outputs a 25% smaller circuit than comparable EDA output, all while achieving similar or better latency. Below, you can compare a 64-bit adder design made by PrefixRL and the same design made by an industry-leading EDA tool.
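The "prefix" in PrefixRL refers to parallel-prefix circuits, the carry networks at the heart of fast adders: each node merges generate/propagate signals, and the choice of which node pairs to merge trades area against depth (latency), which is exactly the design space the RL agent explores. As a toy software model of the circuit family, here is the classic fixed Kogge-Stone topology, a textbook example of ours rather than anything PrefixRL produced:

```cpp
// prefix_adder.cpp -- toy 64-bit Kogge-Stone parallel-prefix adder model.
#include <cstdint>
#include <cstdio>

uint64_t prefix_add(uint64_t a, uint64_t b) {
    uint64_t g = a & b;  // generate: this bit position creates a carry
    uint64_t p = a ^ b;  // propagate: this bit position passes a carry along
    // Kogge-Stone prefix: merge (g, p) pairs at distances 1, 2, 4, ..., 32.
    // Six levels of depth; PrefixRL searches for smaller mixes of merge nodes.
    for (int d = 1; d < 64; d <<= 1) {
        g |= p & (g << d);
        p &= p << d;
    }
    return a ^ b ^ (g << 1);  // sum bit = propagate XOR incoming carry
}

int main() {
    uint64_t a = 0xDEADBEEFCAFEF00Dull, b = 0x0123456789ABCDEFull;
    printf("match: %d\n", prefix_add(a, b) == a + b);  // prints 1
}
```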

SK hynix to Supply Industry's First HBM3 DRAM to NVIDIA

SK hynix announced that it began mass production of HBM3, the world's best-performing DRAM. The announcement comes just seven months after the company became the first in the industry to develop HBM3 in October, and is expected to solidify the company's leadership in the premium DRAM market. With accelerating advancements in cutting-edge technologies such as artificial intelligence and big data, major global tech companies are seeking ways to quickly process rapidly increasing volumes of data. HBM, with significant competitiveness in data processing speed and performance compared with traditional DRAM, is expected to draw broad industry attention and see rising adoption.

NVIDIA has recently completed its performance evaluation of SK hynix's HBM3 samples. SK hynix will provide HBM3 for NVIDIA systems expected to ship starting in the third quarter of this year, and will expand HBM3 volume in the first half in accordance with NVIDIA's schedule. The highly anticipated NVIDIA H100 is the world's largest and most powerful accelerator. SK hynix's HBM3 is expected to enhance accelerated computing performance with up to 819 GB/s of memory bandwidth, equivalent to transmitting 163 FHD (Full-HD) movies (5 GB each) every second.
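That 819 GB/s figure is per stack, and it falls straight out of HBM3's signaling: 6.4 Gb/s per pin across a 1024-bit interface:

```cpp
// hbm3_bw.cpp -- where the 819 GB/s per-stack figure comes from.
#include <cstdio>

int main() {
    double pin_gbps = 6.4;     // HBM3 per-pin signaling rate (SK hynix spec)
    double bus_bits = 1024.0;  // interface width per HBM3 stack
    double gbytes_s = pin_gbps * bus_bits / 8.0;  // 819.2 GB/s per stack
    printf("per-stack bandwidth: %.1f GB/s\n", gbytes_s);
    // The article's comparison: 5 GB Full-HD movies moved per second.
    printf("5 GB movies per second: %d\n", (int)(gbytes_s / 5.0));  // 163
}
```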

NVIDIA H100 SXM Hopper GPU Pictured Up Close

ServeTheHome, a tech media outlet focused on everything server/enterprise, has posted an exclusive set of photos of NVIDIA's latest H100 "Hopper" accelerator. The fastest GPU NVIDIA has ever created, the H100 is made on TSMC's 4 nm manufacturing process and features over 80 billion transistors on an 814 mm² die, mounted on a CoWoS package designed by TSMC. Complementing the massive die is 80 GB of HBM3 memory that sits close by. Pictured below is an SXM5 H100 module packed with VRM circuitry for power regulation. Given that the rated TDP for this GPU is 700 Watts, power regulation is a serious concern, and NVIDIA has managed to keep it in check.

On the back of the card, we see one short and one longer mezzanine connector that act as power delivery connectors, a departure from the previous A100 GPU layout. This board model is labeled PG520 and is very close to the official renders that NVIDIA supplied us with on launch day.

NVIDIA Hopper Whitepaper Reveals Key Specs of Monstrous Compute Processor

The NVIDIA GH100 silicon powering the next-generation NVIDIA H100 compute processor is a monstrosity on paper, with an NVIDIA whitepaper published over the weekend revealing its key specifications. NVIDIA is tapping into the most advanced silicon fabrication node currently available from TSMC to build the compute die: TSMC N4 (4 nm-class EUV). The H100 features a monolithic silicon die surrounded by up to six on-package HBM3 stacks.

The GH100 compute die is built on the 4 nm EUV process and has a monstrous transistor count of 80 billion, a nearly 50% increase over the GA100. Interestingly though, at 814 mm², the die area of the GH100 is smaller than that of the GA100, whose 826 mm² die is built on the 7 nm DUV (TSMC N7) node, all thanks to the transistor-density gains of the 4 nm node over the 7 nm one.
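The density gain is easy to put in numbers, taking the GA100's published 54.2 billion transistors (from NVIDIA's Ampere disclosures, not this whitepaper) as the baseline:

```cpp
// density.cpp -- comparing GH100 and GA100 transistor density.
#include <cstdio>

int main() {
    double gh100 = 80.0e9 / 814.0;  // ~98 million transistors per mm^2
    double ga100 = 54.2e9 / 826.0;  // ~66 million transistors per mm^2
    printf("GH100: %.0f MTr/mm^2, GA100: %.0f MTr/mm^2 (~%.0f%% denser)\n",
           gh100 / 1e6, ga100 / 1e6, (gh100 / ga100 - 1.0) * 100.0);
}
```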

NVIDIA H100 is a Compute Monster with 80 Billion Transistors, New Compute Units and HBM3 Memory

During the GTC 2022 keynote, NVIDIA announced the newest addition to its accelerator card family. Called the NVIDIA H100 accelerator, it is the company's most powerful creation ever. Built from 80 billion transistors on TSMC's 4N (4 nm) node, the H100 can output some insane performance, according to NVIDIA. Featuring a new fourth-generation Tensor Core design, it can deliver a six-fold performance increase compared to A100 Tensor Cores and a two-fold MMA (Matrix Multiply Accumulate) improvement. Additionally, new DPX instructions accelerate dynamic programming algorithms up to seven times over the previous A100 accelerator. Thanks to the new Hopper architecture, the Streaming Multiprocessor (SM) structure has been optimized for better transfer of large data blocks.

The full GH100 chip implementation features 144 SMs with 128 FP32 CUDA cores per SM, resulting in 18,432 CUDA cores at the maximum configuration. The NVIDIA H100 GPU in the SXM5 board form-factor features 132 SMs, totaling 16,896 CUDA cores, while the PCIe 5.0 add-in card has 114 SMs, totaling 14,592 CUDA cores. As much as 80 GB of HBM3 memory surrounds the GPU, delivering 3 TB/s of bandwidth. Interestingly, the SXM5 variant features a very large TDP of 700 Watts, while the PCIe card is limited to 350 Watts; this is the result of the better cooling solutions offered for the SXM form-factor. As far as performance is concerned, the SXM and PCIe versions each come with their own distinct figures; you can check out the estimates in various precision modes below. You can read more about the Hopper architecture and what makes it special in this whitepaper published by NVIDIA.
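All three core counts follow from the fixed 128 FP32 CUDA cores per Hopper SM; a quick tabulation:

```cpp
// gh100_cores.cpp -- CUDA core counts for each GH100 configuration.
#include <cstdio>

int main() {
    const int cores_per_sm = 128;  // FP32 CUDA cores per Hopper SM
    struct Config { const char *name; int sms; };
    const Config cfgs[] = {
        {"Full GH100", 144},  // 18,432 cores
        {"H100 SXM5",  132},  // 16,896 cores
        {"H100 PCIe",  114},  // 14,592 cores
    };
    for (const Config &c : cfgs)
        printf("%-10s: %3d SMs -> %5d CUDA cores\n",
               c.name, c.sms, c.sms * cores_per_sm);
}
```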

NVIDIA Announces Hopper Architecture, the Next Generation of Accelerated Computing

GTC—To power the next wave of AI data centers, NVIDIA today announced its next-generation accelerated computing platform with NVIDIA Hopper architecture, delivering an order of magnitude performance leap over its predecessor. Named for Grace Hopper, a pioneering U.S. computer scientist, the new architecture succeeds the NVIDIA Ampere architecture, launched two years ago.

The company also announced its first Hopper-based GPU, the NVIDIA H100, packed with 80 billion transistors. The world's largest and most powerful accelerator, the H100 has groundbreaking features such as a revolutionary Transformer Engine and a highly scalable NVIDIA NVLink interconnect for advancing gigantic AI language models, deep recommender systems, genomics and complex digital twins.

NVIDIA to Split Graphics and Compute Architecture Naming, "Blackwell" Architecture Spotted

The recent NVIDIA data leak springs up information on various upcoming graphics parts. Besides "Ada Lovelace" and "Hopper," we come across a new codename, "Blackwell." It turns out that NVIDIA is splitting the graphics and compute architecture naming with the next generation, not unlike what AMD did with its RDNA and CDNA series. The current "Ampere" architecture is used both for compute and graphics, with the streaming multiprocessors for the two being slightly different: the compute "Ampere" has more FP64 and Tensor components, while the graphics "Ampere" does away with these in favor of RT cores and graphics-relevant components.

The graphics architecture to succeed GeForce "Ampere" will be GeForce "Ada Lovelace." GPUs in this series are identified in the leaked code as "AD102," "AD103," "AD104," "AD106," "AD107," and "AD10B," succeeding a similar numbering scheme for parts of the "A" (GeForce Ampere) series. The compute architecture succeeding "Ampere" will be codenamed "Hopper," with parts in the series codenamed "GH100" and "GH202." Another compute or data-center architecture is "Blackwell," with parts codenamed "GB100" and "GB102." From all accounts, NVIDIA is planning to launch the GeForce 40-series "Ada" graphics card lineup in the second half of 2022. The company is in need of a similar refresh for its compute product lineup, and could debut "Hopper" either toward the end of 2022 or next year. "Blackwell" could follow "Hopper."

NVIDIA Multi-Chip-Module Hopper GPU Rumored To Tape Out Soon

Hopper is an upcoming compute architecture from NVIDIA that will be the first from the company to feature a Multi-Chip-Module (MCM) design, similar to Intel's Xe-HPC and AMD's upcoming CDNA2. The Hopper architecture has been teased for over two years, but it would appear to be nearing completion, with a recent leak suggesting the product will tape out soon. This compute GPU will likely be manufactured on TSMC's 5 nm node and could feature two dies, each with 288 Streaming Multiprocessors, which could theoretically provide a three-fold performance improvement over the Ampere-based NVIDIA A100. The first product to feature the GPU is expected to be the NVIDIA H100 data center accelerator, which will serve as a successor to the A100 and could potentially launch in mid-2022.
