News Posts matching #NVIDIA H100


NVIDIA DGX H100 Systems are Now Shipping

Customers from Japan to Ecuador and Sweden are using NVIDIA DGX H100 systems like AI factories to manufacture intelligence. They're creating services that offer AI-driven insights in finance, healthcare, law, IT and telecom—and working to transform their industries in the process. Among the dozens of use cases, one aims to predict how factory equipment will age, so tomorrow's plants can be more efficient.

Called Green Physics AI, it adds information like an object's CO2 footprint, age and energy consumption to SORDI.ai, which claims to be the largest synthetic dataset in manufacturing.

NVIDIA H100 Compared to A100 for Training GPT Large Language Models

NVIDIA's H100 has recently become available via Cloud Service Providers (CSPs), and it was only a matter of time before someone benchmarked its performance against the previous generation's A100 GPU. Today, thanks to benchmarks from MosaicML, a startup led by Naveen Rao, former CEO of Nervana and GM of Artificial Intelligence (AI) at Intel, we have a comparison between the two GPUs, with a fascinating insight into the cost factor. First, MosaicML trained Generative Pre-trained Transformer (GPT) models of various sizes using the bfloat16 and FP8 floating-point precision formats. All training occurred on CoreWeave cloud GPU instances.

Regarding performance, the NVIDIA H100 GPU achieved a speedup of anywhere from 2.2x to 3.3x. However, an interesting finding emerges when comparing the cost of running these GPUs in the cloud. CoreWeave prices the H100 SXM GPUs at $4.76/hr/GPU, while the A100 80 GB SXM is priced at $2.21/hr/GPU. Although the H100 costs about 2.2x more per hour, its performance makes up for it: a model trains in less time, so the overall training run costs less. This makes the H100 more attractive for researchers and companies wanting to train Large Language Models (LLMs), and makes choosing the newer GPU viable despite the higher hourly rate. Below, you can see tables comparing the two GPUs in training time, speedup, and cost of training.
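The cost argument can be made concrete with a small sketch. The hourly prices are the CoreWeave figures quoted above; the `training_cost` helper, the 1,000 GPU-hour job size, and the 3x speedup (one point within MosaicML's reported 2.2x-3.3x range) are illustrative assumptions, not MosaicML's actual methodology:

```python
# Illustrative cost-per-training-run comparison (prices as quoted above).
A100_PRICE = 2.21  # $/hr/GPU, A100 80 GB SXM on CoreWeave
H100_PRICE = 4.76  # $/hr/GPU, H100 SXM on CoreWeave

def training_cost(hours_on_a100: float, h100_speedup: float) -> tuple:
    """Return (A100 cost, H100 cost) in dollars for a job that takes
    `hours_on_a100` GPU-hours on A100, assuming H100 runs it
    `h100_speedup` times faster at the same GPU count."""
    a100_cost = hours_on_a100 * A100_PRICE
    h100_cost = (hours_on_a100 / h100_speedup) * H100_PRICE
    return a100_cost, h100_cost

# A hypothetical 1,000 GPU-hour job at a 3x speedup:
a100, h100 = training_cost(1000, 3.0)
print(f"A100: ${a100:.0f}, H100: ${h100:.0f}")
```

At any speedup above the 4.76/2.21 ≈ 2.15x price ratio, the H100 run comes out cheaper overall, which is the crossover MosaicML's numbers clear.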

Gigabyte Extends Its Leading GPU Portfolio of Servers

Giga Computing, a subsidiary of GIGABYTE and an industry leader in high-performance servers, server motherboards, and workstations, today announced a lineup of powerful GPU-centric servers with the latest AMD and Intel CPUs, including NVIDIA HGX H100 servers with both 4-GPU and 8-GPU modules. With growing interest in HPC and AI applications, specifically generative AI (GAI), this breed of server relies heavily on GPU resources to tackle compute-heavy workloads that handle large amounts of data. With the advent of OpenAI's ChatGPT and other AI chatbots, large GPU clusters are being deployed with system-level optimization to train large language models (LLMs). These LLMs can be processed by GIGABYTE's new design-optimized systems that offer a high level of customization based on users' workloads and requirements.

The GIGABYTE G-series servers are built first and foremost to support dense GPU compute and the latest PCIe technology. Starting with the 2U servers, the new G293 servers can support up to 8 dual-slot GPUs or 16 single-slot GPUs, depending on the server model. For the ultimate in CPU and GPU performance, the 4U G493 servers offer plenty of networking options and storage configurations to go alongside support for eight (Gen 5 x16) GPUs. And for the highest level of GPU compute for HPC and AI, the G393 & G593 series support NVIDIA H100 Tensor Core GPUs. All these new dual-socket servers are designed for either 4th Gen AMD EPYC processors or 4th Gen Intel Xeon Scalable processors.

NVIDIA H100 AI Performance Receives up to 54% Uplift with Optimizations

On Wednesday, the MLCommons team released the MLPerf 3.0 Inference numbers, and there was an exciting submission from NVIDIA. Reportedly, NVIDIA has used software optimization to improve the already staggering performance of its latest H100 GPU by up to 54%. For reference, NVIDIA's H100 GPU first appeared on MLPerf 2.1 back in September of 2022. In just six months, NVIDIA engineers worked on AI optimizations for the MLPerf 3.0 release and found that software optimization alone can deliver performance increases of anywhere from 7% to 54%. The workloads in the inferencing speed suite included RNN-T speech recognition, 3D U-Net medical imaging, RetinaNet object detection, ResNet-50 object classification, DLRM recommendation, and BERT 99/99.9% natural language processing.

What is interesting is that NVIDIA's submission is a bit modified. Vendors compete in open and closed categories: the closed category requires running a mathematically equivalent neural network, while the open category is flexible and allows vendors to submit results based on optimizations for their hardware. The closed submission aims to provide an "apples-to-apples" hardware comparison. Given that NVIDIA opted for the closed category, performance optimizations from other vendors such as Intel and Qualcomm are not accounted for here. Still, it is interesting that optimization can lead to a performance increase of up to 54% in NVIDIA's case with its H100 GPU. Another interesting takeaway is that some comparable hardware, like the Qualcomm Cloud AI 100, Intel Xeon Platinum 8480+, and NeuChips' ReccAccel N3000, failed to finish all the workloads. This is shown as "X" on the slides made by NVIDIA, stressing the need for proper ML system software support, which is NVIDIA's strength and an extensive marketing claim.

Supermicro Servers Now Featuring NVIDIA HGX and PCIe-Based H100 8-GPU Systems

Supermicro, Inc., a Total IT Solution Provider for AI/ML, Cloud, Storage, and 5G/Edge, today has announced that it has begun shipping its top-of-the-line new GPU servers that feature the latest NVIDIA HGX H100 8-GPU system. Supermicro servers incorporate the new NVIDIA L4 Tensor Core GPU in a wide range of application-optimized servers from the edge to the data center.

"Supermicro offers the most comprehensive portfolio of GPU systems in the industry, including servers in 8U, 6U, 5U, 4U, 2U, and 1U form factors, as well as workstations and SuperBlade systems that support the full range of new NVIDIA H100 GPUs," said Charles Liang, president and CEO of Supermicro. "With our new NVIDIA HGX H100 Delta-Next server, customers can expect 9x performance gains compared to the previous generation for AI training applications. Our GPU servers have innovative airflow designs which reduce fan speeds, lower noise levels, and consume less power, resulting in a reduced total cost of ownership (TCO). In addition, we deliver complete rack-scale liquid-cooling options for customers looking to further future-proof their data centers."

Mitsui and NVIDIA Announce World's First Generative AI Supercomputer for Pharmaceutical Industry

Mitsui & Co., Ltd., one of Japan's largest business conglomerates, is collaborating with NVIDIA on Tokyo-1—an initiative to supercharge the nation's pharmaceutical leaders with technology, including high-resolution molecular dynamics simulations and generative AI models for drug discovery.

Announced today at the NVIDIA GTC global AI conference, the Tokyo-1 project features an NVIDIA DGX AI supercomputer that will be accessible to Japan's pharma companies and startups. The effort is poised to accelerate Japan's $100 billion pharma industry, the world's third largest following the U.S. and China.

NVIDIA Hopper GPUs Expand Reach as Demand for AI Grows

NVIDIA and key partners today announced the availability of new products and services featuring the NVIDIA H100 Tensor Core GPU—the world's most powerful GPU for AI—to address rapidly growing demand for generative AI training and inference. Oracle Cloud Infrastructure (OCI) announced the limited availability of new OCI Compute bare-metal GPU instances featuring H100 GPUs. Additionally, Amazon Web Services announced its forthcoming EC2 UltraClusters of Amazon EC2 P5 instances, which can scale in size up to 20,000 interconnected H100 GPUs. This follows Microsoft Azure's private preview announcement last week for its H100 virtual machine, ND H100 v5.

Additionally, Meta has now deployed its H100-powered Grand Teton AI supercomputer internally for its AI production and research teams. NVIDIA founder and CEO Jensen Huang announced during his GTC keynote today that NVIDIA DGX H100 AI supercomputers are in full production and will be coming soon to enterprises worldwide.

Microsoft Azure Announces New Scalable Generative AI VMs Featuring NVIDIA H100

Microsoft Azure announced its new ND H100 v5 virtual machine, which pairs Intel's Sapphire Rapids Xeon Scalable processors with NVIDIA's Hopper H100 GPUs, as well as NVIDIA's Quantum-2 CX7 interconnect. Inside each physical machine sit eight H100s—presumably the SXM5 variant packing a whopping 132 SMs and 528 4th-generation Tensor Cores—interconnected by NVLink 4.0, which ties them all together with 3.6 TB/s of bisection bandwidth. Outside each local machine is a network of thousands more H100s connected with 400 Gb/s Quantum-2 CX7 InfiniBand, which Microsoft says allows 3.2 Tb/s per VM for on-demand scaling to accelerate the largest AI training workloads.

Generative AI solutions like ChatGPT have accelerated demand for multi-ExaOP cloud services that can handle the large training sets and utilize the latest development tools. Azure's new ND H100 v5 VMs offer that capability to organizations of any size, whether you're a smaller startup or a larger company looking to implement large-scale AI training deployments. While Microsoft is not making any direct claims for performance, NVIDIA has advertised H100 as running up to 30x faster than the preceding Ampere architecture that is currently offered with the ND A100 v4 VMs.

Supermicro Accelerates A Wide Range of IT Workloads with Powerful New Products Featuring 4th Gen Intel Xeon Scalable Processors

Supermicro, Inc. (NASDAQ: SMCI), a Total IT Solution Provider for Cloud, AI/ML, Storage, and 5G/Edge, will be showcasing its latest generation of systems that accelerate workloads for the entire Telco industry, specifically at the edge of the network. These systems are part of the newly introduced Supermicro Intel-based product line: better, faster, and greener systems based on the brand-new 4th Gen Intel Xeon Scalable processors (formerly codenamed Sapphire Rapids) that deliver up to 60% better workload-optimized performance. From a performance standpoint, these new systems demonstrate up to 30x faster AI inference on large models for AI and edge workloads with NVIDIA H100 GPUs. In addition, Supermicro systems support the new Intel Data Center GPU Max Series (formerly codenamed Ponte Vecchio) across a wide range of servers. The Intel Data Center GPU Max Series contains up to 128 Xe-HPC cores and will accelerate a range of AI, HPC, and visualization workloads. Supermicro X13 AI systems will support next-generation built-in accelerators and GPUs up to 700 W from Intel, NVIDIA, and others.

Supermicro's wide range of product families is deployed in a broad range of industries to speed up workloads and allow faster and more accurate decisions. With the addition of purpose-built servers tuned for networking workloads, such as Open RAN deployments and private 5G, the 4th Gen Intel Xeon Scalable processor vRAN Boost technology reduces power consumption while improving performance. Supermicro continues to offer a wide range of environmentally friendly servers for workloads from the edge to the data center.

Next-Generation Dell PowerEdge Servers Deliver Advanced Performance and Energy Efficient Design

Dell Technologies expands the industry's top selling server portfolio, with an additional 13 next-generation Dell PowerEdge servers, designed to accelerate performance and reliability for powerful computing across core data centers, large-scale public clouds and edge locations. Next-generation rack, tower and multi-node PowerEdge servers, with 4th Gen Intel Xeon Scalable processors, include Dell software and engineering advancements, such as a new Smart Flow design, to improve energy and cost efficiency. Expanded Dell APEX capabilities will help organizations take an as-a-Service approach, allowing for more effective IT operations that make the most of compute resources while minimizing risk.

"Customers come to Dell for easily managed yet sophisticated and efficient servers with advanced capabilities to power their business-critical workloads," said Jeff Boudreau, president and general manager, Infrastructure Solutions Group, Dell Technologies. "Our next-generation Dell PowerEdge servers offer unmatched innovation that raises the bar in power efficiency, performance and reliability while simplifying how customers can implement a Zero Trust approach for greater security throughout their IT environments."

NVIDIA Pairs 4th Gen Intel Xeon Scalable Processors with H100 GPUs

AI is at the heart of humanity's most transformative innovations—from developing COVID vaccines at unprecedented speeds and diagnosing cancer to powering autonomous vehicles and understanding climate change. Virtually every industry will benefit from adopting AI, but the technology has become more resource intensive as neural networks have increased in complexity. To avoid placing unsustainable demands on electricity generation to run this computing infrastructure, the underlying technology must be as efficient as possible.

Accelerated computing powered by NVIDIA GPUs and the NVIDIA AI platform offer the efficiency that enables data centers to sustainably drive the next generation of breakthroughs. And now, timed with the launch of 4th Gen Intel Xeon Scalable processors, NVIDIA and its partners have kicked off a new generation of accelerated computing systems that are built for energy-efficient AI. When combined with NVIDIA H100 Tensor Core GPUs, these systems can deliver dramatically higher performance, greater scale and higher efficiency than the prior generation, providing more computation and problem-solving per watt.

NVIDIA Announces Financial Results for Third Quarter Fiscal 2023

NVIDIA (NASDAQ: NVDA) today reported revenue for the third quarter ended October 30, 2022, of $5.93 billion, down 17% from a year ago and down 12% from the previous quarter. GAAP earnings per diluted share for the quarter were $0.27, down 72% from a year ago and up 4% from the previous quarter. Non-GAAP earnings per diluted share were $0.58, down 50% from a year ago and up 14% from the previous quarter.

"We are quickly adapting to the macro environment, correcting inventory levels and paving the way for new products," said Jensen Huang, founder and CEO of NVIDIA. "The ramp of our new platforms - Ada Lovelace RTX graphics, Hopper AI computing, BlueField and Quantum networking, Orin for autonomous vehicles and robotics, and Omniverse-is off to a great start and forms the foundation of our next phase of growth.

Hewlett Packard Enterprise Brings HPE Cray EX and HPE Cray XD Supercomputers to Enterprise Customers

Hewlett Packard Enterprise (NYSE: HPE) today announced it is making supercomputing accessible for more enterprises to harness insights, solve problems and innovate faster by delivering its world-leading, energy-efficient supercomputers in a smaller form factor and at a lower price point.

The expanded portfolio includes new HPE Cray EX and HPE Cray XD supercomputers, which are based on HPE's exascale innovation that delivers end-to-end, purpose-built technologies in compute, accelerated compute, interconnect, storage, software, and flexible power and cooling options. The supercomputers provide significant performance and AI-at-scale capabilities to tackle demanding, data-intensive workloads, speed up AI and machine learning initiatives, and accelerate innovation to deliver products and services to market faster.

Meta's Grand Teton Brings NVIDIA Hopper to Its Data Centers

Meta today announced its next-generation AI platform, Grand Teton, including NVIDIA's collaboration on design. Compared to the company's previous generation Zion EX platform, the Grand Teton system packs in more memory, network bandwidth and compute capacity, said Alexis Bjorlin, vice president of Meta Infrastructure Hardware, at the 2022 OCP Global Summit, an Open Compute Project conference.

AI models are used extensively across Facebook for services such as news feed, content recommendations and hate-speech identification, among many other applications. "We're excited to showcase this newest family member here at the summit," Bjorlin said in prepared remarks for the conference, adding her thanks to NVIDIA for its deep collaboration on Grand Teton's design and continued support of OCP.

NVIDIA Could Launch Hopper H100 PCIe GPU with 120 GB Memory

NVIDIA's high-performance computing hardware stack is now equipped with the top-of-the-line Hopper H100 GPU. It features 16,896 or 14,592 CUDA cores, depending on whether it comes in the SXM5 or PCIe variant, with the former being more powerful. Both variants come with a 5120-bit memory interface, with the SXM5 version using HBM3 memory running at 3.0 Gbps and the PCIe version using HBM2E memory running at 2.0 Gbps. Both versions are capped at the same 80 GB capacity. However, that could soon change, with the latest rumor suggesting that NVIDIA could be preparing a PCIe version of the Hopper H100 GPU with 120 GB of an unspecified type of memory installed.

According to the Chinese website "s-ss.cc", the 120 GB variant of the H100 PCIe card will feature a fully unlocked GH100 chip. As the site suggests, this version will improve memory capacity and performance over the regular H100 PCIe SKU. With HPC workloads increasing in size and complexity, larger memory allocations are needed for better performance. With recent advances in Large Language Models (LLMs), AI workloads use trillions of parameters for training, most of which is done on GPUs like the NVIDIA H100.

Supermicro Expands Its NVIDIA-Certified Server Portfolio with New NVIDIA H100 Optimized GPU Systems

Super Micro Computer, Inc., a global leader in enterprise computing, GPUs, storage, networking solutions, and green computing technology, is extending its lead in accelerated compute infrastructure again with a full line of new systems optimized for NVIDIA H100 Tensor Core GPUs, encompassing over 20 product options. With a large portfolio of NVIDIA-Certified Systems, Supermicro is now leveraging the new NVIDIA H100 PCIe and NVIDIA H100 SXM GPUs.

"Today, Supermicro introduced GPU-based servers with the new NVIDIA H100," said Charles Liang, president, and CEO of Supermicro. "We continue to offer the most comprehensive portfolio in the industry today and can deliver these systems in a range of sizes, including 8U, 5U, 4U, 2U, and 1U options. We also offer the latest GPU in our SuperBlades, workstations, and the Universal GPU systems. Customers can expect up to 30x performance gains for AI inferencing compared to previous GPU generations of accelerators for certain AI applications. Our GPU servers' innovative airflow designs result in reduced fan speeds, less power consumption, lower noise levels, and a lower total cost of ownership."

ASUS Servers Announce AI Developments at NVIDIA GTC

ASUS, the leading IT company in server systems, server motherboards and workstations, today announced its presence at NVIDIA GTC - a developer conference for the era of AI and the metaverse. ASUS will focus on three demonstrations outlining its strategic developments in AI, including: the methodology behind ASUS MLPerf Training v2.0 results that achieved multiple breakthrough records; a success story exploring the building of an academic AI data center at King Abdullah University of Science and Technology (KAUST) in Saudi Arabia; and a research AI data center created in conjunction with the National Health Research Institute in Taiwan.

MLPerf benchmark results help advance machine-learning performance and efficiency, allowing researchers to evaluate the efficacy of AI training and inference based on specific server configurations. Since joining MLCommons in 2021, ASUS has set multiple breakthrough records in the data center closed division across six AI benchmark tasks in AI training and inferencing in MLPerf Training v2.0. At the ASUS GTC session, senior ASUS software engineers will share the methodology behind these world-class results, as well as the company's efforts to deliver more efficient AI workflows through machine learning.

Supermicro Adds New 8U Universal GPU Server for AI Training, NVIDIA Omniverse, and Meta

Super Micro Computer, Inc. (SMCI), a global leader in enterprise computing, storage, networking solutions, and green computing technology, is announcing its most advanced GPU server, incorporating eight NVIDIA H100 Tensor Core GPUs. Due to its advanced airflow design, the new high-end GPU system will allow increased inlet temperatures, reducing a data center's overall Power Usage Effectiveness (PUE) while maintaining the absolute highest performance profile. In addition, Supermicro is expanding its GPU server lineup with this new Universal GPU server, a lineup which is already the largest in the industry. Supermicro now offers three distinct Universal GPU systems: the 4U, 5U, and new 8U 8-GPU servers. The Universal GPU platforms support both current and future Intel and AMD CPUs -- up to 400 W, 350 W, and higher.

"Supermicro is leading the industry with an extremely flexible and high-performance GPU server, which features the powerful NVIDIA A100 and H100 GPU," said Charles Liang, president, and CEO, of Supermicro. "This new server will support the next generation of CPUs and GPUs and is designed with maximum cooling capacity using the same chassis. We constantly look for innovative ways to deliver total IT Solutions to our growing customer base."

NVIDIA H100 SXM Hopper GPU Pictured Up Close

ServeTheHome, a tech media outlet focused on everything server/enterprise, posted an exclusive set of photos of NVIDIA's latest H100 "Hopper" accelerator. Being the fastest GPU NVIDIA ever created, H100 is made on TSMC's 4 nm manufacturing process and features over 80 billion transistors on an 814 mm² CoWoS package designed by TSMC. Complementing the massive die, we have 80 GB of HBM3 memory that sits close to the die. Pictured below, we have an SXM5 H100 module packed with VRM and power regulation. Given that the rated TDP for this GPU is 700 Watts, power regulation is a serious concern and NVIDIA managed to keep it in check.

On the back of the card, we see one short and one longer mezzanine connector, which act as power delivery connectors, a layout different from the previous A100 GPU. This board is labeled PG520 and is very close to the official renders that NVIDIA supplied us with on launch day.

NVIDIA Hopper Whitepaper Reveals Key Specs of Monstrous Compute Processor

The NVIDIA GH100 silicon powering the next-generation NVIDIA H100 compute processor is a monstrosity on paper, with an NVIDIA whitepaper published over the weekend revealing its key specifications. NVIDIA is tapping into the most advanced silicon fabrication node currently available from TSMC to build the compute die, which is TSMC N4 (4 nm-class EUV). The H100 features a monolithic silicon die surrounded by up to six on-package HBM3 stacks.

The GH100 compute die is built on the 4 nm EUV process, and has a monstrous transistor-count of 80 billion, a nearly 50% increase over the GA100. Interestingly though, at 814 mm², the die-area of the GH100 is less than that of the GA100, with its 826 mm² die built on the 7 nm DUV (TSMC N7) node, all thanks to the transistor-density gains of the 4 nm node over the 7 nm one.

NVIDIA H100 is a Compute Monster with 80 Billion Transistors, New Compute Units and HBM3 Memory

During the GTC 2022 keynote, NVIDIA announced the newest addition to its accelerator card family. Called the NVIDIA H100 accelerator, it is the company's most powerful creation ever. Built from 80 billion transistors on TSMC's 4N 4 nm node, the H100 can output some insane performance, according to NVIDIA. Featuring a new fourth-generation Tensor Core design, it can deliver a six-fold performance increase compared to A100 Tensor Cores and a two-fold MMA (Matrix Multiply Accumulate) improvement. Additionally, new DPX instructions accelerate Dynamic Programming algorithms by up to seven times over the previous A100 accelerator. Thanks to the new Hopper architecture, the Streaming Multiprocessor structure has been optimized for better transfer of large data blocks.

The full GH100 chip implementation features 144 SMs, and 128 FP32 CUDA cores per SM, resulting in 18,432 CUDA cores at maximum configuration. The NVIDIA H100 GPU with SXM5 board form-factor features 132 SMs, totaling 16,896 CUDA cores, while the PCIe 5.0 add-in card has 114 SMs, totaling 14,592 CUDA cores. As much as 80 GB of HBM3 memory surrounds the GPU at 3 TB/s bandwidth. Interestingly, the SXM5 variant features a very large TDP of 700 Watts, while the PCIe card is limited to 350 Watts. This is the result of better cooling solutions offered for the SXM form-factor. As far as performance figures are concerned, the SXM and PCIe versions provide two distinctive figures for each implementation. You can check out the performance estimates in various precision modes below. You can read more about the Hopper architecture and what makes it special in this whitepaper published by NVIDIA.
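The SM-to-core arithmetic above is easy to verify: every GH100 configuration multiplies its SM count by the same 128 FP32 CUDA cores per SM. A quick sketch (the `cuda_cores` helper is illustrative, using the figures from this article):

```python
FP32_CORES_PER_SM = 128  # per the full GH100 implementation described above

def cuda_cores(sm_count: int) -> int:
    """Total FP32 CUDA cores for a GH100-based part with the given SM count."""
    return sm_count * FP32_CORES_PER_SM

print(cuda_cores(144))  # full GH100 chip: 18432
print(cuda_cores(132))  # H100 SXM5:       16896
print(cuda_cores(114))  # H100 PCIe:       14592
```

The PCIe card's lower SM count, together with its 350 W limit, is why NVIDIA quotes distinct performance figures for each form factor.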

NVIDIA Announces Hopper Architecture, the Next Generation of Accelerated Computing

GTC—To power the next wave of AI data centers, NVIDIA today announced its next-generation accelerated computing platform with NVIDIA Hopper architecture, delivering an order of magnitude performance leap over its predecessor. Named for Grace Hopper, a pioneering U.S. computer scientist, the new architecture succeeds the NVIDIA Ampere architecture, launched two years ago.

The company also announced its first Hopper-based GPU, the NVIDIA H100, packed with 80 billion transistors. The world's largest and most powerful accelerator, the H100 has groundbreaking features such as a revolutionary Transformer Engine and a highly scalable NVIDIA NVLink interconnect for advancing gigantic AI language models, deep recommender systems, genomics and complex digital twins.