News Posts matching #H100

NVIDIA H100 GPUs Set Standard for Generative AI in Debut MLPerf Benchmark

In a new industry-standard benchmark, a cluster of 3,584 H100 GPUs at cloud service provider CoreWeave trained a massive GPT-3-based model in just 11 minutes. Leading users and industry-standard benchmarks agree: NVIDIA H100 Tensor Core GPUs deliver the best AI performance, especially on the large language models (LLMs) powering generative AI.

H100 GPUs set new records on all eight tests in the latest MLPerf training benchmarks released today, excelling on a new MLPerf test for generative AI. That excellence is delivered both per-accelerator and at-scale in massive servers. For example, on a commercially available cluster of 3,584 H100 GPUs co-developed by startup Inflection AI and operated by CoreWeave, a cloud service provider specializing in GPU-accelerated workloads, the system completed the massive GPT-3-based training benchmark in less than eleven minutes.
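
For a sense of scale, here is a minimal back-of-envelope sketch of the aggregate GPU time such a run consumes, using only the cluster size and wall-clock time reported above:

```python
# Back-of-envelope scale of the MLPerf GPT-3 run described above.
# Only the cluster size and wall-clock time come from the article;
# everything derived from them is simple arithmetic.

num_gpus = 3584          # H100 GPUs in the Inflection AI / CoreWeave cluster
wall_clock_minutes = 11  # reported time to complete the GPT-3-based benchmark

gpu_hours = num_gpus * wall_clock_minutes / 60
print(f"Aggregate compute: ~{gpu_hours:,.0f} GPU-hours "
      f"({num_gpus} GPUs x {wall_clock_minutes} min)")
# -> roughly 657 GPU-hours for the full benchmark run
```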

MLCommons Shares Intel Habana Gaudi2 and 4th Gen Intel Xeon Scalable AI Benchmark Results

Today, MLCommons published results of its industry AI performance benchmark, MLPerf Training 3.0, in which both the Habana Gaudi2 deep learning accelerator and the 4th Gen Intel Xeon Scalable processor delivered impressive training results.

"The latest MLPerf results published by MLCommons validates the TCO value Intel Xeon processors and Intel Gaudi deep learning accelerators provide to customers in the area of AI. Xeon's built-in accelerators make it an ideal solution to run volume AI workloads on general-purpose processors, while Gaudi delivers competitive performance for large language models and generative AI. Intel's scalable systems with optimized, easy-to-program open software lowers the barrier for customers and partners to deploy a broad array of AI-based solutions in the data center from the cloud to the intelligent edge." - Sandra Rivera, Intel executive vice president and general manager of the Data Center and AI Group

NVIDIA Allegedly Preparing H100 GPU with 94 and 64 GB Memory

NVIDIA's compute and AI-oriented H100 GPU is supposedly getting an upgrade. The H100 GPU is NVIDIA's most powerful offering and comes in a few different flavors: H100 PCIe, H100 SXM, and H100 NVL (a duo of two GPUs). Currently, the H100 GPU comes with 80 GB of memory in both the PCIe and SXM5 versions of the card (HBM2E on the PCIe card, HBM3 on the SXM5 card). A notable exception is the H100 NVL, which comes with 188 GB of HBM3, but that is spread across two cards, making it 94 GB per card. However, we could see NVIDIA enable 94 GB and 64 GB options for the H100 accelerator soon, as the latest PCI ID Repository entries show.

According to the PCI ID Repository, two new entries have been requested: "Kindly help to add H100 SXM5 64 GB into 2337." and "Kindly help to add H100 SXM5 94 GB into 2339." These two messages indicate that NVIDIA could be preparing its H100 in more variations. In September 2022, we saw NVIDIA prepare an H100 variation with 120 GB of memory, but that still isn't official. These PCI IDs could simply come from engineering samples that NVIDIA is testing in its labs, and the cards may never appear on any market. So, we will have to wait and see how it plays out.

Major CSPs Aggressively Constructing AI Servers and Boosting Demand for AI Chips and HBM, Advanced Packaging Capacity Forecasted to Surge 30~40%

TrendForce reports that explosive growth in generative AI applications like chatbots has spurred significant expansion in AI server development in 2023. Major CSPs including Microsoft, Google, AWS, as well as Chinese enterprises like Baidu and ByteDance, have invested heavily in high-end AI servers to continuously train and optimize their AI models. This reliance on high-end AI servers necessitates the use of high-end AI chips, which in turn will not only drive up demand for HBM during 2023~2024, but is also expected to boost growth in advanced packaging capacity by 30~40% in 2024.

TrendForce highlights that to augment the computational efficiency of AI servers and enhance memory transmission bandwidth, leading AI chip makers such as Nvidia, AMD, and Intel have opted to incorporate HBM. Presently, Nvidia's A100 and H100 chips boast up to 80 GB of HBM2e and HBM3, respectively. In its latest integrated CPU and GPU, the Grace Hopper Superchip, Nvidia expanded a single chip's HBM capacity by 20%, hitting a mark of 96 GB. AMD's MI300 also uses HBM3, with the MI300A capacity remaining at 128 GB like its predecessor, while the more advanced MI300X has ramped up to 192 GB, marking a 50% increase. Google is expected to broaden its partnership with Broadcom in late 2023 to produce the TPU, an ASIC AI accelerator chip that will also incorporate HBM memory, in order to extend its AI infrastructure.
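
The capacity deltas quoted by TrendForce are easy to verify; here is a minimal sketch of that arithmetic, using only the baseline figures from the paragraph above:

```python
# Capacity deltas quoted by TrendForce, reproduced as plain arithmetic.
a100_h100_hbm_gb = 80                            # per-GPU HBM capacity cited above
grace_hopper_hbm_gb = a100_h100_hbm_gb * 1.20    # "+20%" -> 96 GB
mi300a_hbm_gb = 128
mi300x_hbm_gb = mi300a_hbm_gb * 1.50             # "+50%" -> 192 GB

print(grace_hopper_hbm_gb, mi300x_hbm_gb)        # 96.0 192.0
```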

NVIDIA A100 GPUs in High Demand on Chinese Black Market

The top technology companies in China have been ordering a lot of NVIDIA enterprise-grade GPUs, even though U.S. sanctions have prevented the shipment of A100 and H100 models (plus AMD's MI250 Instinct accelerator) to the nation in recent times. ByteDance - best known for developing TikTok - managed to grab plenty of Ampere enterprise units prior to last Autumn's cutoff period, and has continued to purchase Team Green's H800 GPU, which is a cut-down version of the H100 flagship. Smaller outfits are relying on less direct sources to acquire HPC GPUs—according to a Reuters investigative article, international trade restrictions have created a thriving black market for "top-end NVIDIA AI chips."

Their reporters carried out some on-site sleuthing: "Visiting the famed Huaqiangbei electronics area in the southern Chinese city of Shenzhen is a good bet - in particular, the SEG Plaza skyscraper whose first 10 floors are crammed with shops selling everything from camera parts to drones. The chips are not advertised but asking discreetly works...They don't come cheap. Two vendors there, who spoke with Reuters in person on condition of anonymity, said they could provide small numbers of A100 artificial intelligence chips made by the U.S. chip designer, pricing them at $20,000 a piece - double the usual price."

NVIDIA H100 Hopper GPU Tested for Gaming, Slower Than Integrated GPU

NVIDIA's H100 Hopper GPU is a device designed for pure AI and other compute workloads, with the least amount of consideration for gaming workloads that involve graphics processing. However, it is still interesting to see how this 30,000 USD GPU fares in comparison to other gaming GPUs and whether it is even possible to run games on it. It turns out that it is technically feasible but does not make much sense, as the Chinese YouTube channel Geekerwan notes. Based on the GH100 GPU SKU with 14,592 CUDA cores, the H100 PCIe version tested here can achieve 204.9 TeraFLOPS at FP16, 51.22 TeraFLOPS at FP32, and 25.61 TeraFLOPS at FP64, with its natural strength lying in accelerating AI workloads.
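
Those FP16/FP32/FP64 figures follow from the shader count and clock; below is a rough sketch of the arithmetic, assuming a boost clock of about 1.755 GHz and the 4:1 and 1:2 vector-throughput ratios implied by the quoted numbers (both are assumptions chosen to match the figures, not official specifications):

```python
# Rough peak-throughput arithmetic for the H100 PCIe figures quoted above.
# The boost clock and the FP16/FP64 ratios are assumptions consistent with
# the numbers in the article, not official specifications.

cuda_cores = 14592
boost_clock_ghz = 1.755          # assumed boost clock

fp32_tflops = cuda_cores * 2 * boost_clock_ghz / 1000  # 2 FLOPs per core per clock (FMA)
fp16_tflops = fp32_tflops * 4                          # 4:1 FP16:FP32 ratio implied above
fp64_tflops = fp32_tflops / 2                          # 1:2 FP64:FP32 ratio implied above

print(f"FP32 ~{fp32_tflops:.1f} TFLOPS, FP16 ~{fp16_tflops:.1f} TFLOPS, "
      f"FP64 ~{fp64_tflops:.1f} TFLOPS")
# -> roughly 51.2 / 204.9 / 25.6 TFLOPS, matching the figures quoted above
```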

However, how does it fare in gaming benchmarks? Not very well, as the testing shows. It scored 2681 points in 3DMark Time Spy, which is lower than AMD's integrated Radeon 680M, which managed to score 2710 points. Interestingly, the GH100 has only 24 ROPs (render output units), while the gaming-oriented GA102 (the highest-end gaming GPU SKU) has 112 ROPs. This alone paints a clear picture of why the H100 GPU is used for compute only. Since it doesn't have any display outputs, the system needed another regular GPU to provide the picture output, while the computation happened on the H100 GPU.

ASUS Unveils ESC N8-E11, an HGX H100 Eight-GPU Server

ASUS today announced ESC N8-E11, its most advanced HGX H100 eight-GPU AI server, along with a comprehensive PCI Express (PCIe) GPU server portfolio—the ESC8000 and ESC4000 series—empowered by Intel and AMD platforms and supporting higher CPU and GPU TDPs to accelerate the development of AI and data science.

ASUS is one of the few HPC solution providers with its own all-dimensional resources that consist of the ASUS server business unit, Taiwan Web Service (TWS) and ASUS Cloud—all part of the ASUS group. This uniquely positions ASUS to deliver in-house AI server design, data-center infrastructure, and AI software-development capabilities, plus a diverse ecosystem of industrial hardware and software partners.

Gigabyte Shows AI/HPC and Data Center Servers at Computex

GIGABYTE is exhibiting cutting-edge technologies and solutions at COMPUTEX 2023, presenting the theme "Future of COMPUTING". From May 30th to June 2nd, GIGABYTE is showcasing over 110 products that are driving future industry transformation, demonstrating the emerging trends of AI technology and sustainability, on the 1st floor, Taipei Nangang Exhibition Center, Hall 1.

GIGABYTE and its subsidiary, Giga Computing, are introducing unparalleled AI/HPC server lineups, leading the era of exascale supercomputing. One of the stars is the industry's first NVIDIA-certified HGX H100 8-GPU SXM5 server, the G593-SD0. Equipped with 4th Gen Intel Xeon Scalable processors and GIGABYTE's industry-leading thermal design, the G593-SD0 can handle extremely intensive workloads from generative AI and deep learning model training within a density-optimized 5U server chassis, making it a top choice for data centers aiming for AI breakthroughs. In addition, GIGABYTE is debuting AI computing servers supporting the NVIDIA Grace CPU and Grace Hopper Superchips. The high-density servers are accelerated with NVLink-C2C technology under the Arm Neoverse V2 platform, setting a new standard for AI/HPC computing efficiency and bandwidth.

NVIDIA Launches Accelerated Ethernet Platform for Hyperscale Generative AI

NVIDIA today announced NVIDIA Spectrum-X, an accelerated networking platform designed to improve the performance and efficiency of Ethernet-based AI clouds. NVIDIA Spectrum-X is built on networking innovations powered by the tight coupling of the NVIDIA Spectrum-4 Ethernet switch with the NVIDIA BlueField-3 DPU, achieving 1.7x better overall AI performance and power efficiency, along with consistent, predictable performance in multi-tenant environments. Spectrum-X is supercharged by NVIDIA acceleration software and software development kits (SDKs), allowing developers to build software-defined, cloud-native AI applications.

The delivery of end-to-end capabilities reduces run-times of massive transformer-based generative AI models. This allows network engineers, AI data scientists and cloud service providers to improve results and make informed decisions faster. The world's top hyperscalers are adopting NVIDIA Spectrum-X, including industry-leading cloud innovators. As a blueprint and testbed for NVIDIA Spectrum-X reference designs, NVIDIA is building Israel-1, a hyperscale generative AI supercomputer to be deployed in its Israeli data center on Dell PowerEdge XE9680 servers based on the NVIDIA HGX H100 eight-GPU platform, BlueField-3 DPUs and Spectrum-4 switches.

Dell and NVIDIA Introduce Project Helix, a Secure On-Premises Generative AI

Dell Technologies and NVIDIA announce a joint initiative to make it easier for businesses to build and use generative AI models on-premises to quickly and securely deliver better customer service, market intelligence, enterprise search and a range of other capabilities. Project Helix will deliver a series of full-stack solutions with technical expertise and pre-built tools based on Dell and NVIDIA infrastructure and software. It includes a complete blueprint to help enterprises use their proprietary data and more easily deploy generative AI responsibly and accurately.

"Project Helix gives enterprises purpose-built AI models to more quickly and securely gain value from the immense amounts of data underused today," said Jeff Clarke, vice chairman and co-chief operating officer, Dell Technologies. "With highly scalable and efficient infrastructure, enterprises can create a new wave of generative AI solutions that can reinvent their industries."

"We are at a historic moment, when incredible advances in generative AI are intersecting with enterprise demand to do more with less," said Jensen Huang, founder and CEO, NVIDIA. "With Dell Technologies, we've designed extremely scalable, highly efficient infrastructure that enables enterprises to transform their business by securely using their own data to build and operate generative AI applications."

ASUS Demonstrates Liquid Cooling and AI Solutions at ISC High Performance 2023

ASUS today announced a showcase of the latest HPC solutions to empower innovation and push the boundaries of supercomputing at ISC High Performance 2023 in Hamburg, Germany, on May 21-25, 2023. The ASUS exhibition, at booth H813, will reveal the latest supercomputing advances, including liquid-cooling and AI solutions, as well as outline a slew of sustainability breakthroughs - plus a whole lot more besides.

Comprehensive Liquid-Cooling Solutions
ASUS is working with Submer, the industry-leading liquid-cooling provider, to demonstrate immersion-cooling solutions at ISC High Performance 2023, focused on the ASUS RS720-E11-IM - the Intel-based 2U4N server that leverages our trusted legacy server architecture and popular features to create a compact new design. This fresh outlook improves accessibility to I/O ports, storage and cable routing, and strengthens the structure to allow the server to be placed vertically in the tank, with durability assured.

Frontier Remains As Sole Exaflop Machine on TOP500 List

Increasing its HPL score from 1.02 EFlop/s in November 2022 to an impressive 1.194 EFlop/s on this list, Frontier was able to improve upon its score after a stagnation between June 2022 and November 2022. Considering exascale was only a goal to aspire to just a few years ago, a roughly 17% increase here is an enormous success. Additionally, Frontier earned a score of 9.95 EFlop/s on the HPL-MxP benchmark, which measures performance for mixed-precision calculation. This is an increase over the 7.94 EFlop/s that the system achieved on the previous list and roughly eight times the machine's HPL score. Frontier is based on the HPE Cray EX235a architecture and utilizes AMD EPYC 64C 2 GHz processors. It has 8,699,904 cores and an impressive energy-efficiency rating of 52.59 GFlops/watt, and it relies on HPE's Slingshot-11 interconnect for data transfer.
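
The percentages are straightforward to verify from the reported figures; a minimal sketch follows (the implied system power is only approximate, since the efficiency figure may come from a separate optimized power run):

```python
# Sanity-checking the Frontier figures reported above.
hpl_prev_eflops = 1.02      # November 2022 HPL score
hpl_now_eflops = 1.194      # current HPL score
hpl_mxp_eflops = 9.95       # mixed-precision HPL-MxP score
efficiency_gflops_per_watt = 52.59

hpl_gain_pct = (hpl_now_eflops / hpl_prev_eflops - 1) * 100                 # ~17%
mxp_vs_hpl = hpl_mxp_eflops / hpl_now_eflops                                # ~8.3x
implied_power_mw = hpl_now_eflops * 1e9 / efficiency_gflops_per_watt / 1e6  # ~22.7 MW

print(f"HPL gain ~{hpl_gain_pct:.0f}%, HPL-MxP/HPL ~{mxp_vs_hpl:.1f}x, "
      f"implied power ~{implied_power_mw:.1f} MW")
```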

NVIDIA Grace Drives Wave of New Energy-Efficient Arm Supercomputers

NVIDIA today announced a supercomputer built on the NVIDIA Grace CPU Superchip, adding to a wave of new energy-efficient supercomputers based on the Arm Neoverse platform. The Isambard 3 supercomputer to be based at the Bristol & Bath Science Park, in the U.K., will feature 384 Arm-based NVIDIA Grace CPU Superchips to power medical and scientific research, and is expected to deliver 6x the performance and energy efficiency of Isambard 2, placing it among Europe's most energy-efficient systems.

It will achieve about 2.7 petaflops of FP64 peak performance and consume less than 270 kilowatts of power, ranking it among the world's three greenest non-accelerated supercomputers. The project is being led by the University of Bristol, as part of the research consortium the GW4 Alliance, together with the universities of Bath, Cardiff and Exeter.

Supermicro Launches Industry's First NVIDIA HGX H100 8-GPU and 4-GPU Servers with Liquid Cooling

Supermicro, Inc., a Total IT Solution Provider for Cloud, AI/ML, Storage, and 5G/Edge, continues to expand its data center offerings with liquid-cooled NVIDIA HGX H100 rack-scale solutions. Advanced liquid cooling technologies entirely from Supermicro reduce the lead time for a complete installation, increase performance, and result in lower operating expenses while significantly reducing the PUE of data centers. Power savings for a data center are estimated at 40% when using Supermicro liquid cooling solutions compared to an air-cooled data center. In addition, up to an 86% reduction in direct cooling costs compared to existing data centers may be realized.

"Supermicro continues to lead the industry supporting the demanding needs of AI workloads and modern data centers worldwide," said Charles Liang, president, and CEO of Supermicro. "Our innovative GPU servers that use our liquid cooling technology significantly lower the power requirements of data centers. With the amount of power required to enable today's rapidly evolving large scale AI models, optimizing TCO and the Total Cost to Environment (TCE) is crucial to data center operators. We have proven expertise in designing and building entire racks of high-performance servers. These GPU systems are designed from the ground up for rack scale integration with liquid cooling to provide superior performance, efficiency, and ease of deployments, allowing us to meet our customers' requirements with a short lead time."

Google Announces A3 Supercomputers with NVIDIA H100 GPUs, Purpose-built for AI

Implementing state-of-the-art artificial intelligence (AI) and machine learning (ML) models requires large amounts of computation, both to train the underlying models, and to serve those models once they're trained. Given the demands of these workloads, a one-size-fits-all approach is not enough - you need infrastructure that's purpose-built for AI.

Together with our partners, we offer a wide range of compute options for ML use cases such as large language models (LLMs), generative AI, and diffusion models. Recently, we announced G2 VMs, becoming the first cloud to offer the new NVIDIA L4 Tensor Core GPUs for serving generative AI workloads. Today, we're expanding that portfolio with the private preview launch of the next-generation A3 GPU supercomputer. Google Cloud now offers a complete range of GPU options for training and inference of ML models.

NVIDIA A800 China-Tailored GPU Performance within 70% of A100

The recent growth in demand for training Large Language Models (LLMs) like Generative Pre-trained Transformer (GPT) has sparked the interest of many companies in GPU solutions used to train these models. However, countries like China have struggled with US sanctions, and NVIDIA has had to create custom models that meet US export regulations. The two resulting GPUs, H800 and A800, represent cut-down versions of the original H100 and A100, respectively. We previously reported on the H800; however, it has remained as mysterious as the A800 we are talking about today. Thanks to MyDrivers, we now have information that the A800's performance comes in at about 70% of the regular A100.

The regular A100 GPU manages 9.7 TeraFLOPS of FP64, 19.5 TeraFLOPS of FP64 Tensor, and up to 624 TeraFLOPS of BF16/FP16 with sparsity. Rough napkin math suggests that 70% of the original's performance (a 30% cut) would equal 6.8 TeraFLOPS of FP64, 13.7 TeraFLOPS of FP64 Tensor, and 437 TeraFLOPS of BF16/FP16 with sparsity. MyDrivers notes that the A800 can be had for 100,000 Yuan, translating to about 14,462 USD at the time of writing. This is not the most capable GPU that Chinese companies can acquire, as the H800 exists; however, we don't have any information about its performance for now.
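
That estimate is easy to reproduce; a minimal sketch using only the A100 figures listed above (the 70% scaling factor is the report's assumption, not a measured value):

```python
# Reproducing the napkin math above: estimated A800 throughput at 70% of the A100.
# The 70% scaling factor is the article's assumption, not a measured figure.
a100_tflops = {"FP64": 9.7, "FP64 Tensor": 19.5, "BF16/FP16 (sparse)": 624.0}
scale = 0.70

for precision, tflops in a100_tflops.items():
    print(f"A800 {precision}: ~{tflops * scale:.2f} TFLOPS")
# -> 6.79, 13.65, and 436.80 TFLOPS, which the article rounds to 6.8 / 13.7 / 437
```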

NVIDIA DGX H100 Systems are Now Shipping

Customers from Japan to Ecuador and Sweden are using NVIDIA DGX H100 systems like AI factories to manufacture intelligence. They're creating services that offer AI-driven insights in finance, healthcare, law, IT and telecom—and working to transform their industries in the process. Among the dozens of use cases, one aims to predict how factory equipment will age, so tomorrow's plants can be more efficient.

Called Green Physics AI, it adds information like an object's CO2 footprint, age and energy consumption to SORDI.ai, which claims to be the largest synthetic dataset in manufacturing.

NVIDIA H100 Compared to A100 for Training GPT Large Language Models

NVIDIA's H100 has recently become available via Cloud Service Providers (CSPs), and it was only a matter of time before someone decided to benchmark its performance and compare it to the previous generation's A100 GPU. Today, thanks to benchmarks from MosaicML, a startup led by Naveen Rao, the ex-CEO of Nervana and former GM of Artificial Intelligence (AI) at Intel, we have a comparison between these two GPUs, along with a fascinating insight into the cost factor. Firstly, MosaicML took Generative Pre-trained Transformer (GPT) models of various sizes and trained them using the bfloat16 and FP8 floating-point precision formats. All training occurred on CoreWeave cloud GPU instances.

Regarding performance, the NVIDIA H100 GPU achieved anywhere from a 2.2x to 3.3x speedup. However, an interesting finding emerges when comparing the cost of running these GPUs in the cloud. CoreWeave prices the H100 SXM GPUs at $4.76/hr/GPU, while the A100 80 GB SXM is priced at $2.21/hr/GPU. While the H100 is roughly 2.2x more expensive per hour, the performance makes up for it, resulting in less time to train a model and a lower overall cost for the training run. This inherently makes the H100 more attractive for researchers and companies wanting to train Large Language Models (LLMs) and makes choosing the newer GPU more viable, despite the increased cost. Below, you can see tables comparing the two GPUs in training time, speedup, and cost of training.
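
The cost argument can be made concrete with the hourly prices and speedups quoted above; a minimal sketch of the break-even arithmetic follows (the speedup values are simply the ends of the range MosaicML reported):

```python
# Relative cost to train the same model on H100 vs. A100, using the
# CoreWeave prices and the speedup range reported above.
h100_price = 4.76   # $/hr/GPU (H100 SXM)
a100_price = 2.21   # $/hr/GPU (A100 80 GB SXM)
price_ratio = h100_price / a100_price          # ~2.15x more expensive per hour

for speedup in (2.2, 3.3):                     # reported speedup range
    relative_cost = price_ratio / speedup      # <1.0 means H100 is cheaper overall
    print(f"{speedup:.1f}x speedup -> H100 training cost is "
          f"{relative_cost:.2f}x that of A100")
# Any speedup above ~2.15x makes the H100 run cheaper despite the higher hourly rate.
```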

Gigabyte Extends Its Leading GPU Portfolio of Servers

Giga Computing, a subsidiary of GIGABYTE and an industry leader in high-performance servers, server motherboards, and workstations, today announced a lineup of powerful GPU-centric servers with the latest AMD and Intel CPUs, including NVIDIA HGX H100 servers with both 4-GPU and 8-GPU modules. With growing interest in HPC and AI applications, specifically generative AI (GAI), this breed of server relies heavily on GPU resources to tackle compute-heavy workloads that handle large amounts of data. With the advent of OpenAI's ChatGPT and other AI chatbots, large GPU clusters are being deployed with system-level optimization to train large language models (LLMs). These LLMs can be processed by GIGABYTE's new design-optimized systems that offer a high level of customization based on users' workloads and requirements.

The GIGABYTE G-series servers are built first and foremost to support dense GPU compute and the latest PCIe technology. Starting with the 2U servers, the new G293 servers can support up to 8 dual-slot GPUs or 16 single-slot GPUs, depending on the server model. For the ultimate in CPU and GPU performance, the 4U G493 servers offer plenty of networking options and storage configurations to go alongside support for eight (Gen 5 x16) GPUs. And for the highest level of GPU compute for HPC and AI, the G393 & G593 series support NVIDIA H100 Tensor Core GPUs. All these new dual-CPU-socket servers are designed for either 4th Gen AMD EPYC or 4th Gen Intel Xeon Scalable processors.

NVIDIA H100 AI Performance Receives up to 54% Uplift with Optimizations

On Wednesday, the MLCommons team released the MLPerf 3.0 Inference numbers, and there was an exciting submission from NVIDIA. Reportedly, NVIDIA has used software optimization to improve the already staggering performance of its latest H100 GPU by up to 54%. For reference, NVIDIA's H100 GPU first appeared on MLPerf 2.1 back in September of 2022. In just six months, NVIDIA engineers worked on AI optimizations for the MLPerf 3.0 release and found that software optimization alone can yield performance increases of anywhere from 7% to 54%. The workloads in the inference suite included RNN-T speech recognition, 3D U-Net medical imaging, RetinaNet object detection, ResNet-50 object classification, DLRM recommendation, and BERT 99/99.9% natural language processing.

What is interesting is that NVIDIA's submission is a bit modified. There are open and closed categories that vendors compete in: the closed category requires submissions that are mathematically equivalent to the reference neural network, while the open category is flexible and allows vendors to submit results based on optimizations for their hardware. The closed submission aims to provide an "apples-to-apples" hardware comparison. Given that NVIDIA opted for the closed category, performance optimizations from other vendors such as Intel and Qualcomm are not accounted for here. Still, it is interesting that optimization can lead to a performance increase of up to 54% in NVIDIA's case with its H100 GPU. Another interesting takeaway is that some comparable hardware, like the Qualcomm Cloud AI 100, Intel Xeon Platinum 8480+, and NeuChips' ReccAccel N3000, failed to finish all the workloads. This is shown as "X" on the slides made by NVIDIA, stressing the need for proper ML system software support, which is NVIDIA's strength and a prominent marketing claim.

Chinese GPU Maker Biren Technology Loses its Co-Founder, Only Months After Revealing New GPUs

Golf Jiao, a co-founder and general manager of Biren Technology, left the company late last month, according to insider sources in China. No official statement has been issued by the executive team at Biren Tech, and Jiao has not provided any details regarding his departure from the fabless semiconductor design company. The Shanghai-based firm is a relatively new startup - it was founded in 2019 by several former NVIDIA, Qualcomm and Alibaba veterans. Biren Tech received $726.6 million in funding for its debut range of general-purpose graphics processing units (GPGPUs), also described as high-performance computing graphics processing units (HPC GPUs).

The company revealed its ambitions to take on NVIDIA's Ampere A100 and Hopper H100 compute platforms, and last August announced two HPC GPUs in the form of the BR100 and BR104. The specifications and performance charts demonstrated impressive figures, but Biren Tech had to roll back its numbers when it was hit by U.S. Government-enforced sanctions in October 2022. The fabless company had contracted with TSMC to produce its Biren range, and the new set of rules resulted in shipments from the Taiwanese foundry being halted. Biren Tech cut its workforce by a third soon after losing its supply chain with TSMC, and the engineering team had to reassess how the BR100 and BR104 would perform on a process node larger than the original 7 nm design. It was decided that a downgrade in transfer rates would appease the legal teams and get newly redesigned Biren silicon back onto the assembly line.

NVIDIA Prepares H800 Adaptation of H100 GPU for the Chinese Market

NVIDIA's H100 accelerator is one of the most powerful solutions for powering AI workloads. And, of course, every company and government wants to use it to power its AI workloads. However, in countries like China, shipment of US-made goods is challenging. With export regulations in place, NVIDIA had to get creative and make a specific version of its H100 GPU for the Chinese market, labeled the H800 model. Late last year, NVIDIA also created a China-specific version of the A100 model called the A800, with the only difference being that the chip-to-chip interconnect bandwidth was dropped from 600 GB/s to 400 GB/s.

This year's H800 SKU features similar restrictions, and the company appears to have made similar sacrifices to ship its chips to China. From the 600 GB/s bandwidth of the regular H100 PCIe model, the H800 is cut down to only 300 GB/s of bi-directional chip-to-chip interconnect bandwidth. While we have no data on whether the CUDA or Tensor core count has been adjusted, the sacrifice in bandwidth to comply with export regulations will have consequences. As the communication speed is reduced, training large models will incur higher latency and run slower than on the regular H100 chip, because of the massive amounts of data that need to travel from one chip to another. According to Reuters, an NVIDIA spokesperson declined to discuss other differences, stating that "our 800 series products are fully compliant with export control regulations."

NVIDIA Prepares H100 NVL GPUs With More Memory and SLI-Like Capability

NVIDIA has killed SLI on its graphics cards, disabling the possibility of connecting two or more GPUs to harness their power for gaming and other workloads. However, SLI is making a reincarnation of sorts today in the form of a new H100 GPU model that sports higher memory capacity and higher performance. Called the H100 NVL, the GPU is a unique edition based on the regular H100 PCIe version. What makes the H100 NVL version so special is the boost in memory capacity, now up from 80 GB in the standard model to 94 GB in the NVL edition SKU, for a total of 188 GB of HBM3 memory, running on a 6144-bit bus per GPU. Being a special edition SKU, it is sold only in pairs, as these H100 NVL GPUs are paired together and connected by three NVLink bridges on top. Installation requires two PCIe slots, separated by dual-slot spacing.
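
The memory figures are consistent with a fully enabled six-stack HBM3 layout per GPU; a minimal sketch of that arithmetic follows (the per-stack bus width is an assumption based on standard HBM3, not something stated in the source):

```python
# Memory arithmetic for the dual-GPU H100 NVL described above.
# The six-stack HBM3 layout and 1024-bit per-stack interface are assumptions
# consistent with the 6144-bit bus quoted in the article; the capacities
# come from the article itself.

hbm3_stacks_per_gpu = 6
bits_per_stack = 1024
gb_per_gpu = 94
gpus_per_nvl_pair = 2

bus_width_bits = hbm3_stacks_per_gpu * bits_per_stack   # 6144-bit per GPU
total_capacity_gb = gb_per_gpu * gpus_per_nvl_pair      # 188 GB per NVL pair

print(f"{bus_width_bits}-bit bus per GPU, {total_capacity_gb} GB per H100 NVL pair")
```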

The new H100 NVL closes the performance gap between the H100 PCIe and H100 SXM versions, as the card features a boosted, configurable TDP of up to 400 Watts per card. The H100 NVL uses the same Tensor and CUDA core configuration as the SXM edition, except it is placed on a PCIe slot and connected to another card. Being sold in pairs, OEMs can outfit their systems with either two or four pairs per certified system. You can see the specification table below, with information filled out by AnandTech. As NVIDIA says, the need for this special edition SKU stems from the emergence of Large Language Models (LLMs) that require significant computational power to run. "Servers equipped with H100 NVL GPUs increase GPT-175B model performance up to 12X over NVIDIA DGX A100 systems while maintaining low latency in power-constrained data center environments," noted the company.

ASUS Announces NVIDIA-Certified Servers and ProArt Studiobook Pro 16 OLED at GTC

ASUS today announced its participation in NVIDIA GTC, a developer conference for the era of AI and the metaverse. ASUS will offer comprehensive NVIDIA-certified server solutions that support the latest NVIDIA L4 Tensor Core GPU—which accelerates real-time video AI and generative AI—as well as the NVIDIA BlueField-3 DPU, igniting unprecedented innovation for supercomputing infrastructure. ASUS will also launch the new ProArt Studiobook Pro 16 OLED laptop with the NVIDIA RTX 3000 Ada Generation Laptop GPU for mobile creative professionals.

Purpose-built GPU servers for generative AI
Generative AI applications enable businesses to develop better products and services, and deliver original content tailored to the unique needs of customers and audiences. ASUS ESC8000 and ESC4000 are fully certified NVIDIA servers that support up to eight NVIDIA L4 Tensor Core GPUs, which deliver universal acceleration and energy efficiency for AI with up to 2.7X more generative AI performance than the previous GPU generation. ASUS ESC and RS series servers are engineered for HPC workloads, with support for the NVIDIA BlueField-3 DPU to transform data center infrastructure, as well as NVIDIA AI Enterprise applications for streamlined AI workflows and deployment.

Supermicro Servers Now Featuring NVIDIA HGX and PCIe-Based H100 8-GPU Systems

Supermicro, Inc., a Total IT Solution Provider for AI/ML, Cloud, Storage, and 5G/Edge, today announced that it has begun shipping its top-of-the-line new GPU servers featuring the latest NVIDIA HGX H100 8-GPU system. Supermicro servers incorporate the new NVIDIA L4 Tensor Core GPU in a wide range of application-optimized servers from the edge to the data center.

"Supermicro offers the most comprehensive portfolio of GPU systems in the industry, including servers in 8U, 6U, 5U, 4U, 2U, and 1U form factors, as well as workstations and SuperBlade systems that support the full range of new NVIDIA H100 GPUs," said Charles Liang, president and CEO of Supermicro. "With our new NVIDIA HGX H100 Delta-Next server, customers can expect 9x performance gains compared to the previous generation for AI training applications. Our GPU servers have innovative airflow designs which reduce fan speeds, lower noise levels, and consume less power, resulting in a reduced total cost of ownership (TCO). In addition, we deliver complete rack-scale liquid-cooling options for customers looking to further future-proof their data centers."