
NVIDIA Blackwell Sets New Standard for Generative AI in MLPerf Inference Benchmark

GFreeman

News Editor
Staff member
As enterprises race to adopt generative AI and bring new services to market, the demands on data center infrastructure have never been greater. Training large language models is one challenge, but delivering LLM-powered real-time services is another. In the latest round of MLPerf industry benchmarks, Inference v4.1, NVIDIA platforms delivered leading performance across all data center tests. The first-ever submission of the upcoming NVIDIA Blackwell platform revealed up to 4x more performance than the NVIDIA H100 Tensor Core GPU on MLPerf's biggest LLM workload, Llama 2 70B, thanks to its use of a second-generation Transformer Engine and FP4 Tensor Cores.
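The FP4 Tensor Cores mentioned above trade precision for throughput: weights are stored on a coarse 4-bit value grid and rescaled at compute time. The sketch below is illustrative only; the value grid is the commonly cited E2M1 layout for FP4, and real Blackwell kernels use hardware Tensor Core paths, not a Python loop. The scale choice here is a made-up per-tensor example.

```python
# Hedged sketch of FP4-style weight quantization (E2M1-like value grid).
# Each weight is divided by a scale, snapped to the nearest representable
# 4-bit value, then rescaled. This is a conceptual model, not NVIDIA's kernel.

FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_VALUES = sorted({-v for v in FP4_GRID} | set(FP4_GRID))

def quantize_fp4(weights, scale):
    """Map each weight to the nearest representable FP4 value after scaling."""
    out = []
    for w in weights:
        scaled = w / scale
        nearest = min(FP4_VALUES, key=lambda v: abs(v - scaled))
        out.append(nearest * scale)
    return out

weights = [0.11, -0.72, 0.33, 1.4]
scale = 0.25  # hypothetical per-tensor scale chosen so weights fit the FP4 range
print(quantize_fp4(weights, scale))  # -> [0.125, -0.75, 0.375, 1.5]
```

Because each weight needs only 4 bits plus a shared scale, memory footprint and bandwidth drop by roughly 4x versus FP16, which is one reason inference throughput can rise so sharply.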

The NVIDIA H200 Tensor Core GPU delivered outstanding results on every benchmark in the data center category - including the latest addition to the benchmark, the Mixtral 8x7B mixture of experts (MoE) LLM, which features a total of 46.7 billion parameters, with 12.9 billion parameters active per token. MoE models have gained popularity as a way to bring more versatility to LLM deployments, as they're capable of answering a wide variety of questions and performing more diverse tasks in a single deployment. They're also more efficient since they only activate a few experts per inference - meaning they deliver results much faster than dense models of a similar size.
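The efficiency claim above, that MoE models activate only a few experts per token, is the reason Mixtral 8x7B uses roughly 12.9 billion of its 46.7 billion parameters per token. A minimal top-k routing sketch, with toy experts and made-up router values rather than Mixtral's actual code, looks like this:

```python
# Hedged sketch of top-k mixture-of-experts routing: only k of N experts
# run per token, so per-token compute scales with k, not with N.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token, experts, router_logits, k=2):
    """Run only the top-k experts and mix their outputs by router weight."""
    topk = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)[:k]
    gates = softmax([router_logits[i] for i in topk])  # renormalize over chosen experts
    return sum(g * experts[i](token) for g, i in zip(gates, topk))

# Toy experts: each is just a scalar function of the token value.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
router_logits = [0.1, 2.0, 0.3, 1.5]  # pretend router output for this token
print(moe_forward(1.0, experts, router_logits, k=2))
```

Here only two of the four experts ever execute for this token; a dense model of similar total size would run the equivalent of all four, which is why MoE inference can be much faster.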



The continued growth of LLMs is driving the need for more compute to process inference requests. To meet real-time latency requirements for serving today's LLMs, and to do so for as many users as possible, multi-GPU compute is a must. NVIDIA NVLink and NVSwitch provide high-bandwidth communication between GPUs based on the NVIDIA Hopper architecture, delivering significant benefits for real-time, cost-effective large model inference. The Blackwell platform will further extend NVLink Switch's capabilities with larger NVLink domains of 72 GPUs.

In addition to the NVIDIA submissions, 10 NVIDIA partners - ASUSTek, Cisco, Dell Technologies, Fujitsu, Giga Computing, Hewlett Packard Enterprise (HPE), Juniper Networks, Lenovo, Quanta Cloud Technology and Supermicro - all made solid MLPerf Inference submissions, underscoring the wide availability of NVIDIA platforms.

Relentless Software Innovation
NVIDIA platforms undergo continuous software development, racking up performance and feature improvements on a monthly basis.

In the latest inference round, NVIDIA offerings, including the NVIDIA Hopper architecture, NVIDIA Jetson platform and NVIDIA Triton Inference Server, delivered substantial performance gains.

The NVIDIA H200 GPU delivered up to 27% more generative AI inference performance over the previous round, underscoring the added value customers get over time from their investment in the NVIDIA platform.

Triton Inference Server, part of the NVIDIA AI platform and available with NVIDIA AI Enterprise software, is a fully featured open-source inference server that helps organizations consolidate framework-specific inference servers into a single, unified platform. This helps lower the total cost of ownership of serving AI models in production and cuts model deployment times from months to minutes.

In this round of MLPerf, Triton Inference Server delivered near-equal performance to NVIDIA's bare-metal submissions, showing that organizations no longer have to choose between using a feature-rich production-grade AI inference server and achieving peak throughput performance.

Going to the Edge
Deployed at the edge, generative AI models can transform sensor data, such as images and videos, into real-time, actionable insights with strong contextual awareness. The NVIDIA Jetson platform for edge AI and robotics is uniquely capable of running any kind of model locally, including LLMs, vision transformers and Stable Diffusion.

In this round of MLPerf benchmarks, NVIDIA Jetson AGX Orin system-on-modules achieved more than a 6.2x throughput improvement and 2.4x latency improvement over the previous round on the GPT-J LLM workload. Rather than developing for a specific use case, developers can now use this general-purpose 6-billion-parameter model to seamlessly interface with human language, transforming generative AI at the edge.

Performance Leadership All Around
This round of MLPerf Inference showed the versatility and leading performance of NVIDIA platforms - extending from the data center to the edge - on all of the benchmark's workloads, supercharging the most innovative AI-powered applications and services. To learn more about these results, see our technical blog.

H200 GPU-powered systems are available today from CoreWeave - the first cloud service provider to announce general availability - and server makers ASUS, Dell Technologies, HPE, QCT and Supermicro.

View at TechPowerUp Main Site | Source
 
This came out today in the news:
AMD Narrows the Gap With Nvidia in New MLPerf Benchmarks

 

las

As much as people would like, AMD is not on the same level as Nvidia. Cherry-picking a benchmark here and there doesn't show the entire picture, and doesn't represent how the companies that actually buy these solutions are using them.

Nvidia is years ahead and the AI king for a reason. It's not just the GPU that matters: Nvidia can deliver a complete solution, and AMD can't. That's why AMD recently bought ZT Systems for $5 billion. They want to be able to deliver a complete solution, eventually.
 