News Posts matching #MLCommons

Industry's First-to-Market Supermicro NVIDIA HGX B200 Systems Demonstrate AI Performance Leadership

Super Micro Computer, Inc. (SMCI), a Total IT Solution Provider for AI/ML, HPC, Cloud, Storage, and 5G/Edge, has announced first-to-market, industry-leading performance on several MLPerf Inference v5.0 benchmarks using NVIDIA HGX B200 8-GPU systems. The 4U liquid-cooled and 10U air-cooled systems achieved the best performance in select benchmarks. Supermicro demonstrated more than 3 times the token generation rate (tokens/s) for the Llama2-70B and Llama3.1-405B benchmarks compared to H200 8-GPU systems. "Supermicro remains a leader in the AI industry, as evidenced by the first new benchmarks released by MLCommons in 2025," said Charles Liang, president and CEO of Supermicro. "Our building block architecture enables us to be first-to-market with a diverse range of systems optimized for various workloads. We continue to collaborate closely with NVIDIA to fine-tune our systems and secure a leadership position in AI workloads."

Supermicro is the only system vendor publishing record MLPerf Inference performance (on select benchmarks) for both the air-cooled and liquid-cooled NVIDIA HGX B200 8-GPU systems. Both systems were operational before the MLCommons benchmark start date, and Supermicro engineers optimized the systems and software, as allowed by the MLCommons rules, to showcase the impressive performance. Within the operating margin, the Supermicro air-cooled B200 system exhibited the same level of performance as the liquid-cooled B200 system. Supermicro was already delivering these systems to customers while the benchmarks were conducted. MLCommons requires that all results be reproducible, that the products be available, and that the results can be audited by other MLCommons members.

Intel Xeon Remains Only Server CPU on MLPerf

Today, MLCommons released its latest MLPerf Inference v5.0 benchmarks, showcasing Intel Xeon 6 with Performance-cores (P-cores) across six key benchmarks. The results reveal a remarkable 1.9x boost in AI performance over the previous generation of processors, affirming Xeon 6 as a top solution for modern AI systems.

"The latest MLPerf results demonstrate Intel Xeon 6 as the ideal CPU for AI workloads, offering a perfect balance of performance and energy efficiency. Intel Xeon remains the leading CPU for AI systems, with consistent gen-over-gen performance improvements across a variety of AI benchmarks." - Karin Eibschitz Segal, Intel corporate vice president and interim general manager of the Data Center and AI Group

MLCommons Releases New MLPerf Inference v5.0 Benchmark Results

Today, MLCommons announced new results for its industry-standard MLPerf Inference v5.0 benchmark suite, which delivers machine learning (ML) system performance benchmarking in an architecture-neutral, representative, and reproducible manner. The results highlight that the AI community is focusing much of its attention and efforts on generative AI scenarios, and that the combination of recent hardware and software advances optimized for generative AI have led to dramatic performance improvements over the past year.

The MLPerf Inference benchmark suite, which encompasses both datacenter and edge systems, is designed to measure how quickly systems can run AI and ML models across a variety of workloads. The open-source and peer-reviewed benchmark suite creates a level playing field for competition that drives innovation, performance, and energy efficiency for the entire industry. It also provides critical technical information for customers who are procuring and tuning AI systems. This round of MLPerf Inference results also includes tests for four new benchmarks: Llama 3.1 405B, Llama 2 70B Interactive for low-latency applications, RGAT, and Automotive PointPainting for 3D object detection.

NVIDIA Blackwell Takes Pole Position in Latest MLPerf Inference Results

In the latest MLPerf Inference V5.0 benchmarks, which reflect some of the most challenging inference scenarios, the NVIDIA Blackwell platform set records - and marked NVIDIA's first MLPerf submission using the NVIDIA GB200 NVL72 system, a rack-scale solution designed for AI reasoning. Delivering on the promise of cutting-edge AI takes a new kind of compute infrastructure, called AI factories. Unlike traditional data centers, AI factories do more than store and process data - they manufacture intelligence at scale by transforming raw data into real-time insights. The goal for AI factories is simple: deliver accurate answers to queries quickly, at the lowest cost and to as many users as possible.

The complexity of pulling this off is significant and takes place behind the scenes. As AI models grow to billions and trillions of parameters to deliver smarter replies, the compute required to generate each token increases. This requirement reduces the number of tokens that an AI factory can generate and increases cost per token. Keeping inference throughput high and cost per token low requires rapid innovation across every layer of the technology stack, spanning silicon, network systems and software.
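As a rough illustration of that cost relationship, here is a minimal sketch; the hourly system cost and throughput figures are invented for the example, not MLPerf measurements:

```python
# Illustration: amortized serving cost per token as a function of throughput.
# The dollar and throughput figures are assumptions for this sketch only.

def cost_per_million_tokens(system_cost_per_hour: float, tokens_per_second: float) -> float:
    """Dollars per one million generated tokens at a given sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return system_cost_per_hour / tokens_per_hour * 1_000_000

# At a fixed hourly cost, doubling throughput halves the cost per token.
for tps in (10_000, 20_000, 40_000):
    print(f"{tps:>6} tokens/s -> ${cost_per_million_tokens(98.0, tps):.2f} per 1M tokens")
```

This is why per-token economics improve only when throughput rises faster than infrastructure cost.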

NVIDIA B200 "Blackwell" Records 2.2x Performance Improvement Over its "Hopper" Predecessor

We know that NVIDIA's latest "Blackwell" GPUs are fast, but how much faster are they than the previous generation "Hopper"? Thanks to the latest MLPerf Training v4.1 results, NVIDIA's HGX B200 Blackwell platform has demonstrated massive performance gains of up to 2.2x per GPU compared to the Hopper-based HGX H200 platform. The latest results, verified by MLCommons, reveal impressive achievements in large language model (LLM) training. The Blackwell architecture, featuring HBM3e high-bandwidth memory and fifth-generation NVLink interconnect technology, achieved double the performance per GPU for GPT-3 pre-training and a 2.2x boost for Llama 2 70B fine-tuning compared to the previous Hopper generation. Each benchmark system incorporated eight Blackwell GPUs operating at a 1,000 W TDP, connected via NVLink Switch for scale-up.

The network infrastructure utilized NVIDIA ConnectX-7 SuperNICs and Quantum-2 InfiniBand switches, enabling high-speed node-to-node communication for distributed training workloads. While previous Hopper-based systems required 256 GPUs to optimize performance for the GPT-3 175B benchmark, Blackwell accomplished the same task with just 64 GPUs, leveraging its larger HBM3e memory capacity and bandwidth. One thing to look out for is the upcoming GB200 NVL72 system, which promises even more significant gains beyond the 2.2x figure. It features expanded NVLink domains, higher memory bandwidth, and tight integration with NVIDIA Grace CPUs, complemented by ConnectX-8 SuperNIC and Quantum-X800 switch technologies. With faster switching and improved data movement from Grace-Blackwell integration, we could see further software optimization from NVIDIA to push the performance envelope.

AMD MI300X Accelerators are Competitive with NVIDIA H100, Crunch MLPerf Inference v4.1

The MLCommons consortium on Wednesday posted MLPerf Inference v4.1 benchmark results for popular AI inferencing accelerators available in the market, across brands that include NVIDIA, AMD, and Intel. AMD's Instinct MI300X accelerators emerged as competitive with NVIDIA's "Hopper" H100 series AI GPUs. AMD also used the opportunity to showcase the kind of AI inferencing performance uplifts customers can expect from its next-generation EPYC "Turin" server processors powering these MI300X machines. "Turin" features "Zen 5" CPU cores sporting a 512-bit FPU datapath and improved performance in AI-relevant 512-bit SIMD instruction sets such as AVX-512 and VNNI. The MI300X, on the other hand, banks on the strengths of its memory subsystem, FP8 data format support, and efficient KV cache management.

The MLPerf Inference v4.1 benchmark focused on the 70-billion-parameter LLaMA2-70B model. AMD's submissions included machines featuring the Instinct MI300X, powered by the current EPYC "Genoa" (Zen 4) and next-gen EPYC "Turin" (Zen 5) processors. The GPUs are backed by AMD's ROCm open-source software stack. The benchmark evaluated inference performance using 24,576 Q&A samples from the OpenORCA dataset, with each sample containing up to 1,024 input and output tokens. Two scenarios were assessed: the offline scenario, which focuses on batch processing to maximize throughput in tokens per second, and the server scenario, which simulates real-time queries with strict latency limits (TTFT ≤ 2 seconds, TPOT ≤ 200 ms). This tests the chip's mettle in both high-throughput and low-latency workloads.
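A minimal sketch of how those two server-scenario limits could be checked against per-query timings; the function and sample values are hypothetical, not MLCommons' LoadGen implementation:

```python
# Hypothetical check of MLPerf-style server-scenario latency constraints:
# time to first token (TTFT) <= 2 s, time per output token (TPOT) <= 200 ms.

TTFT_LIMIT_S = 2.0
TPOT_LIMIT_S = 0.200

def query_meets_limits(first_token_s: float, total_s: float, output_tokens: int) -> bool:
    """True if one query satisfies both latency constraints."""
    if first_token_s > TTFT_LIMIT_S:
        return False
    if output_tokens > 1:
        tpot = (total_s - first_token_s) / (output_tokens - 1)
        if tpot > TPOT_LIMIT_S:
            return False
    return True

# (first_token_s, total_s, output_tokens) for three made-up queries
queries = [(0.8, 40.0, 256), (1.5, 110.0, 512), (2.4, 30.0, 200)]
passed = sum(query_meets_limits(*q) for q in queries)
print(f"{passed}/{len(queries)} queries within TTFT/TPOT limits")
```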

Intel Xeon 6 Delivers up to 17x AI Performance Gains over 4 Years of MLPerf Results

Today, MLCommons published results of its industry-standard AI performance benchmark suite, MLPerf Inference v4.1. Intel submitted results across six MLPerf benchmarks for 5th Gen Intel Xeon Scalable processors and, for the first time, Intel Xeon 6 processors with Performance-cores (P-cores). Intel Xeon 6 processors with P-cores achieved about a 1.9x geomean improvement in AI performance compared with 5th Gen Xeon processors.
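For context, a geomean figure is the geometric mean of the per-benchmark speedups rather than their arithmetic average, so no single workload dominates the result. A minimal sketch with invented speedups across six benchmarks (not Intel's actual submissions):

```python
import math

# Hypothetical per-benchmark speedups across six MLPerf workloads;
# the values are invented for illustration, not Intel's submissions.
speedups = [1.6, 2.1, 1.8, 2.0, 1.7, 2.2]

# Geometric mean: the n-th root of the product of n ratios.
geomean = math.prod(speedups) ** (1 / len(speedups))
print(f"geomean speedup: {geomean:.2f}x")  # ~1.89x for these values
```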

"The newest MLPerf results show how continued investment and resourcing is critical for improving AI performance. Over the past four years, we have raised the bar for AI performance on Intel Xeon processors by up to 17x based on MLPerf. As we near general availability later this year, we look forward to ramping Xeon 6 with our customers and partners," said Pallavi Mahajan, Intel corporate vice president and general manager of Data Center and AI Software.

NVIDIA MLPerf Training Results Showcase Unprecedented Performance and Elasticity

The full-stack NVIDIA accelerated computing platform has once again demonstrated exceptional performance in the latest MLPerf Training v4.0 benchmarks. NVIDIA more than tripled the performance on the large language model (LLM) benchmark, based on GPT-3 175B, compared to the record-setting NVIDIA submission made last year. Using an AI supercomputer featuring 11,616 NVIDIA H100 Tensor Core GPUs connected with NVIDIA Quantum-2 InfiniBand networking, NVIDIA achieved this remarkable feat through larger scale - more than triple that of the 3,584 H100 GPU submission a year ago - and extensive full-stack engineering.
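A back-of-the-envelope reading of those numbers: the GPU count grew about 3.24x while performance more than tripled, implying near-linear scaling efficiency. A minimal sketch, where the 3.1x performance ratio is an assumed reading of "more than tripled":

```python
# Rough scaling-efficiency check for the two GPT-3 175B submissions.
gpus_2023, gpus_2024 = 3_584, 11_616
perf_ratio = 3.1                      # assumption: "more than tripled"
scale_ratio = gpus_2024 / gpus_2023   # ~3.24x more GPUs

efficiency = perf_ratio / scale_ratio # ~96%: close to linear scaling
print(f"scale: {scale_ratio:.2f}x, scaling efficiency: {efficiency:.0%}")
```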

Thanks to the scalability of the NVIDIA AI platform, Eos can now train massive AI models like GPT-3 175B even faster, and this great AI performance translates into significant business opportunities. For example, in NVIDIA's recent earnings call, we described how LLM service providers can turn a single dollar invested into seven dollars in just four years running the Llama 3 70B model on NVIDIA HGX H200 servers. This return assumes an LLM service provider serving Llama 3 70B at $0.60/M tokens, with an HGX H200 server throughput of 24,000 tokens/second.
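The arithmetic behind that return can be reproduced from the two quoted figures, assuming continuous full utilization (an idealization):

```python
# Worked example of the $1-in/$7-out claim from the figures quoted above:
# $0.60 per million tokens at 24,000 tokens/second per HGX H200 server.
price_per_token = 0.60 / 1_000_000   # dollars
tokens_per_second = 24_000
seconds_per_year = 365 * 24 * 3600

yearly_revenue = tokens_per_second * seconds_per_year * price_per_token
four_year_revenue = 4 * yearly_revenue   # ~$1.8M
print(f"~${yearly_revenue:,.0f}/year, ~${four_year_revenue:,.0f} over four years")

# A 7x return on every dollar invested then implies a total server cost of
# roughly four-year revenue divided by seven (~$260k).
print(f"implied server cost: ~${four_year_revenue / 7:,.0f}")
```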

Intel Submits Gaudi 2 Results on MLCommons' Newest Benchmark

Today, MLCommons published results of its industry AI performance benchmark, MLPerf Training v4.0. Intel's results demonstrate the choice that Intel Gaudi 2 AI accelerators give enterprises and customers. Community-based software simplifies generative AI (GenAI) development, and industry-standard Ethernet networking enables flexible scaling of AI systems. For the first time on the MLPerf benchmark, Intel submitted results on a large Gaudi 2 system (1,024 Gaudi 2 accelerators), trained in the Intel Tiber Developer Cloud, to demonstrate Gaudi 2 performance and scalability and Intel's cloud capacity for training MLPerf's GPT-3 175B parameter benchmark model.

"The industry has a clear need: address the gaps in today's generative AI enterprise offerings with high-performance, high-efficiency compute options. The latest MLPerf results published by MLCommons illustrate the unique value Intel Gaudi brings to market as enterprises and customers seek more cost-efficient, scalable systems with standard networking and open software, making GenAI more accessible to more customers," said Zane Ball, Intel corporate vice president and general manager, DCAI Product Management.

Intel Gaudi 2 Remains Only Benchmarked Alternative to NV H100 for Generative AI Performance

Today, MLCommons published results of the industry-standard MLPerf v4.0 benchmark for inference. Intel's results for Intel Gaudi 2 accelerators and 5th Gen Intel Xeon Scalable processors with Intel Advanced Matrix Extensions (Intel AMX) reinforce the company's commitment to bring "AI Everywhere" with a broad portfolio of competitive solutions. The Intel Gaudi 2 AI accelerator remains the only benchmarked alternative to Nvidia H100 for generative AI (GenAI) performance and provides strong performance-per-dollar. Further, Intel remains the only server CPU vendor to submit MLPerf results. Intel's 5th Gen Xeon results improved by an average of 1.42x compared with 4th Gen Intel Xeon processors' results in MLPerf Inference v3.1.

"We continue to improve AI performance on industry-standard benchmarks across our portfolio of accelerators and CPUs. Today's results demonstrate that we are delivering AI solutions that deliver to our customers' dynamic and wide-ranging AI requirements. Both Intel Gaudi and Xeon products provide our customers with options that are ready to deploy and offer strong price-to-performance advantages," said Zane Ball, Intel corporate vice president and general manager, DCAI Product Management.

Intel Advances Scientific Research and Performance for New Wave of Supercomputers

At SC23, Intel showcased AI-accelerated high performance computing (HPC) with leadership performance for HPC and AI workloads across Intel Data Center GPU Max Series, Intel Gaudi 2 AI accelerators and Intel Xeon processors. In partnership with Argonne National Laboratory, Intel shared progress on the Aurora generative AI (genAI) project, including an update on the 1 trillion parameter GPT-3 LLM on the Aurora supercomputer that is made possible by the unique architecture of the Max Series GPU and the system capabilities of the Aurora supercomputer. Intel and Argonne demonstrated the acceleration of science with applications from the Aurora Early Science Program (ESP) and the Exascale Computing Project. The company also showed the path to Intel Gaudi 3 AI accelerators and Falcon Shores.

"Intel has always been committed to delivering innovative technology solutions to meet the needs of the HPC and AI community. The great performance of our Xeon CPUs along with our Max GPUs and CPUs help propel research and science. That coupled with our Gaudi accelerators demonstrate our full breadth of technology to provide our customers with compelling choices to suit their diverse workloads," said Deepak Patil, Intel corporate vice president and general manager of Data Center AI Solutions.

Intel Joins the MLCommons AI Safety Working Group

Today, Intel announced it is joining the new MLCommons AI Safety (AIS) working group alongside artificial intelligence experts from industry and academia. As a founding member, Intel will contribute its expertise and knowledge to help create a flexible platform for benchmarks that measure the safety and risk factors of AI tools and models. As testing matures, the standard AI safety benchmarks developed by the working group will become a vital element of our society's approach to AI deployment and safety.

"Intel is committed to advancing AI responsibly and making it accessible to everyone. We approach safety concerns holistically and develop innovations across hardware and software to enable the ecosystem to build trustworthy AI. Due to the ubiquity and pervasiveness of large language models, it is crucial to work across the ecosystem to address safety concerns in the development and deployment of AI. To this end, we're pleased to join the industry in defining the new processes, methods and benchmarks to improve AI everywhere," said Deepak Patil, Intel corporate vice president and general manager, Data Center AI Solutions.

SiMa.ai Surpasses NVIDIA Again in MLPerf Closed Edge ResNet50 Benchmark

SiMa.ai, the machine learning company delivering solutions for the embedded edge, today announced the results of its second MLPerf submission, outperforming industry ML leader NVIDIA's Orin NX and AGX Orin in the Closed Edge power category of the MLCommons MLPerf Inference v3.1 benchmark. SiMa.ai participated in the closed, edge, power division of this benchmarking round, focusing on the ResNet50 image classification benchmark. Since the company's prior submission in April 2023, SiMa.ai achieved a 20 percent improvement in its Single Stream ResNet50 results for performance and power, while exhibiting up to 85 percent greater ResNet50 MultiStream efficiency compared to NVIDIA. With frames per second per watt as the de facto performance standard for edge AI and ML, these results demonstrate that SiMa.ai's push-button approach drives continued leadership in power efficiency without compromising performance.
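A minimal sketch of the frames-per-second-per-watt metric those comparisons rest on; the throughput and power numbers are invented placeholders, not the actual submission figures:

```python
# FPS per watt: the efficiency metric used in edge MLPerf power comparisons.
# The numbers below are placeholders, not SiMa.ai or NVIDIA results.

def fps_per_watt(frames_per_second: float, avg_power_watts: float) -> float:
    return frames_per_second / avg_power_watts

device_a = fps_per_watt(frames_per_second=1800.0, avg_power_watts=12.0)  # 150 FPS/W
device_b = fps_per_watt(frames_per_second=2400.0, avg_power_watts=30.0)  #  80 FPS/W

print(f"device A delivers {device_a / device_b:.2f}x the efficiency of device B")
```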

"Outperforming the industry leader not only once, but again for a second time is great validation for our technology. Our team at SiMa.ai will persistently pursue performance per watt leadership and new standards in ease of use for the embedded edge market as part of our core DNA," said Krishna Rangasayee, CEO and founder, SiMa.ai. "We are proud of the SiMa.ai team's leadership in the latest MLPerf benchmark and excited to extend these latest improvements to our customers' real-world needs and use cases."

Intel Shows Strong AI Inference Performance

Today, MLCommons published results of its MLPerf Inference v3.1 performance benchmark for GPT-J, the 6 billion parameter large language model, as well as computer vision and natural language processing models. Intel submitted results for Habana Gaudi 2 accelerators, 4th Gen Intel Xeon Scalable processors, and Intel Xeon CPU Max Series. The results show Intel's competitive performance for AI inference and reinforce the company's commitment to making artificial intelligence more accessible at scale across the continuum of AI workloads - from client and edge to the network and cloud.

"As demonstrated through the recent MLCommons results, we have a strong, competitive AI product portfolio, designed to meet our customers' needs for high-performance, high-efficiency deep learning inference and training, for the complete spectrum of AI models - from the smallest to the largest - with leading price/performance." -Sandra Rivera, Intel executive vice president and general manager of the Data Center and AI Group

GIGABYTE Leads MLPerf Training v3.0 Benchmarks with Top-Performing Accelerators in GIGABYTE Servers

GIGABYTE Technology: The latest MLPerf Training v3.0 benchmark results are out, and the GIGABYTE G593-SD0 server has emerged as a leader in this round of testing. Going head-to-head against impressive systems, GIGABYTE's servers secured top positions in various categories, showcasing their prowess in handling real-world machine learning use cases. With an unparalleled focus on performance, efficiency, and reliability, GIGABYTE has once again proven its commitment to driving progress in the field of AI.

GIGABYTE, one of the founding members of MLCommons, has been actively contributing to the organization's efforts in designing and planning systems to benchmark fairly. Understanding the importance of replicating real-world scenarios in AI development, GIGABYTE's collaboration with MLCommons has been instrumental in shaping the benchmark tasks to encompass critical use cases such as image recognition, object detection, speech-to-text, natural language processing, and recommendation engines. By actively engaging with end applications, GIGABYTE ensures that its servers are designed to meet the highest standards, delivering supreme performance, and facilitating meaningful comparisons between different ML systems.

NVIDIA Ada Lovelace Successor Set for 2025

According to the NVIDIA roadmap that was spotted in the recently published MLCommons training results, the Ada Lovelace successor is set to come in 2025. The roadmap also reveals the schedule for Hopper Next and Grace Next GPUs, as well as the BlueField-4 DPU.

While the roadmap does not provide many details, it does give us a general idea of when to expect NVIDIA's next GeForce architecture. Since NVIDIA usually launches a new GeForce architecture every two years or so, the latest schedule suggests a small delay, at least if NVIDIA plans to launch the Ada Lovelace successor in early 2025 rather than later. NVIDIA Pascal launched in May 2016, Turing in September 2018, Ampere in May 2020, and Ada Lovelace in October 2022.

MLCommons Shares Intel Habana Gaudi2 and 4th Gen Intel Xeon Scalable AI Benchmark Results

Today, MLCommons published results of its industry AI performance benchmark, MLPerf Training 3.0, in which both the Habana Gaudi2 deep learning accelerator and the 4th Gen Intel Xeon Scalable processor delivered impressive training results.

"The latest MLPerf results published by MLCommons validates the TCO value Intel Xeon processors and Intel Gaudi deep learning accelerators provide to customers in the area of AI. Xeon's built-in accelerators make it an ideal solution to run volume AI workloads on general-purpose processors, while Gaudi delivers competitive performance for large language models and generative AI. Intel's scalable systems with optimized, easy-to-program open software lowers the barrier for customers and partners to deploy a broad array of AI-based solutions in the data center from the cloud to the intelligent edge." - Sandra Rivera, Intel executive vice president and general manager of the Data Center and AI Group