Monday, April 7th 2025

Industry's First-to-Market Supermicro NVIDIA HGX B200 Systems Demonstrate AI Performance Leadership
Super Micro Computer, Inc. (SMCI), a Total IT Solution Provider for AI/ML, HPC, Cloud, Storage, and 5G/Edge, has announced first-to-market, industry-leading performance on several MLPerf Inference v5.0 benchmarks using its NVIDIA HGX B200 8-GPU systems. The 4U liquid-cooled and 10U air-cooled systems achieved the best performance in select benchmarks, demonstrating more than 3 times the token generation rate (tokens/s) on the Llama2-70B and Llama3.1-405B benchmarks compared to NVIDIA H200 8-GPU systems. "Supermicro remains a leader in the AI industry, as evidenced by the first new benchmarks released by MLCommons in 2025," said Charles Liang, president and CEO of Supermicro. "Our building block architecture enables us to be first-to-market with a diverse range of systems optimized for various workloads. We continue to collaborate closely with NVIDIA to fine-tune our systems and secure a leadership position in AI workloads."
Supermicro is the only system vendor to publish record MLPerf inference performance (on select benchmarks) for both the air-cooled and liquid-cooled NVIDIA HGX B200 8-GPU systems. Both variants were operational before the MLCommons benchmark start date, and Supermicro has been delivering these systems to customers while conducting the benchmarks. Supermicro engineers optimized the systems and software, as allowed by the MLCommons rules; within the operating margin, the air-cooled B200 system delivered the same level of performance as the liquid-cooled B200 system. MLCommons requires that all results be reproducible, that the products are available, and that the results can be audited by other MLCommons members.
The SYS-421GE-NBRT-LCC (8x NVIDIA B200-SXM-180 GB) and SYS-A21GE-NBRT (8x NVIDIA B200-SXM-180 GB) showed performance leadership running the Mixtral 8x7B Mixture of Experts inference benchmark at roughly 129,000 tokens/second. Both the air-cooled and liquid-cooled NVIDIA B200-based systems delivered over 1,000 tokens/second of inference on the large Llama3.1-405B model, where previous generations of GPU systems achieved far lower results. For smaller inference tasks, using the Llama2-70B benchmark, a Supermicro system with NVIDIA B200-SXM-180 GB GPUs shows the highest performance from a Tier 1 system supplier.
Specifically:
- Stable Diffusion XL (Server): #1 queries/s, 28.92
- llama2-70b-interactive-99 (Server): #1 tokens/s, 62,265.70
- Llama3.1-405b (Offline): #1 tokens/s, 1,521.74
- Llama3.1-405b (Server): #1 tokens/s, 1,080.31 (for an 8-GPU node)
- mixtral-8x7b (Server): #1 tokens/s, 129,047.00
- mixtral-8x7b (Offline): #1 tokens/s, 128,795.00
"MLCommons congratulates Supermicro on their submission to the MLPerf Inference v5.0 benchmark. We are pleased to see their results showcasing significant performance gains compared to earlier generations of systems," said David Kanter, Head of MLPerf at MLCommons. "Customers will be pleased by the performance improvements achieved which are validated by the neutral, representative and reproducible MLPerf results." Supermicro offers a comprehensive AI portfolio comprising over 100 GPU-optimized systems, with both air-cooled and liquid-cooled options and a choice of CPUs, ranging from single-socket optimized systems to 8-way multiprocessor systems. Supermicro rack-scale systems include computing, storage, and network components, which reduces the time required to install them once they are delivered to a customer site.
Supermicro's NVIDIA HGX B200 8-GPU systems utilize next-generation liquid-cooling and air-cooling technology. The newly developed cold plates and the new 250 kW coolant distribution unit (CDU) more than double the cooling capacity of the previous generation in the same 4U form factor. Available in 42U, 48U, or 52U configurations, the rack-scale design uses new vertical coolant distribution manifolds (CDMs) that no longer occupy valuable rack units. This enables eight systems, comprising 64 NVIDIA Blackwell GPUs, in a 42U rack, and up to 12 systems with 96 NVIDIA Blackwell GPUs in a 52U rack.
The new air-cooled 10U NVIDIA HGX B200 system features a redesigned chassis with expanded thermal headroom to accommodate eight 1000 W TDP Blackwell GPUs. Up to 4 of the new 10U air-cooled systems can be installed and fully integrated in a rack, the same density as the previous generation, while providing up to 15x inference and 3x training performance.
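The rack-density figures above follow directly from the per-system GPU count. A minimal sketch of the arithmetic, using only the system and rack counts stated in the announcement:

```python
# Per-rack GPU density for Supermicro NVIDIA HGX B200 configurations,
# using the figures stated in the announcement.
GPUS_PER_SYSTEM = 8  # each HGX B200 system carries 8 NVIDIA Blackwell GPUs


def gpus_per_rack(systems: int, gpus_per_system: int = GPUS_PER_SYSTEM) -> int:
    """Total GPUs in a rack holding `systems` HGX B200 systems."""
    return systems * gpus_per_system


# Liquid-cooled 4U systems: 8 systems in a 42U rack, up to 12 in a 52U rack.
assert gpus_per_rack(8) == 64    # 42U rack
assert gpus_per_rack(12) == 96   # 52U rack

# Air-cooled 10U systems: up to 4 per rack, matching previous-generation density.
assert gpus_per_rack(4) == 32
```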
Source:
Supermicro News