News Posts matching #Habana Labs

Return to Keyword Browsing

Intel Submits Gaudi 2 Results on MLCommons' Newest Benchmark

Today, MLCommons published results of its industry AI performance benchmark, MLPerf Training v4.0. Intel's results demonstrate the choice that Intel Gaudi 2 AI accelerators give enterprises and customers. Community-based software simplifies generative AI (GenAI) development and industry-standard Ethernet networking enables flexible scaling of AI systems. For the first time on the MLPerf benchmark, Intel submitted results on a large Gaudi 2 system (1,024 Gaudi 2 accelerators) trained in Intel Tiber Developer Cloud to demonstrate Gaudi 2 performance and scalability and Intel's cloud capacity for training MLPerf's GPT-3 175B parameter benchmark model.

"The industry has a clear need: address the gaps in today's generative AI enterprise offerings with high-performance, high-efficiency compute options. The latest MLPerf results published by MLCommons illustrate the unique value Intel Gaudi brings to market as enterprises and customers seek more cost-efficient, scalable systems with standard networking and open software, making GenAI more accessible to more customers," said Zane Ball, Intel corporate vice president and general manager, DCAI Product Management.

Intel Launches Gaudi 3 AI Accelerator: 70% Faster Training, 50% Faster Inference Compared to NVIDIA H100, Promises Better Efficiency Too

During the Vision 2024 event, Intel announced its latest Gaudi 3 AI accelerator, promising significant improvements over its predecessor. Intel claims the Gaudi 3 offers up to 70% faster training, 50% faster inference, and 40% better efficiency than NVIDIA's H100 processors. The new AI accelerator is offered as a PCIe Gen 5 dual-slot add-in card with a 600 W TDP or as an OAM module with a 900 W TDP. The PCIe card has the same peak 1,835 TeraFLOPS of FP8 performance as the OAM module despite its 300 W lower TDP. The lower TDP will likely result in lower sustained performance, but the matching peak figure confirms that the same silicon is used, just tuned to a lower frequency. The PCIe version works as a group of four per system, while the OAM HL-325L modules can be run in an eight-accelerator configuration per server. Built on TSMC's N5 (5 nm) node, the AI accelerator features 64 Tensor Cores, delivering double the FP8 and quadruple the FP16 performance of the previous-generation Gaudi 2.
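As a rough illustration of the TDP gap, the quoted peak FP8 figure can be divided by each form factor's TDP. This is only a sketch built from the numbers in the article; the function and constant names are made up for illustration, and because sustained performance on the PCIe card is expected to be lower, the peak-per-watt ratio should not be read as measured efficiency.

```python
# Figures quoted in the article; nothing here is measured.
PEAK_FP8_TFLOPS = 1835                  # same peak for both form factors
TDP_WATTS = {"pcie": 600, "oam": 900}   # quoted TDP per form factor


def peak_tflops_per_watt(form_factor: str) -> float:
    """Peak FP8 TFLOPS divided by TDP (hypothetical helper, peak only)."""
    return PEAK_FP8_TFLOPS / TDP_WATTS[form_factor]


for name in TDP_WATTS:
    print(f"{name}: {peak_tflops_per_watt(name):.2f} peak FP8 TFLOPS per watt")
```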

The Gaudi 3 AI chip comes with 128 GB of HBM2E with 3.7 TB/s of bandwidth and twenty-four 200 Gbps Ethernet NICs, with dual 400 Gbps NICs used for scale-out. All of that is laid out on the 10 tiles that make up the Gaudi 3 accelerator. There is 96 MB of SRAM split between two compute tiles, which acts as a low-level cache that bridges data communication between the Tensor Cores and HBM memory. Intel also announced support for the new performance-boosting standardized MXFP4 data format and is developing an AI NIC ASIC for Ultra Ethernet Consortium-compliant networking. The Gaudi 3 supports clusters of up to 8,192 accelerators, spread across 1,024 nodes of eight accelerators each. It is on track for volume production in Q3, offering a cost-effective alternative to NVIDIA accelerators with the additional promise of a more open ecosystem. More information and a deeper dive can be found in the Gaudi 3 Whitepaper.
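The scaling figures above follow from simple multiplication. The sketch below reproduces them from the quoted per-accelerator specifications; the names are hypothetical, and the totals are nominal sums rather than usable capacity or delivered bandwidth.

```python
# Per-accelerator and per-cluster figures quoted in the article.
ACCELERATORS_PER_NODE = 8
MAX_NODES = 1024
HBM_PER_ACCELERATOR_GB = 128
ETHERNET_PORTS = 24
PORT_SPEED_GBPS = 200


def cluster_totals(nodes: int = MAX_NODES) -> dict:
    """Nominal totals for a cluster of `nodes` eight-accelerator servers."""
    accelerators = nodes * ACCELERATORS_PER_NODE
    return {
        "accelerators": accelerators,
        "total_hbm_gb": accelerators * HBM_PER_ACCELERATOR_GB,
        "ethernet_gbps_per_accelerator": ETHERNET_PORTS * PORT_SPEED_GBPS,
    }


totals = cluster_totals()
print(totals)
```

At the maximum quoted size this gives 8,192 accelerators, matching the article, with 4,800 Gbps of nominal Ethernet bandwidth per accelerator.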

Intel Preparing Habana "Gaudi2C" SKU for the Chinese AI Market

Intel's software team has added support in its open-source Linux drivers for an unannounced Habana "Gaudi2C" AI accelerator variant. Little is documented about the mystery Gaudi2C, which shares its core identity with Intel's broadly available flagship Gaudi2 data center training and inference chip. The new revision is distinguished only by a PCI ID of "3" in the latest patch set for Linux 6.8. Speculation is circulating that Gaudi2C may be a version tailored to meet China-specific demands, similar to Intel's Gaudi2 HL-225B SKU, which launched in July with reduced interconnect links. With US export bans restricting sales of advanced hardware to China, including Intel's leading Gaudi2 products, creating reduced-capability spinoffs that meet export regulations lets Intel maintain crucial Chinese revenue.

Meanwhile, Intel's upstream Linux contributions remain focused on hardening Gaudi/Gaudi2 support, now considered "very stable" by lead driver developer Oded Gabbay. Minor new additions reflect maturity, not instability. The open-source foundations contrast with NVIDIA's proprietary driver model, a key competitive argument for Intel among service developers using Habana Labs hardware. With the SynapseAI software suite reaching stability, some enterprises could consider Gaudi accelerators as an alternative to NVIDIA. And with Gaudi3 arriving next year, the ecosystem should become more competitive thanks to higher performance targets.
Nov 19th, 2024 01:53 EST