
Intel Launches Gaudi 3 AI Accelerator: 70% Faster Training, 50% Faster Inference Compared to NVIDIA H100, Promises Better Efficiency Too

AleksandarK

News Editor
Staff member
Joined
Aug 19, 2017
Messages
2,644 (0.99/day)
During the Vision 2024 event, Intel announced its latest Gaudi 3 AI accelerator, promising significant improvements over its predecessor. Intel claims the Gaudi 3 offers up to 70% faster training, 50% faster inference, and 40% better efficiency than NVIDIA's H100 processors. The new AI accelerator is offered as a PCIe Gen 5 dual-slot add-in card with a 600 W TDP or as an OAM module with a 900 W TDP. The PCIe card has the same peak 1,835 TeraFLOPS of FP8 performance as the OAM module despite a 300 W lower TDP. The lower TDP will likely result in lower sustained performance, but the matching peak figure confirms that the same silicon is used, just tuned to a lower frequency. The PCIe version works in groups of four per system, while the OAM HL-325L modules can be run in an eight-accelerator configuration per server. Built on TSMC's N5 (5 nm) node, the AI accelerator features 64 Tensor Cores, delivering double the FP8 and quadruple the FP16 performance of the previous-generation Gaudi 2.
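As a rough sanity check on the form-factor comparison above, the peak FP8 figure can be divided by each TDP. This is only a sketch using the numbers quoted in the article; it describes paper peak throughput per watt of TDP, not measured or sustained efficiency, and the function and variable names are made up for illustration.

```python
# Peak FP8 throughput per watt of TDP, using only figures quoted in the article.
# Peak numbers say nothing about sustained performance under a real workload.
PEAK_FP8_TFLOPS = 1835  # same peak quoted for both form factors

TDP_WATTS = {"PCIe card": 600, "OAM module": 900}


def peak_tflops_per_watt(peak_tflops: float, tdp_watts: float) -> float:
    """Return peak TFLOPS divided by TDP (a paper figure, not measured efficiency)."""
    if tdp_watts <= 0:
        raise ValueError("tdp_watts must be positive")
    return peak_tflops / tdp_watts


per_watt = {
    name: peak_tflops_per_watt(PEAK_FP8_TFLOPS, tdp)
    for name, tdp in TDP_WATTS.items()
}

for name, value in per_watt.items():
    print(f"{name}: {value:.2f} peak FP8 TFLOPS per watt")

# With identical peak throughput, the ratio reduces to 900 W / 600 W.
ratio = per_watt["PCIe card"] / per_watt["OAM module"]
print(f"PCIe vs OAM on paper: {ratio:.2f}x")
```

Because the peak figure is identical, the ratio is just the inverse ratio of the two TDPs, which is why sustained clocks (and not peak TFLOPS) are the number to watch.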

The Gaudi 3 AI chip comes with 128 GB of HBM2E with 3.7 TB/s of bandwidth and twenty-four 200 Gbps Ethernet NICs, with dual 400 Gbps NICs used for scale-out. All of that is laid out on 10 tiles that make up the Gaudi 3 accelerator, which you can see pictured below. There is 96 MB of SRAM split between two compute tiles, which acts as a low-level cache that bridges data communication between the Tensor Cores and HBM memory. Intel also announced support for the new performance-boosting standardized MXFP4 data format and is developing an AI NIC ASIC for Ultra Ethernet Consortium-compliant networking. The Gaudi 3 supports clusters of up to 8,192 cards, built from 1,024 nodes of eight accelerators each. It is on track for volume production in Q3, offering a cost-effective alternative to NVIDIA accelerators with the additional promise of a more open ecosystem. More information and a deeper dive can be found in the Gaudi 3 Whitepaper.
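The scale-out figures above follow from simple multiplication. The sketch below only restates the article's numbers; the variable names are illustrative, and the Ethernet total is the raw sum of the quoted port speeds, not a usable-bandwidth claim.

```python
# Cluster arithmetic from the figures quoted in the article.
NODES = 1024
ACCELERATORS_PER_NODE = 8
HBM_GB_PER_ACCELERATOR = 128
ETHERNET_PORTS_PER_ACCELERATOR = 24
GBPS_PER_PORT = 200

# Maximum cluster size: nodes times accelerators per node.
cluster_accelerators = NODES * ACCELERATORS_PER_NODE

# Total HBM capacity across a maximum-size cluster, in GB.
cluster_hbm_gb = cluster_accelerators * HBM_GB_PER_ACCELERATOR

# Raw sum of the Ethernet port speeds on one accelerator, in Gbps.
ethernet_gbps_per_accelerator = ETHERNET_PORTS_PER_ACCELERATOR * GBPS_PER_PORT

print(f"Accelerators in a full cluster: {cluster_accelerators}")
print(f"Total HBM in a full cluster: {cluster_hbm_gb} GB")
print(f"Raw Ethernet per accelerator: {ethernet_gbps_per_accelerator} Gbps")
```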



 

Space Lynx

Astronaut
Joined
Oct 17, 2014
Messages
17,417 (4.69/day)
Location
Kepler-186f
Processor 7800X3D -25 all core
Motherboard B650 Steel Legend
Cooling Frost Commander 140
Video Card(s) Merc 310 7900 XT @3100 core -.75v
Display(s) Agon 27" QD-OLED Glossy 240hz 1440p
Case NZXT H710 (Red/Black)
Audio Device(s) Asgard 2, Modi 3, HD58X
Power Supply Corsair RM850x Gold
Shares a production line on TSMC? lol bad move Intel. Nvidia already bought all the production time from TSMC. Vaporware.
 
Joined
Dec 29, 2010
Messages
3,809 (0.75/day)
Processor AMD 5900x
Motherboard Asus x570 Strix-E
Cooling Hardware Labs
Memory G.Skill 4000c17 2x16gb
Video Card(s) RTX 3090
Storage Sabrent
Display(s) Samsung G9
Case Phanteks 719
Audio Device(s) Fiio K5 Pro
Power Supply EVGA 1000 P2
Mouse Logitech G600
Keyboard Corsair K95
Pat is begging Wall Street to believe... lmao.
 
Joined
Jan 2, 2019
Messages
147 (0.07/day)
Simply to note: Intel compares the performance of its latest hardware against the already outdated NVIDIA H100 line of accelerators.
 
Joined
May 3, 2018
Messages
2,881 (1.19/day)
Hardly a single person in the AI field believes Intel can be trusted to support the hardware long-term, and then there is also the question of their software, SYCL, which no one uses. ROCm, on the other hand, is well liked. Nearly all analysts say only AMD is a competitor to Nvidia.
 
Joined
Aug 22, 2007
Messages
3,589 (0.57/day)
Location
Terra
System Name :)
Processor Intel 13700k
Motherboard Gigabyte z790 UD AC
Cooling Noctua NH-D15
Memory 64GB GSKILL DDR5
Video Card(s) Gigabyte RTX 4090 Gaming OC
Storage 960GB Optane 905P U.2 SSD + 4TB PCIe4 U.2 SSD
Display(s) Alienware AW3423DW 175Hz QD-OLED + AOC Agon Pro AG276QZD2 240Hz QD-OLED
Case Fractal Design Torrent
Audio Device(s) MOTU M4 - JBL 305P MKII w/2x JL Audio 10 Sealed --- X-Fi Titanium HD - Presonus Eris E5 - JBL 4412
Power Supply Silverstone 1000W
Mouse Roccat Kain 122 AIMO
Keyboard KBD67 Lite / Mammoth75
VR HMD Reverb G2 V2
Software Win 11 Pro
Hardly a single person in the AI field believes Intel can be trusted to support the hardware long-term, and then there is also the question of their software, SYCL, which no one uses. ROCm, on the other hand, is well liked. Nearly all analysts say only AMD is a competitor to Nvidia.
Analysts lmao
 
Joined
Apr 19, 2018
Messages
1,227 (0.50/day)
Processor AMD Ryzen 9 5950X
Motherboard Asus ROG Crosshair VIII Hero WiFi
Cooling Arctic Liquid Freezer II 420
Memory 32Gb G-Skill Trident Z Neo @3806MHz C14
Video Card(s) MSI GeForce RTX2070
Storage Seagate FireCuda 530 1TB
Display(s) Samsung G9 49" Curved Ultrawide
Case Cooler Master Cosmos
Audio Device(s) O2 USB Headphone AMP
Power Supply Corsair HX850i
Mouse Logitech G502
Keyboard Cherry MX
Software Windows 11
If only there was an open-source alternative to CUDA. nGreedia would be back to "for the gamers" in a heartbeat!
 

Space Lynx

Astronaut
Joined
Oct 17, 2014
Messages
17,417 (4.69/day)
Intel will use 5nm. Nvidia Blackwell will use N4P.

won't AI still be using 5nm node though? and that node is sold out until late 2025 last I read.
 
Joined
Aug 21, 2013
Messages
1,935 (0.47/day)
won't AI still be using 5nm node though? and that node is sold out until late 2025 last I read.
Both nodes are confirmed, and although they are both technically "5nm class", I doubt that Intel will have capacity issues at TSMC.
It also depends on demand. I doubt that the demand for Intel's products will rise that sharply.

Actually, Intel using 5nm is a sign that they have not managed to secure more advanced nodes from TSMC. Nvidia already confirmed that they will use N4P, and I suspect AMD will use the same or similar. A top-tier AI product going into volume production in Q3 2024 is generally expected to be made on 4nm or even 3nm already, not 5nm.
 
Joined
May 26, 2021
Messages
138 (0.11/day)
won't AI still be using 5nm node though? and that node is sold out until late 2025 last I read.
And you think Intel just started their supply chain activities? Sold out to who? I think the answer is in your question itself.
 
Joined
May 30, 2015
Messages
1,941 (0.56/day)
Location
Seattle, WA
People don't understand how important those efficiency numbers are over H100. Right now it's not always a matter of how fast the core is, it's a matter of how many can be running at once. Even if Intel's chip was slower, if they offered more performance per watt than NVIDIA or AMD they would be a better buy. In the US at least we are literally hitting the limit of how many of these massive AI clusters we can have operating. A single state's power grid can only sustain maybe 100,000 H100 systems before capacity is exceeded. This creates an incredibly massive network bottleneck as all these clusters have to be sharing the load across great distances to train new models. If Intel comes in and says, "Hey, we can put 180,000 units into your maximum power budget with higher performance," that's a big win for them.

The other factor is availability, which we have seen referenced a few times now with regard to NVIDIA. NVIDIA has been delaying orders, withholding systems, or outright limiting purchase quantities with many of their clients that can't wait for the long lead times on getting H100 systems in hand and running. Intel and AMD are in prime position to take those clients from NVIDIA, and if Intel shows they can outright beat the H100s that these clients have likely already been trying to buy, and can offer shorter delivery windows, they become the de facto choice for those clients.
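The power-budget argument in this post comes down to integer division: a fixed budget divided by per-system power gives the number of systems that fit. The sketch below is a rough illustration of that arithmetic only. The 100,000 and 180,000 figures are the poster's own estimates, and the per-system wattage is a made-up placeholder, not a measured value for any real system.

```python
# Fixed power budget arithmetic. All wattages here are hypothetical placeholders.


def units_in_budget(power_budget_w: int, unit_power_w: int) -> int:
    """Return how many whole units fit in a fixed power budget."""
    if unit_power_w <= 0:
        raise ValueError("unit_power_w must be positive")
    return power_budget_w // unit_power_w


def implied_power_ratio(baseline_units: int, new_units: int) -> float:
    """Per-unit power of the new system relative to the baseline,
    if both unit counts exactly fill the same budget."""
    if new_units <= 0:
        raise ValueError("new_units must be positive")
    return baseline_units / new_units


# Placeholder per-system draw, chosen only to make the numbers concrete.
HYPOTHETICAL_BASELINE_SYSTEM_W = 10_000

# Budget sized so that exactly 100,000 baseline systems fit (the post's estimate).
budget_w = 100_000 * HYPOTHETICAL_BASELINE_SYSTEM_W

baseline_count = units_in_budget(budget_w, HYPOTHETICAL_BASELINE_SYSTEM_W)

# Fitting 180,000 units in the same budget implies each draws this
# fraction of the baseline system's power.
ratio = implied_power_ratio(100_000, 180_000)

print(f"Baseline systems in budget: {baseline_count}")
print(f"Implied per-unit power vs baseline: {ratio:.3f}")
```

Read this way, the 180,000-unit scenario requires each unit to draw a little over half the power of the baseline system, which is a separate question from how much work each unit does per watt.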
 
Joined
Aug 22, 2007
Messages
3,589 (0.57/day)
As in people working in the field. You butt hurt Intel isn't a player?
I like how you turn this into a personal attack. I know what an analyst is... something you don't based on your reply of "As in people working in the field."
I never mentioned Intel and don't have a horse in the race. I merely laughed at using analysts as a source of truth. Analysts often have a vested interest in steering people's opinion (and money) in certain directions.
 

Ahhzz

Super Moderator
Staff member
Joined
Feb 27, 2008
Messages
8,994 (1.46/day)
System Name OrangeHaze / Silence
Processor i7-13700KF / i5-10400 /
Motherboard ROG STRIX Z690-E / MSI Z490 A-Pro Motherboard
Cooling Corsair H75 / TT ToughAir 510
Memory 64Gb GSkill Trident Z5 / 32GB Team Dark Za 3600
Video Card(s) Palit GeForce RTX 2070 / Sapphire R9 290 Vapor-X 4Gb
Storage Hynix Plat P41 2Tb\Samsung MZVL21 1Tb / Samsung 980 Pro 1Tb
Display(s) 22" Dell Wide/24" Asus
Case Lian Li PC-101 ATX custom mod / Antec Lanboy Air Black & Blue
Audio Device(s) SB Audigy 7.1
Power Supply Corsair Enthusiast TX750
Mouse Logitech G502 Lightspeed Wireless / Logitech G502 Proteus Spectrum
Keyboard K68 RGB — CHERRY® MX Red
Software Win10 Pro \ RIP:Win 7 Ult 64 bit
This is not the thread, the section, or the forums for personal attacks. Keep it aimed at the topic and not each other. Only warning.
 