Microsoft Azure announced its new ND H100 v5 virtual machine, which pairs Intel's Sapphire Rapids Xeon Scalable processors with NVIDIA's Hopper H100 GPUs and NVIDIA's Quantum-2 CX7 InfiniBand interconnect. Inside each physical machine sit eight H100s, presumably the SXM5 variant packing a whopping 132 SMs and 528 fourth-generation tensor cores, all tied together by NVLink 4.0 with 3.6 TB/s of bisection bandwidth. Outside each local machine sits a network of thousands more H100s connected over 400 Gb/s Quantum-2 CX7 InfiniBand, which Microsoft says delivers 3.2 Tb/s per VM for on-demand scaling to accelerate the largest AI training workloads.
Generative AI solutions like ChatGPT have accelerated demand for multi-ExaOP cloud services that can handle large training sets and the latest development tools. Azure's new ND H100 v5 VMs bring that capability to organizations of any size, from smaller startups to large companies running large-scale AI training deployments. While Microsoft is not making any direct performance claims, NVIDIA has advertised the H100 as running up to 30x faster in some workloads than the preceding Ampere architecture currently offered in the ND A100 v4 VMs.
Microsoft Azure provides the following technical specifications for the new VMs:
- 8x NVIDIA H100 Tensor Core GPUs interconnected via next-gen NVSwitch and NVLink 4.0
- 400 Gb/s NVIDIA Quantum-2 CX7 InfiniBand per GPU, with 3.2 Tb/s per VM in a non-blocking fat-tree network
- NVSwitch and NVLink 4.0 with 3.6 TB/s bisection bandwidth between the 8 local GPUs within each VM
- 4th Gen Intel Xeon Scalable processors
- PCIe Gen5 host-to-GPU interconnect with 64 GB/s bandwidth per GPU
- 16 channels of 4,800 MHz DDR5 DIMMs
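For readers who want to verify the aggregates, here is a minimal Python sketch that derives the per-VM totals from the per-GPU figures above. The SM and tensor-core counts assume the SXM5 variant of the H100, which Microsoft has not explicitly confirmed:

```python
# Sanity check of the per-VM figures in Microsoft's spec list.
# Assumes the SXM5 H100 (132 SMs, 528 tensor cores); Microsoft has
# not named the exact H100 variant.

GPUS_PER_VM = 8
IB_PER_GPU_GBPS = 400          # Gb/s of Quantum-2 CX7 InfiniBand per GPU

# 8 GPUs x 400 Gb/s = 3,200 Gb/s = 3.2 Tb/s of scale-out bandwidth per VM
vm_ib_tbps = GPUS_PER_VM * IB_PER_GPU_GBPS / 1000
print(f"InfiniBand per VM: {vm_ib_tbps} Tb/s")                        # 3.2 Tb/s

SMS_PER_GPU = 132
TENSOR_CORES_PER_GPU = 528     # 4 tensor cores per SM on Hopper SXM5
print(f"SMs per VM: {GPUS_PER_VM * SMS_PER_GPU}")                     # 1,056
print(f"Tensor cores per VM: {GPUS_PER_VM * TENSOR_CORES_PER_GPU}")   # 4,224
```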
Judging by these specifications and what we know of NVIDIA Hopper, Microsoft is likely either filling its own racks with DGX H100 systems or deploying NVIDIA's DGX SuperPOD, which stacks DGX H100s five high and as many as 16 across for a total of 640 GPUs packing 337,920 tensor cores. Don't forget that each DGX H100 also contains two Intel Xeon Scalable processors. Since Microsoft has already specified that its systems use Intel's latest Sapphire Rapids Xeons, which offer as many as 60 cores each, there are potentially 9,600 x86 cores available to help feed those massive GPUs.
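Those totals follow directly from the SuperPOD layout. A short sketch of the math, assuming the full 5 x 16 rack configuration and top-bin 60-core Sapphire Rapids parts (neither of which Microsoft has confirmed):

```python
# SuperPOD-scale totals implied above; the 5 x 16 layout and 60-core
# Xeon SKU are assumptions, not Microsoft-confirmed figures.

DGX_SYSTEMS = 5 * 16               # 80 DGX H100 systems
GPUS = DGX_SYSTEMS * 8             # 8 H100s per DGX = 640 GPUs
TENSOR_CORES = GPUS * 528          # 528 tensor cores per GPU = 337,920
CPU_CORES = DGX_SYSTEMS * 2 * 60   # 2 Xeons per DGX, up to 60 cores each

print(GPUS, TENSOR_CORES, CPU_CORES)  # 640 337920 9600
```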
Microsoft Azure has opened up the ND H100 v5 VM service for preview, and you can sign up to request access here.
View at TechPowerUp Main Site | Source