Microsoft Azure announced its new ND H100 v5 virtual machine, which pairs Intel's Sapphire Rapids Xeon Scalable processors with NVIDIA's Hopper H100 GPUs and NVIDIA's Quantum-2 CX7 InfiniBand interconnect. Inside each physical machine sit eight H100s, presumably the SXM5 variant packing a whopping 132 SMs and 528 fourth-generation tensor cores, all tied together by NVLink 4.0 with 3.6 TB/s of bisection bandwidth. Outside each local machine sits a network of thousands more H100s connected over 400 Gb/s Quantum-2 CX7 InfiniBand, which Microsoft says delivers 3.2 Tb/s per VM for on-demand scaling to accelerate the largest AI training workloads.
Generative AI solutions like ChatGPT have accelerated demand for multi-ExaOP cloud services that can handle large training sets and the latest development tools. Azure's new ND H100 v5 VMs bring that capability to organizations of any size, from smaller startups to large companies running large-scale AI training deployments. While Microsoft is not making any direct performance claims, NVIDIA has advertised the H100 as running up to 30x faster in some workloads than the preceding Ampere architecture currently offered in the ND A100 v4 VMs.
Microsoft Azure provides the following technical specifications for the new VMs:
- 8x NVIDIA H100 Tensor Core GPUs interconnected via next-gen NVSwitch and NVLink 4.0
- 400 Gb/s NVIDIA Quantum-2 CX7 InfiniBand per GPU, with 3.2 Tb/s per VM in a non-blocking fat-tree network
- NVSwitch and NVLink 4.0 with 3.6 TB/s bisection bandwidth between the 8 local GPUs within each VM
- 4th Gen Intel Xeon Scalable processors
- PCIe Gen5 host-to-GPU interconnect with 64 GB/s bandwidth per GPU
- 16 channels of 4,800 MHz DDR5 DIMMs
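For readers who want to verify the aggregates, here is a minimal Python sketch that derives the per-VM totals from the per-GPU figures above. The SM and tensor-core counts assume the SXM5 variant of the H100, which Microsoft has not explicitly confirmed:

```python
# Sanity check of the per-VM figures in Microsoft's spec list.
# Assumes the SXM5 H100 (132 SMs, 528 tensor cores); Microsoft has
# not named the exact H100 variant.

GPUS_PER_VM = 8
IB_PER_GPU_GBPS = 400          # Gb/s of Quantum-2 CX7 InfiniBand per GPU

# 8 GPUs x 400 Gb/s = 3,200 Gb/s = 3.2 Tb/s of scale-out bandwidth per VM
vm_ib_tbps = GPUS_PER_VM * IB_PER_GPU_GBPS / 1000
print(f"InfiniBand per VM: {vm_ib_tbps} Tb/s")                        # 3.2 Tb/s

SMS_PER_GPU = 132
TENSOR_CORES_PER_GPU = 528     # 4 tensor cores per SM on Hopper SXM5
print(f"SMs per VM: {GPUS_PER_VM * SMS_PER_GPU}")                     # 1,056
print(f"Tensor cores per VM: {GPUS_PER_VM * TENSOR_CORES_PER_GPU}")   # 4,224
```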
Judging by these specifications and what we know of NVIDIA Hopper, Microsoft is likely either filling its own racks with DGX H100 systems or deploying NVIDIA's DGX SuperPOD, which stacks DGX H100s five high and as many as 16 across for a total of 640 GPUs packing 337,920 tensor cores. Don't forget that each DGX H100 also contains two Intel Xeon Scalable processors. Since Microsoft has already specified that its systems use Intel's latest Sapphire Rapids Xeons, which offer as many as 60 cores each, there are potentially 9,600 x86 cores available to help feed those massive GPUs.
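Those totals follow directly from the SuperPOD layout. A short sketch of the math, assuming the full 5 x 16 rack configuration and top-bin 60-core Sapphire Rapids parts (neither of which Microsoft has confirmed):

```python
# SuperPOD-scale totals implied above; the 5 x 16 layout and 60-core
# Xeon SKU are assumptions, not Microsoft-confirmed figures.

DGX_SYSTEMS = 5 * 16               # 80 DGX H100 systems
GPUS = DGX_SYSTEMS * 8             # 8 H100s per DGX = 640 GPUs
TENSOR_CORES = GPUS * 528          # 528 tensor cores per GPU = 337,920
CPU_CORES = DGX_SYSTEMS * 2 * 60   # 2 Xeons per DGX, up to 60 cores each

print(GPUS, TENSOR_CORES, CPU_CORES)  # 640 337920 9600
```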
Microsoft Azure has opened up the ND H100 v5 VM service for preview, and you can sign up to request access here.
View at TechPowerUp Main Site | Source