- Joined
- Aug 19, 2017
- Messages
- 2,579 (0.97/day)
NVIDIA's H100 accelerator is one of the most powerful solutions for powering AI workloads. And, of course, every company and government wants to use it to power its AI workload. However, in countries like China, shipment of US-made goods is challenging. With export regulations in place, NVIDIA had to get creative and make a specific version of its H100 GPU for the Chinese market, labeled the H800 model. Late last year, NVIDIA also created a China-specific version of the A100 model called A800, with the only difference being the chip-to-chip interconnect bandwidth being dropped from 600 GB/s to 400 GB/s.
This year's H800 SKU also features similar restrictions, and the company appears to have made similar sacrifices for shipping its chips to China. From the 600 GB/s bandwidth of the regular H100 PCIe model, the H800 is gutted to only 300 GB/s of bi-directional chip-to-chip interconnect bandwidth speed. While we have no data if the CUDA or Tensor core count has been adjusted, the sacrifice of bandwidth to comply with export regulations will have consequences. As the communication speed is reduced, training large models will increase the latency and slow the workload compared to the regular H100 chip. This is due to the massive data size that needs to travel from one chip to another. According to Reuters, an NVIDIA spokesperson declined to discuss other differences, stating that "our 800 series products are fully compliant with export control regulations."
View at TechPowerUp Main Site | Source
This year's H800 SKU also features similar restrictions, and the company appears to have made similar sacrifices for shipping its chips to China. From the 600 GB/s bandwidth of the regular H100 PCIe model, the H800 is gutted to only 300 GB/s of bi-directional chip-to-chip interconnect bandwidth speed. While we have no data if the CUDA or Tensor core count has been adjusted, the sacrifice of bandwidth to comply with export regulations will have consequences. As the communication speed is reduced, training large models will increase the latency and slow the workload compared to the regular H100 chip. This is due to the massive data size that needs to travel from one chip to another. According to Reuters, an NVIDIA spokesperson declined to discuss other differences, stating that "our 800 series products are fully compliant with export control regulations."
View at TechPowerUp Main Site | Source