News Posts matching #H800


Microsoft Acquired Nearly 500,000 NVIDIA "Hopper" GPUs This Year

Microsoft is investing heavily in the company and cloud infrastructure needed to support the massive AI expansion, and has acquired nearly half a million of the NVIDIA "Hopper" family of GPUs to support this effort. According to market research company Omdia, Microsoft was the biggest hyperscaler, with data center CapEx and GPU expenditure reaching a record high. The company acquired precisely 485,000 NVIDIA "Hopper" GPUs, including H100, H200, and H20, spending more than $30 billion on servers alone. To put things into perspective, this is about double that of the next-biggest GPU purchaser, China's ByteDance, which acquired about 230,000 sanction-abiding H800 GPUs plus regular H100s sourced from third parties.

Among US-based companies, the only ones that have come close to this GPU acquisition rate are Meta, Tesla/xAI, Amazon, and Google, which have each acquired around 200,000 GPUs on average while significantly boosting their in-house chip design efforts. "NVIDIA GPUs claimed a tremendously high share of the server capex," Vlad Galabov, director of cloud and data center research at Omdia, noted, adding, "We're close to the peak." Hyperscalers like Amazon, Google, and Meta have been working on custom solutions for AI training and inference: Google has its TPU, Amazon has its Trainium and Inferentia chips, and Meta has its MTIA. Hyperscalers are eager to develop in-house solutions, but NVIDIA's grip on the software stack, paired with timely product updates, seems hard to break. The latest "Blackwell" chips are projected to get even bigger orders, so only the sky (and the local power plant) is the limit.

U.S. Updates Advanced Semiconductor Ban, Actual Impact on the Industry Will Be Insignificant

On March 29th, the United States announced another round of updates to its export controls, targeting advanced computing, supercomputers, semiconductor end-uses, and semiconductor manufacturing products. These new regulations, which took effect on April 4th, are designed to prevent certain countries and businesses from circumventing U.S. restrictions to access sensitive chip technologies and equipment. Despite these tighter controls, TrendForce believes the practical impact on the industry will be minimal.

The latest updates aim to refine the language and parameters of previous regulations, tightening the criteria for exports to Macau and D:5 countries (China, North Korea, Russia, Iran, etc.). They require a detailed examination of all technology products' Total Processing Performance (TPP) and Performance Density (PD). If a product exceeds certain computing power thresholds, it must undergo a case-by-case review. Nevertheless, a new provision, Advanced Computing Authorized (ACA), allows for specific exports and re-exports among selected countries, including the transshipment of particular products between Macau and D:5 countries.

Chinese Research Institute Utilizing "Banned" NVIDIA H100 AI GPUs

NVIDIA's freshly unveiled "Blackwell" B200 and GB200 AI GPUs will be getting plenty of coverage this year, but many organizations will be sticking with current or prior generation hardware. Team Green is in the process of shipping out compromised "Hopper" designs to customers in China, but the region's appetite for powerful AI-crunching hardware is growing. Last year's China-specific H800 design, and the older "Ampere" A800 chip were deemed too potent—new regulations prevented further sales. Recently, AMD's Instinct MI309 AI accelerator was considered "too powerful to gain unconditional approval from the US Department of Commerce." Natively-developed solutions are catching up with Western designs, but some institutions are not prepared to queue up for emerging technologies.

NVIDIA's new H20 AI GPU, as well as the Ada Lovelace-based L20 PCIe and L2 PCIe models, are weakened enough to get a thumbs up from trade regulators, but likely not compelling enough for discerning clients. The Telegraph believes that NVIDIA's uncompromised H100 AI GPU is currently in use at several Chinese establishments—the report cites information presented within four academic papers published on ArXiv, an open-access science website. The Telegraph's news piece highlights one of the studies, which was "co-authored by a researcher at 4paradigm, an AI company that was last year placed on an export control list by the US Commerce Department for attempting to acquire US technology to support China's military." Additionally, the Chinese Academy of Sciences appears to have conducted several AI-accelerated experiments involving complex mathematical and logical problems. The article suggests that this research organization has acquired a very small batch of NVIDIA H100 GPUs (up to eight units). A "thriving black market" for high-end NVIDIA processors has emerged in the region—last autumn, the Center for a New American Security (CNAS) published an in-depth article about ongoing smuggling activities.

AMD Stalls on Instinct MI309 China AI Chip Launch Amid US Export Hurdles

According to the latest report from Bloomberg, AMD has hit a roadblock in offering its top-of-the-line AI accelerator in the Chinese market. The newest AI chip is called Instinct MI309, a lower-performance Instinct MI300 variant tailored to meet the latest US export rules for selling advanced chips to China-based entities. However, the Instinct MI309 still appears too powerful to gain unconditional approval from the US Department of Commerce, leaving AMD in need of an export license. Under the US Department of Commerce's rule, a chip's Total Processing Performance (TPP) score must not exceed 4800, effectively capping AI performance at 600 FP8 TFLOPS. Processors with slightly lower performance may still be sold to Chinese customers, provided their performance density (PD) is sufficiently low.
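The arithmetic behind that cap is straightforward: TPP multiplies peak throughput by operand bit width, which is why a 4800 limit corresponds to 600 TFLOPS at 8-bit precision. A minimal sketch of the calculation; the helper function and the 990-TOPS sample figure are illustrative, not part of the official rule text:

```python
# Sketch of the Total Processing Performance (TPP) arithmetic described
# above: TPP = peak throughput (TOPS) x operand bit width.
# Helper name and sample figures are illustrative assumptions.

TPP_CAP = 4800  # review threshold cited in the article

def tpp(peak_tops: float, bit_width: int) -> float:
    """Total Processing Performance: peak TOPS times bit length."""
    return peak_tops * bit_width

# 600 FP8 TFLOPS (600 TOPS at 8-bit precision) lands exactly on the cap:
print(tpp(600, 8))  # 4800

# A hypothetical 990-TOPS FP8 part would exceed it and need a license:
print(tpp(990, 8) > TPP_CAP)  # True
```

This also shows why lower-precision formats are the binding constraint: the same 4800 TPP budget allows only 300 TFLOPS at 16-bit.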

However, AMD's latest creation, the Instinct MI309, is anything but slow. Derived from the powerful Instinct MI300, it has apparently not been cut down enough to acquire a US export license from the Department of Commerce. It is still unknown which Chinese customer was trying to acquire AMD's Instinct MI309; however, it could be one of the Chinese AI labs trying to get hold of more training hardware for their domestic models. NVIDIA employed a similar tactic, selling A800 and H800 chips to China, until the US ended the export of those chips as well. AI labs located in China can otherwise only use domestic hardware, including accelerators from Alibaba, Huawei, and Baidu. Cloud services hosting GPUs in the US can still be accessed by Chinese companies, but that route is currently on US regulators' watchlist.

NVIDIA Readying H20 AI GPU for Chinese Market

NVIDIA's H800 AI GPU was rolled out last year to appease the Sanction Gods—but later on, the US Government deemed the cut-down "Hopper" part far too potent for Team Green's Chinese enterprise customers. Last October, newly amended export conditions banned sales of the H800, as well as the slightly older (and similarly gimped) A800 "Ampere" GPU in the region. NVIDIA's engineering team returned to the drawing board and developed a new range of compliantly weakened products. An exclusive Reuters report suggests that Team Green is taking pre-orders for a refreshed "Hopper" GPU—the latest China-specific flagship is called "HGX H20." NVIDIA's web presence has not been updated with this new model, nor with the Ada Lovelace-based L20 PCIe and L2 PCIe GPUs. Huawei's competing Ascend 910B is said to be slightly more performant in "some areas" when compared to the H20, according to insiders within the distribution network.

The leakers reckon that NVIDIA's mainland distributors will be selling H20 models within a price range of $12,000 - $15,000—Huawei's locally developed Ascend 910B is priced at 120,000 RMB (~$16,900). One Reuters source stated that "some distributors have started advertising the (NVIDIA H20) chips with a significant markup to the lower end of that range at about 110,000 yuan ($15,320)." The report suggests that NVIDIA refused to comment on this situation. Another insider claimed that "distributors are offering H20 servers, which are pre-configured with eight of the AI chips, for 1.4 million yuan. By comparison, servers that used eight of the H800 chips were sold at around 2 million yuan when they were launched a year ago." Small batches of H20 products are expected to reach important clients within the first quarter of 2024, followed by a wider release in Q2. Mass production is believed to begin around springtime.

Manufacturers Anticipate Completion of NVIDIA's HBM3e Verification by 1Q24; HBM4 Expected to Launch in 2026

TrendForce's latest research into the HBM market indicates that NVIDIA plans to diversify its HBM suppliers for more robust and efficient supply chain management. Samsung's HBM3 (24 GB) is anticipated to complete verification with NVIDIA by December this year. The progress of HBM3e, as outlined in the timeline below, shows that Micron provided its 8hi (24 GB) samples to NVIDIA by the end of July, SK hynix in mid-August, and Samsung in early October.

Given the intricacy of the HBM verification process—estimated to take two quarters—TrendForce expects that some manufacturers might learn preliminary HBM3e results by the end of 2023. However, it's generally anticipated that major manufacturers will have definite results by 1Q24. Notably, the outcomes will influence NVIDIA's procurement decisions for 2024, as final evaluations are still underway.

NVIDIA Experiences Strong Cloud AI Demand but Faces Challenges in China, with High-End AI Server Shipments Expected to Be Below 4% in 2024

NVIDIA's most recent FY3Q24 financial reports reveal record-high revenue coming from its data center segment, driven by escalating demand for AI servers from major North American CSPs. However, TrendForce points out that recent US government sanctions targeting China have impacted NVIDIA's business in the region. Despite strong shipments of NVIDIA's high-end GPUs—and the rapid introduction of compliant products such as the H20, L20, and L2—Chinese cloud operators are still in the testing phase, making substantial revenue contributions to NVIDIA unlikely in Q4. Gradual shipment increases are expected from the first quarter of 2024.

The US ban continues to influence China's foundry market as Chinese CSPs' high-end AI server shipments potentially drop below 4% next year
TrendForce reports that North American CSPs like Microsoft, Google, and AWS will remain key drivers of high-end AI servers (including those with NVIDIA, AMD, or other high-end ASIC chips) from 2023 to 2024. Their shipment shares for 2024 are estimated at 24%, 18.6%, and 16.3%, respectively. Chinese CSPs such as ByteDance, Baidu, Alibaba, and Tencent (BBAT) are projected to have a combined shipment share of approximately 6.3% in 2023. However, this could decrease to less than 4% in 2024, considering the current and potential future impacts of the ban.

U.S. Restricts Exports of NVIDIA GeForce RTX 4090 to China

The GeForce RTX 4090 gaming graphics card, both as an NVIDIA first-party Founders Edition and as custom designs by AIC partners, undergoes assembly in China. A new U.S. Government trade regulation restricts NVIDIA from selling it in the Chinese domestic market. The enthusiast-segment graphics card joins several other high-performance AI processors, such as the "Hopper" H800 and "Ampere" A800. If you recall, the H800 and A800 are special China-specific variants of the H100 and A100, respectively, which come with performance reductions at the hardware level to fly below the AI processor performance limits set by the U.S. Government. The only reasons we can think of for these chips being on the list are that end-users in China have figured out ways around these performance limiters, or are buying at a scale large enough to achieve the desired performance. The fresh trade embargo released on October 17 covers the A100, A800, H100, H800, L40, L40S, and RTX 4090.

New AI Accelerator Chips Boost HBM3 and HBM3e to Dominate 2024 Market

TrendForce reports that the HBM (High Bandwidth Memory) market's dominant product for 2023 is HBM2e, employed by the NVIDIA A100/A800, AMD MI200, and most CSPs' (Cloud Service Providers) self-developed accelerator chips. As the demand for AI accelerator chips evolves, manufacturers plan to introduce new HBM3e products in 2024, with HBM3 and HBM3e expected to become mainstream in the market next year.

The distinctions between HBM generations primarily lie in their speed. The industry experienced a proliferation of confusing names when transitioning to the HBM3 generation. TrendForce clarifies that the so-called HBM3 in the current market should be subdivided into two categories based on speed. One category includes HBM3 running at speeds between 5.6 and 6.4 Gbps, while the other features the 8 Gbps HBM3e, which also goes by several names including HBM3P, HBM3A, HBM3+, and HBM3 Gen2.
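For context on what those pin speeds imply, per-stack bandwidth scales directly with pin speed across the stack's interface. A rough sketch; the 1024-bit interface width is a standard HBM assumption on our part, not a figure stated in the article:

```python
# Per-stack bandwidth implied by the pin speeds above, assuming the
# standard 1024-bit HBM stack interface (our assumption, not the
# article's).

HBM_BUS_WIDTH_BITS = 1024

def stack_bandwidth_gbs(pin_speed_gbps: float) -> float:
    """Per-stack bandwidth in GB/s from per-pin speed in Gbps."""
    return pin_speed_gbps * HBM_BUS_WIDTH_BITS / 8

print(stack_bandwidth_gbs(6.4))  # HBM3 top bin: 819.2 GB/s
print(stack_bandwidth_gbs(8.0))  # HBM3e: 1024.0 GB/s
```

In other words, the jump from 6.4 to 8 Gbps pins is what pushes a single stack past the 1 TB/s mark.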

Report Suggests NVIDIA Prioritizing H800 GPU Production For Chinese AI Market

NVIDIA could be adjusting its enterprise-grade GPU production strategies for the Chinese market, according to an article published by MyDrivers—despite major sanctions placed on semiconductor imports, Team Green is doing plenty of business with tech firms operating in the region thanks to an uptick in AI-related activities. NVIDIA offers two market-specific accelerator models that have been cut down to conform to rules and regulations—the more powerful and expensive (250K RMB/~$35K) H800 is an adaptation of the western H100 GPU, while the A800 is a legal market alternative to the older A100.

The report proposes that NVIDIA is considering plans to reduce factory output of the A800 (sold for 100K RMB/~$14K per unit), so clients would be semi-forced into purchasing the higher-end H800 model instead (if they require a significant number of GPUs). The A800 seems to be the more popular choice for the majority of companies at the moment, with the heavy hitters—Alibaba, Baidu, Tencent, Jitwei and ByteDance—flexing their spending muscles and splurging on mixed shipments of the two accelerators. By limiting supplies of the lesser A800, Team Green could generate more profit from the more expensive (and readily available) model.

Chinese Tech Firms Buying Plenty of NVIDIA Enterprise GPUs

TikTok developer ByteDance, and other major Chinese tech firms including Tencent, Alibaba and Baidu are reported (by local media) to be snapping up lots of NVIDIA HPC GPUs, with even more orders placed this year. ByteDance is alleged to have spent enough on new products in 2023 to match the expenditure of the entire Chinese tech market on similar NVIDIA purchases for FY2022. According to news publication Jitwei, ByteDance has placed orders totaling $1 billion so far this year with Team Green—the report suggests that a mix of A100 and H800 GPU shipments have been sent to the company's mainland data centers.

The older Ampere-based A100 units were likely ordered prior to trade sanctions enforced on China post-August 2022, with further wiggle room allowed—meaning that shipments continued until September. The H800 GPU is a cut-down variant of 2022's flagship "Hopper" H100 model, designed specifically for the Chinese enterprise market—with reduced performance in order to meet export restriction standards. The H800 costs around $10,000 (average sale price per accelerator) according to Tom's Hardware, so it must offer some level of potency at that price. ByteDance has ordered roughly 100,000 units—with an unspecified split between H800 and A100 stock. Despite the development of competing HPC products within China, it seems that the nation's top-flight technology companies are heading directly to NVIDIA to acquire the best-of-the-best and highly mature AI processing hardware.

NVIDIA A800 China-Tailored GPU Performance within 70% of A100

The recent growth in demand for training Large Language Models (LLMs) like Generative Pre-trained Transformer (GPT) has sparked the interest of many companies in the GPU solutions used to train these models. However, countries like China are subject to US sanctions, and NVIDIA has to create custom models that meet US export regulations. Its two such GPUs, the H800 and A800, are cut-down versions of the original H100 and A100, respectively. We previously reported on the H800; however, it has remained as mysterious as the A800 we are talking about today. Thanks to MyDrivers, we now have information that the A800's performance is within 70% of the regular A100.

The regular A100 GPU manages 9.7 TFLOPS of FP64, 19.5 TFLOPS of FP64 Tensor, and up to 624 BF16/FP16 TFLOPS with sparsity. Rough napkin math suggests that 70% of the original performance (a 30% cut) would equal 6.8 TFLOPS of FP64, 13.7 TFLOPS of FP64 Tensor, and 437 BF16/FP16 TFLOPS with sparsity. MyDrivers notes that the A800 can be had for 100,000 yuan, translating to about 14,462 USD at the time of writing. This is not the most capable GPU that Chinese companies can acquire, as the H800 exists; however, we don't have any information about its performance for now.
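The scaling above can be reproduced in a couple of lines; note this is just the article's napkin math applied to the quoted A100 peak figures, not official A800 specifications:

```python
# Reproducing the 70%-of-A100 estimate from the paragraph above.
# A100 peak figures are as quoted in the article; the derived A800
# numbers are estimates, not official specs.

a100_tflops = {"FP64": 9.7, "FP64 Tensor": 19.5, "BF16/FP16 sparse": 624.0}
scale = 0.70  # A800 reportedly lands within 70% of the A100

for name, peak in a100_tflops.items():
    print(f"{name}: ~{peak * scale:.2f} TFLOPS")
# Matches the article's rounded 6.8 / 13.7 / 437 figures.
```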

NVIDIA Prepares H800 Adaptation of H100 GPU for the Chinese Market

NVIDIA's H100 accelerator is one of the most powerful solutions for powering AI workloads. And, of course, every company and government wants to use it to power its AI workload. However, in countries like China, shipment of US-made goods is challenging. With export regulations in place, NVIDIA had to get creative and make a specific version of its H100 GPU for the Chinese market, labeled the H800 model. Late last year, NVIDIA also created a China-specific version of the A100 model called A800, with the only difference being the chip-to-chip interconnect bandwidth being dropped from 600 GB/s to 400 GB/s.

This year's H800 SKU features similar restrictions, and the company appears to have made similar sacrifices to ship its chips to China. From the 600 GB/s bandwidth of the regular H100 PCIe model, the H800 is gutted to only 300 GB/s of bi-directional chip-to-chip interconnect bandwidth. While we have no data on whether the CUDA or Tensor core counts have been adjusted, the sacrifice of bandwidth to comply with export regulations will have consequences. As communication speed is reduced, training large models will incur higher latency and run more slowly than on the regular H100 chip, owing to the massive amount of data that needs to travel from one chip to another. According to Reuters, an NVIDIA spokesperson declined to discuss other differences, stating that "our 800 series products are fully compliant with export control regulations."
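A toy model illustrates why the halved link matters: the time to exchange a fixed amount of data doubles when the link speed is halved. Only the 600 and 300 GB/s figures come from the report; the payload size below is a made-up example:

```python
# Toy model of chip-to-chip transfer time at the two link speeds above.
# The 600 vs 300 GB/s figures are from the article; the payload size is
# a hypothetical example.

def transfer_time_ms(size_gb: float, bandwidth_gbs: float) -> float:
    """Time in milliseconds to move size_gb across a link at bandwidth_gbs."""
    return size_gb * 1000 / bandwidth_gbs

PAYLOAD_GB = 30.0  # hypothetical per-step data exchange
print(transfer_time_ms(PAYLOAD_GB, 600))  # H100 PCIe link: 50.0 ms
print(transfer_time_ms(PAYLOAD_GB, 300))  # H800 link: 100.0 ms
```

In multi-GPU training, this per-step communication overhead accumulates over millions of steps, which is exactly the slowdown the paragraph above describes.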