News Posts matching #H100


US Bans Export of NVIDIA H20 Accelerators to China, a Potential $5.5 Billion Loss for NVIDIA

President Trump's administration has announced that NVIDIA's H20 AI chip will require a special export license for any shipment to China, Hong Kong, or Macau for the indefinite future. The Commerce Department delivered the news to NVIDIA on April 14, 2025, citing worries that the H20 could be redirected into Chinese supercomputers with potential military applications. NVIDIA designed the H20 specifically to comply with earlier US curbs by scaling back performance from its flagship H100 model. The H20 features 96 GB of HBM3 memory running at up to 4.0 TB/s, delivers roughly 296 TeraFLOPS of mixed‑precision compute power, and offers a performance density of about 2.9 TeraFLOPS per die. Its single‑precision (FP32) throughput is around 74 TeraFLOPS, with FP16 performance reaching approximately 148 TeraFLOPS. In a regulatory filing on April 15, NVIDIA warned that it will record about $5.5 billion in writedowns this quarter related to H20 inventory and purchase commitments now blocked by the license requirement.
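The quoted H20 throughput figures are internally consistent: FP16 is double the FP32 rate, and the mixed-precision figure doubles FP16 again. A quick sanity-check sketch (the H100 comparison number is an approximation from public spec sheets, not from this article):

```python
# H20 throughput figures as quoted above (TFLOPS)
H20 = {"fp32": 74, "fp16": 148, "mixed": 296}

# Each precision step down roughly doubles throughput
fp16_vs_fp32 = H20["fp16"] / H20["fp32"]    # 2.0
mixed_vs_fp16 = H20["mixed"] / H20["fp16"]  # 2.0

# Approximate H100 SXM dense FP16 rate in TFLOPS
# (an assumption from public spec sheets, not stated in the article)
H100_FP16 = 989
h20_fraction_of_h100 = H20["fp16"] / H100_FP16  # ~0.15
```

At roughly 15% of the flagship's FP16 rate, the H20 illustrates just how far NVIDIA scaled the part back to fit the earlier export rules.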

Shares of NVIDIA fell roughly 6 percent in after‑hours trading on April 15, triggering a wider sell‑off in semiconductor stocks from the US to Japan. South Korea's Samsung and SK Hynix each slid about 3 percent, while AMD also dropped on concerns about broader chip‑export curbs. Analysts at Bloomberg Intelligence project that, if the restrictions persist, NVIDIA's China‑related data center revenue could shrink to low‑ or mid‑single digits as a percentage of total sales, down from roughly 13 percent in fiscal 2024. Chinese AI players such as Huawei stand to gain as customers seek alternative inference accelerators. Commerce Secretary Howard Lutnick has pledged to maintain a tough stance on chip exports to China even as NVIDIA commits up to $500 billion in US AI infrastructure investments over the next four years. Everyone is now watching closely to see whether any H20 export licenses are approved and how long the ban might remain in place.

Meta's Llama 4 Can Process 10 Million Tokens as Input, Lives in Native Multimodality

Meta has delivered a leap-forward update to its Llama model series with the v4 release, ushering the company's AI models into an era of native multimodality. At the forefront is Llama 4 Scout, a model boasting 17 billion active parameters distributed across 16 experts in a mixture-of-experts (MoE) configuration. With FP4 precision, this model is engineered to run entirely on a single NVIDIA H100 GPU. Scout supports an industry-leading input context window of up to 10 million tokens, a substantial leap over previous limits such as Google's Gemini 1.5 Pro, which topped out at a 2-million-token input context. Llama 4 Scout is built on a hybrid dense and MoE architecture that activates only a subset of the total parameters for each token, optimizing training and inference efficiency. This architecture not only accelerates computation but also reduces associated costs.

Meanwhile, Llama 4 Maverick, another model in the series, also features 17 billion active parameters but incorporates 128 experts, scaling to 400 billion total parameters. Maverick has demonstrated superior performance in coding, image understanding, multilingual processing, and logical reasoning, even outperforming several leading models in its class. Both models embrace native multimodality by integrating text and image data early in the processing pipeline. Using a custom MetaCLIP-based vision encoder, they can process multiple images alongside text simultaneously, fusing vision and text tokens into a single model backbone. This ensures robust visual comprehension and precise visual grounding, powering applications such as detailed image description, visual question-answering, and analysis of temporal image sequences.
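The single-H100 claim for Scout comes down to weight-memory arithmetic at 4-bit precision. A rough sketch below uses Maverick's 400 billion total parameters from the article; Scout's total of roughly 109 billion is an assumption (only the 17 billion active figure is stated above), so treat that number as illustrative:

```python
def fp4_weight_gib(total_params_billion: float) -> float:
    """Approximate weight-only memory footprint at 4-bit precision, in GiB."""
    bytes_total = total_params_billion * 1e9 * 0.5  # 4 bits = 0.5 bytes/param
    return bytes_total / 2**30

# Maverick: 400B total parameters, as stated above
maverick_gib = fp4_weight_gib(400)  # ~186 GiB -> needs multiple GPUs
# Scout's ~109B total is an assumption, not stated in the article
scout_gib = fp4_weight_gib(109)     # ~51 GiB -> fits one 80 GB H100
```

Note that all experts must be resident in memory even though only a few activate per token, which is why total (not active) parameters drive the memory budget.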

MangoBoost Achieves Record-Breaking MLPerf Inference v5.0 Results with AMD Instinct MI300X

MangoBoost, a provider of cutting-edge system solutions designed to maximize AI data center efficiency, has set a new industry benchmark with its latest MLPerf Inference v5.0 submission. The company's Mango LLMBoost AI Enterprise MLOps software has demonstrated unparalleled performance on AMD Instinct MI300X GPUs, delivering the highest-ever recorded results for Llama2-70B in the offline inference category. This milestone marks the first-ever multi-node MLPerf inference result on AMD Instinct MI300X GPUs. By harnessing the power of 32 MI300X GPUs across four server nodes, Mango LLMBoost has surpassed all previous MLPerf inference results, including those from competitors using NVIDIA H100 GPUs.

Unmatched Performance and Cost Efficiency
MangoBoost's MLPerf submission demonstrates a 24% performance advantage over the best-published MLPerf result from Juniper Networks utilizing 32 NVIDIA H100 GPUs. Mango LLMBoost achieved 103,182 tokens per second (TPS) in the offline scenario and 93,039 TPS in the server scenario on AMD MI300X GPUs, outperforming the previous best result of 82,749 TPS on NVIDIA H100 GPUs. In addition to superior performance, Mango LLMBoost + MI300X offers significant cost advantages. With AMD MI300X GPUs priced between $15,000 and $17,000—compared to the $32,000-$40,000 cost of NVIDIA H100 GPUs (source: Tom's Hardware—H100 vs. MI300X Pricing)—Mango LLMBoost delivers up to 62% cost savings while maintaining industry-leading inference throughput.
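The headline claims check out with back-of-envelope arithmetic. The sketch below uses the midpoints of the price ranges quoted above (the midpoint choice is an assumption; exact savings shift with where in each range the hardware lands):

```python
mi300x_tps, h100_tps = 103_182, 82_749
speedup = mi300x_tps / h100_tps - 1  # ~0.247 -> the ~24% claim

# Midpoints of the price ranges quoted above (assumption)
mi300x_price, h100_price = 16_000, 36_000
n_gpus = 32

# GPU hardware dollars per unit of offline throughput
mi300x_cost_per_tps = n_gpus * mi300x_price / mi300x_tps
h100_cost_per_tps = n_gpus * h100_price / h100_tps
savings = 1 - mi300x_cost_per_tps / h100_cost_per_tps  # ~0.64 at midpoints
```

At midpoint pricing, the savings work out to roughly 64%, in the same ballpark as the "up to 62%" figure claimed.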

Supermicro Expands Enterprise AI Portfolio With Support for Upcoming NVIDIA RTX PRO 6000 Blackwell Server Edition and NVIDIA H200 NVL Platform

Supermicro, Inc., a Total IT Solution Provider for AI/ML, HPC, Cloud, Storage, and 5G/Edge, today announced support for the new NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs on a range of workload-optimized GPU servers and workstations. Specifically optimized for the NVIDIA Blackwell generation of PCIe GPUs, the broad range of Supermicro servers will enable more enterprises to leverage accelerated computing for LLM-inference and fine-tuning, agentic AI, visualization, graphics & rendering, and virtualization. Many Supermicro GPU-optimized systems are NVIDIA Certified, guaranteeing compatibility and support for NVIDIA AI Enterprise to simplify the process of developing and deploying production AI.

"Supermicro leads the industry with its broad portfolio of application optimized GPU servers that can be deployed in a wide range of enterprise environments with very short lead times," said Charles Liang, president and CEO of Supermicro. "Our support for the NVIDIA RTX PRO 6000 Blackwell Server Edition GPU adds yet another dimension of performance and flexibility for customers looking to deploy the latest in accelerated computing capabilities from the data center to the intelligent edge. Supermicro's broad range of PCIe GPU-optimized products also support NVIDIA H200 NVL in 2-way and 4-way NVIDIA NVLink configurations to maximize inference performance for today's state-of-the-art AI models, as well as accelerating HPC workloads."

Global Top 10 IC Design Houses See 49% YoY Growth in 2024, NVIDIA Commands Half the Market

TrendForce reveals that the combined revenue of the world's top 10 IC design houses reached approximately US$249.8 billion in 2024, marking a 49% YoY increase. The booming AI industry has fueled growth across the semiconductor sector, with NVIDIA leading the charge, posting an astonishing 125% revenue growth, widening its lead over competitors, and solidifying its dominance in the IC industry.

Looking ahead to 2025, advancements in semiconductor manufacturing will further enhance AI computing power, with LLMs continuing to emerge. Open-source models like DeepSeek could lower AI adoption costs, accelerating AI penetration from servers to personal devices. This shift positions edge AI devices as the next major growth driver for the semiconductor industry.

NVIDIA to Consume 77% of Silicon Wafers Dedicated to AI Accelerators in 2025

Investment bank Morgan Stanley estimates that an astonishing 77% of all silicon wafers produced globally for AI accelerators will be consumed by none other than NVIDIA. Research by large investment banks like Morgan Stanley often draws on information from the semiconductor supply chain, which is constantly expanding to meet NVIDIA's demands. In 2024, NVIDIA is estimated to have captured nearly 51% of AI-accelerator wafer consumption, more than half of all demand. With that share projected to grow to 77%, this represents a relative year-over-year increase of more than 50%, which is incredible for a company of NVIDIA's size. Right now, NVIDIA is phasing out its H100 accelerators in favor of Blackwell B100/B200 and the upcoming B300 series of GPUs paired with Grace CPUs.
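The "more than 50%" figure can be confusing, since 51% to 77% is only a 26-point absolute gain; the claim refers to relative growth of the share itself:

```python
share_2024, share_2025 = 0.51, 0.77
absolute_gain = share_2025 - share_2024        # 26 percentage points
relative_growth = share_2025 / share_2024 - 1  # ~0.51 -> "more than 50%"
```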

NVIDIA is accelerating its product deployment timeline and investing heavily in internal research and development. Morgan Stanley also projects that NVIDIA's R&D budget will reach almost $16 billion, enough to sustain four to five years of development cycles running three design teams sequentially while still delivering new products on an 18-24 month cadence. That scale of development efficiency is unmatched elsewhere in the industry. With all this praise, NVIDIA's Q4 earnings report arrives in exactly a week, on February 26, when we will see what CEO Jensen Huang delivers and what guidance he offers for the coming months.

US Investigates Possible "Singapore" Loophole in China's Access to NVIDIA GPUs

Today, Bloomberg reported that the US government under the Trump administration is probing whether Chinese AI company DeepSeek circumvented export restrictions to acquire advanced NVIDIA GPUs through Singaporean intermediaries. The investigation follows concerns that DeepSeek's AI model, R1—reportedly rivaling leading systems from OpenAI and Google—may have been trained using restricted hardware barred from export to China. Singapore's role in NVIDIA's global sales has surged, with the nation accounting for 22% of the chipmaker's revenue in Q3 FY2025, up from 9% in Q3 FY2023. This spike coincides with tightened US export controls on AI chips to China, prompting speculation that Singapore serves as a conduit for Chinese firms to access high-end GPUs like the H100, which cannot be sold directly to China.

DeepSeek has not disclosed hardware details for R1 but revealed its earlier V3 model was trained using 2,048 H800 GPUs (2.8 million GPU hours), achieving efficiency surpassing Meta's Llama 3, which required 30.8 million GPU hours. Analysts suggest R1's performance implies even more powerful infrastructure, potentially involving restricted chips. US authorities, including the White House and FBI, are examining whether third parties in Singapore facilitated the transfer of controlled GPUs to DeepSeek. A well-known semiconductor analyst firm, SemiAnalysis, believes that DeepSeek acquired around 50,000 NVIDIA Hopper GPUs, including a mix of H100, H800, and H20. NVIDIA clarified that its reported Singapore revenue reflects "bill to" customer locations, not final destinations, stating most products are routed to the US or Western markets.

MediaTek Adopts AI-Driven Cadence Virtuoso Studio and Spectre Simulation on NVIDIA Accelerated Computing Platform for 2nm Designs

Cadence today announced that MediaTek has adopted the AI-driven Cadence Virtuoso Studio and Spectre X Simulator on the NVIDIA accelerated computing platform for its 2 nm development. As design size and complexity continue to escalate, advanced-node technology development has become increasingly challenging for SoC providers. To meet the aggressive performance and turnaround time (TAT) requirements for its 2 nm high-speed analog IP, MediaTek is leveraging Cadence's proven custom/analog design solutions, enhanced by AI, to achieve a 30% productivity gain.

"As MediaTek continues to push technology boundaries for 2 nm development, we need a trusted design solution with strong AI-powered tools to achieve our goals," said Ching San Wu, corporate vice president at MediaTek. "Closely collaborating with Cadence, we have adopted the Cadence Virtuoso Studio and Spectre X Simulator, which deliver the performance and accuracy necessary to achieve our tight design turnaround time requirements. Cadence's comprehensive automation features enhance our throughput and efficiency, enabling our designers to be 30% more productive."

Supermicro Empowers AI-driven Capabilities for Enterprise, Retail, and Edge Server Solutions

Supermicro, Inc. (SMCI), a Total IT Solution Provider for AI/ML, HPC, Cloud, Storage, and 5G/Edge, is showcasing the latest solutions for the retail industry in collaboration with NVIDIA at the National Retail Federation (NRF) annual show. As generative AI (GenAI) grows in capability and becomes more easily accessible, retailers are leveraging NVIDIA NIM microservices, part of the NVIDIA AI Enterprise software platform, for a broad spectrum of applications.

"Supermicro's innovative server, storage, and edge computing solutions improve retail operations, store security, and operational efficiency," said Charles Liang, president and CEO of Supermicro. "At NRF, Supermicro is excited to introduce retailers to AI's transformative potential and to revolutionize the customer's experience. Our systems here will help resolve day-to-day concerns and elevate the overall buying experience."

Gigabyte Demonstrates Omni-AI Capabilities at CES 2025

GIGABYTE Technology, internationally renowned for its R&D capabilities and a leading innovator in server and data center solutions, continues to lead technological innovation during this critical period of AI and computing advancement. With its comprehensive AI product portfolio, GIGABYTE will showcase its complete range of AI computing solutions at CES 2025, from data center infrastructure to IoT applications and personal computing, demonstrating how its extensive product line enables digital transformation across all sectors in this AI-driven era.

Powering AI from the Cloud
With AI Large Language Models (LLMs) now routinely featuring parameters in the hundreds of billions to trillions, robust training environments (data centers) have become a critical requirement in the AI race. GIGABYTE offers three distinctive solutions for AI infrastructure.

AMD's Pain Point is ROCm Software, NVIDIA's CUDA Software is Still Superior for AI Development: Report

The battle for AI acceleration in the data center is, as most readers are aware, insanely competitive, with NVIDIA offering a top-tier software stack. However, AMD has tried in recent years to capture a share of the revenue that hyperscalers and OEMs are willing to spend, with its Instinct MI300X accelerator lineup for AI and HPC. Despite decent hardware, the company is nowhere near bridging the software gap with its competitor, NVIDIA. SemiAnalysis, a research and consultancy firm, ran a five-month experiment using the Instinct MI300X for training and benchmark runs, and the findings were sobering: even with competitive hardware, AMD's software stack, centered on ROCm, massively degraded the MI300X's real-world performance.

"When comparing NVIDIA's GPUs to AMD's MI300X, we found that the potential on paper advantage of the MI300X was not realized due to a lack within AMD public release software stack and the lack of testing from AMD," noted SemiAnalysis, breaking down arguments in the report further, adding that "AMD's software experience is riddled with bugs rendering out of the box training with AMD is impossible. We were hopeful that AMD could emerge as a strong competitor to NVIDIA in training workloads, but, as of today, this is unfortunately not the case. The CUDA moat has yet to be crossed by AMD due to AMD's weaker-than-expected software Quality Assurance (QA) culture and its challenging out-of-the-box experience."

Microsoft Acquired Nearly 500,000 NVIDIA "Hopper" GPUs This Year

Microsoft is heavily investing in enabling its company and cloud infrastructure to support the massive AI expansion. The Redmond giant has acquired nearly half a million of the NVIDIA "Hopper" family of GPUs to support this effort. According to market research company Omdia, Microsoft was the biggest hyperscaler, with data center CapEx and GPU expenditure reaching a record high. The company acquired precisely 485,000 NVIDIA "Hopper" GPUs, including H100, H200, and H20, resulting in more than $30 billion spent on servers alone. To put things into perspective, this is about double that of the next-biggest GPU purchaser, China's ByteDance, which acquired about 230,000 sanction-compliant H800 GPUs and regular H100s sourced from third parties.

Regarding US-based companies, the only ones that have come close to the GPU acquisition rate are Meta, Tesla/xAI, Amazon, and Google. They have acquired around 200,000 GPUs on average while significantly boosting their in-house chip design efforts. "NVIDIA GPUs claimed a tremendously high share of the server capex," Vlad Galabov, director of cloud and data center research at Omdia, noted, adding, "We're close to the peak." Hyperscalers like Amazon, Google, and Meta have been working on their custom solutions for AI training and inference. For example, Google has its TPU, Amazon has its Trainium and Inferentia chips, and Meta has its MTIA. Hyperscalers are eager to develop their in-house solutions, but NVIDIA's grip on the software stack paired with timely product updates seems hard to break. The latest "Blackwell" chips are projected to get even bigger orders, so only the sky (and the local power plant) is the limit.

NVIDIA Announces Hopper H200 NVL PCIe GPU Availability at SC24, Promising 1.3x HPC Performance Over H100 NVL

Since its introduction, the NVIDIA Hopper architecture has transformed the AI and high-performance computing (HPC) landscape, helping enterprises, researchers and developers tackle the world's most complex challenges with higher performance and greater energy efficiency. During the Supercomputing 2024 conference, NVIDIA announced the availability of the NVIDIA H200 NVL PCIe GPU - the latest addition to the Hopper family. H200 NVL is ideal for organizations with data centers looking for lower-power, air-cooled enterprise rack designs with flexible configurations to deliver acceleration for every AI and HPC workload, regardless of size.

According to a recent survey, roughly 70% of enterprise racks are 20kW and below and use air cooling. This makes PCIe GPUs essential, as they provide granularity of node deployment, whether using one, two, four or eight GPUs - enabling data centers to pack more computing power into smaller spaces. Companies can then use their existing racks and select the number of GPUs that best suits their needs. Enterprises can use H200 NVL to accelerate AI and HPC applications, while also improving energy efficiency through reduced power consumption. With a 1.5x memory increase and 1.2x bandwidth increase over NVIDIA H100 NVL, companies can use H200 NVL to fine-tune LLMs within a few hours and deliver up to 1.7x faster inference performance. For HPC workloads, performance is boosted up to 1.3x over H100 NVL and 2.5x over the NVIDIA Ampere architecture generation.

Cisco Unveils Plug-and-Play AI Solutions Powered by NVIDIA H100 and H200 Tensor Core GPUs

Today, Cisco announced new additions to its data center infrastructure portfolio: an AI server family purpose-built for GPU-intensive AI workloads with NVIDIA accelerated computing, and AI PODs to simplify and de-risk AI infrastructure investment. They give organizations an adaptable and scalable path to AI, supported by Cisco's industry-leading networking capabilities.

"Enterprise customers are under pressure to deploy AI workloads, especially as we move toward agentic workflows and AI begins solving problems on its own," said Jeetu Patel, Chief Product Officer, Cisco. "Cisco innovations like AI PODs and the GPU server strengthen the security, compliance, and processing power of those workloads as customers navigate their AI journeys from inferencing to training."

Intel Won't Compete Against NVIDIA's High-End AI Dominance Soon, Starts Laying Off Over 2,200 Workers Across US

Intel's taking a different path with its Gaudi 3 accelerator chips. It's staying away from the high-demand market for training big AI models, which has made NVIDIA so successful. Instead, Intel wants to help businesses that need cheaper AI solutions to train and run smaller specific models and open-source options. At a recent event, Intel talked up Gaudi 3's "price performance advantage" over NVIDIA's H100 GPU for inference tasks. Intel says Gaudi 3 is faster and more cost-effective than the H100 when running Llama 3 and Llama 2 models of different sizes.

Intel also claims that Gaudi 3 is as power-efficient as the H100 for large language model (LLM) inference with small token outputs and does even better with larger outputs. The company even suggests Gaudi 3 beats NVIDIA's newer H200 in LLM inference throughput for large token outputs. However, Gaudi 3 doesn't match up to the H100 in overall floating-point operation throughput for 16-bit and 8-bit formats. For bfloat16 and 8-bit floating-point precision matrix math, Gaudi 3 hits 1,835 TFLOPS in each format, while the H100 reaches 1,979 TFLOPS for BF16 and 3,958 TFLOPS for FP8.
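The throughput figures quoted above explain why Gaudi 3 competes at BF16 but falls behind at FP8: Gaudi 3 runs both formats at the same rate, while the H100 doubles its FP8 rate over BF16. A quick comparison sketch:

```python
# Peak matrix-math throughput in TFLOPS, as quoted above
gaudi3 = {"bf16": 1835, "fp8": 1835}
h100   = {"bf16": 1979, "fp8": 3958}

bf16_ratio = gaudi3["bf16"] / h100["bf16"]  # ~0.93: nearly even at BF16
fp8_ratio  = gaudi3["fp8"] / h100["fp8"]    # ~0.46: half the rate at FP8
```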

NVIDIA cuLitho Computational Lithography Platform is Moving to Production at TSMC

TSMC, the world leader in semiconductor manufacturing, is moving to production with NVIDIA's computational lithography platform, called cuLitho, to accelerate manufacturing and push the limits of physics for the next generation of advanced semiconductor chips. A critical step in the manufacture of computer chips, computational lithography is involved in the transfer of circuitry onto silicon. It requires complex computation - involving electromagnetic physics, photochemistry, computational geometry, iterative optimization and distributed computing. A typical foundry dedicates massive data centers for this computation, and yet this step has traditionally been a bottleneck in bringing new technology nodes and computer architectures to market.

Computational lithography is also the most compute-intensive workload in the entire semiconductor design and manufacturing process. It consumes tens of billions of hours per year on CPUs in the leading-edge foundries. A typical mask set for a chip can take 30 million or more hours of CPU compute time, necessitating large data centers within semiconductor foundries. With accelerated computing, 350 NVIDIA H100 Tensor Core GPU-based systems can now replace 40,000 CPU systems, accelerating production time, while reducing costs, space and power.
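The consolidation ratio implied by those numbers is striking. A naive reading of the figures above (treating "30 million CPU hours" as evenly spread across the CPU fleet, which is an assumption):

```python
cpu_systems, gpu_systems = 40_000, 350
consolidation = cpu_systems / gpu_systems  # ~114x fewer systems

# If a mask set's 30M CPU hours were spread evenly across the fleet,
# each system would be busy for roughly a month
mask_set_cpu_hours = 30e6
hours_per_system = mask_set_cpu_hours / cpu_systems  # 750 h (~31 days)
```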

Huawei Starts Shipping "Ascend 910C" AI Accelerator Samples to Large NVIDIA Customers

Huawei has reportedly started shipping its Ascend 910C accelerator—the company's domestic alternative to NVIDIA's H100 for AI training and inference. As the report from the South China Morning Post notes, Huawei is shipping samples of the accelerator to large NVIDIA customers, including companies like Alibaba, Baidu, and Tencent, which have ordered massive quantities of NVIDIA accelerators. Huawei is reportedly on track to deliver 70,000 chips, potentially worth $2 billion. With NVIDIA working on a B20 accelerator SKU that complies with US government export regulations, some analysts expect the Ascend 910C could outperform NVIDIA's B20 processor.

If the Ascend 910C receives positive results from Chinese tech giants, it could be the start of Huawei's expansion into data center accelerators, once hindered by the company's ability to manufacture advanced chips. Now, with foundries like SMIC printing 7 nm designs and possibly 5 nm coming soon, Huawei will leverage this technology to satisfy the domestic demand for more AI processing power. Competing on a global scale, though, remains a challenge. Companies like NVIDIA, AMD, and Intel have access to advanced nodes, which gives their AI accelerators more efficiency and performance.

Bang & Olufsen Presents the Beoplay H100 Headphones

Bang & Olufsen today announced its new flagship headphones Beoplay H100, reaching back into almost a century of history to redefine the listening future. Building on the success of Beoplay H95, Bang & Olufsen's most successful headphones to date, Beoplay H100 is improved in every discipline: unrivalled high-quality sound, outstanding digital noise cancellation, a new modular construction and a beautiful design that builds on Bang & Olufsen's design icons.

"Beoplay H100 elevates what we have accomplished over the past ten decades and defines our future: an era where beautiful sound is built to last. It represents the true potential of what a Bang & Olufsen audio wearable can be, and we cannot wait to bring the headphones to our customers," says Bang & Olufsen CEO Kristian Teär and continues: "Drawing inspiration from the unparalleled performance of our Beolab speakers, our iconic designs from the past and the modular construction that embraces material excellence and circularity, Beoplay H100 truly embodies our design and innovation capabilities."

NVIDIA Blackwell Sets New Standard for Generative AI in MLPerf Inference Benchmark

As enterprises race to adopt generative AI and bring new services to market, the demands on data center infrastructure have never been greater. Training large language models is one challenge, but delivering LLM-powered real-time services is another. In the latest round of MLPerf industry benchmarks, Inference v4.1, NVIDIA platforms delivered leading performance across all data center tests. The first-ever submission of the upcoming NVIDIA Blackwell platform revealed up to 4x more performance than the NVIDIA H100 Tensor Core GPU on MLPerf's biggest LLM workload, Llama 2 70B, thanks to its use of a second-generation Transformer Engine and FP4 Tensor Cores.

The NVIDIA H200 Tensor Core GPU delivered outstanding results on every benchmark in the data center category - including the latest addition to the benchmark, the Mixtral 8x7B mixture of experts (MoE) LLM, which features a total of 46.7 billion parameters, with 12.9 billion parameters active per token. MoE models have gained popularity as a way to bring more versatility to LLM deployments, as they're capable of answering a wide variety of questions and performing more diverse tasks in a single deployment. They're also more efficient since they only activate a few experts per inference - meaning they deliver results much faster than dense models of a similar size.
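The efficiency argument for MoE models comes straight from the parameter counts quoted above: per-token compute scales with active, not total, parameters. A short sketch of the arithmetic:

```python
total_b, active_b = 46.7, 12.9  # Mixtral 8x7B parameters in billions
active_fraction = active_b / total_b  # ~0.28 of the weights used per token

# A dense model of the same total size would do roughly 3.6x the
# per-token compute, which is why MoE inference is faster
dense_vs_moe = 1 / active_fraction
```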

Huawei Reportedly Developing New Ascend 910C AI Chip to Rival NVIDIA's H100 GPU

Amidst escalating tensions in the U.S.-China semiconductor industry, Huawei is reportedly working on a new AI chip called the Ascend 910C. This development appears to be the Chinese tech giant's attempt to compete with NVIDIA's AI processors in the Chinese market. According to a Wall Street Journal report, Huawei has begun testing the Ascend 910C with various Chinese internet and telecom companies to evaluate its performance and capabilities. Notable firms such as ByteDance, Baidu, and China Mobile are said to have received samples of the chip.

Huawei has reportedly informed its clients that the Ascend 910C can match the performance of NVIDIA's H100 chip. The company has been conducting tests for several weeks, suggesting that the new processor is nearing completion. The Wall Street Journal indicates that Huawei could start shipping the chip as early as October 2024. The report also mentions that Huawei and potential customers have discussed orders for over 70,000 chips, potentially worth $2 billion.

NVIDIA's New B200A Targets OEM Customers; High-End GPU Shipments Expected to Grow 55% in 2025

Despite recent rumors speculating on NVIDIA's supposed cancellation of the B100 in favor of the B200A, TrendForce reports that NVIDIA is still on track to launch both the B100 and B200 in 2H24 as it aims to target CSP customers. Additionally, a scaled-down B200A is planned for other enterprise clients, focusing on edge AI applications.

TrendForce reports that NVIDIA will prioritize the B100 and B200 for CSP customers with higher demand due to the tight production capacity of CoWoS-L. Shipments are expected to commence after 3Q24. In light of yield and mass production challenges with CoWoS-L, NVIDIA is also planning the B200A for other enterprise clients, utilizing CoWoS-S packaging technology.

Apple Trained its Apple Intelligence Models on Google TPUs, Not NVIDIA GPUs

Apple has disclosed that its newly announced Apple Intelligence features were developed using Google's Tensor Processing Units (TPUs) rather than NVIDIA's widely adopted hardware accelerators like H100. This unexpected choice was detailed in an official Apple research paper, shedding light on the company's approach to AI development. The paper outlines how systems equipped with Google's TPUv4 and TPUv5 chips played a crucial role in creating Apple Foundation Models (AFMs). These models, including AFM-server and AFM-on-device, are designed to power both online and offline Apple Intelligence features introduced at WWDC 2024. For the training of the 6.4 billion parameter AFM-server, Apple's largest language model, the company utilized an impressive array of 8,192 TPUv4 chips, provisioned as 8×1024 chip slices. The training process involved a three-stage approach, processing a total of 7.4 trillion tokens. Meanwhile, the more compact 3 billion parameter AFM-on-device model, optimized for on-device processing, was trained using 2,048 TPUv5p chips.
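The cluster arithmetic in the paper is easy to verify from the figures above:

```python
slices, chips_per_slice = 8, 1024
total_chips = slices * chips_per_slice  # 8,192 TPUv4 chips, as stated

tokens_trained = 7.4e12
tokens_per_chip = tokens_trained / total_chips  # ~0.9 billion per chip
```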

Apple's training data came from various sources, including the Applebot web crawler and licensed high-quality datasets. The company also incorporated carefully selected code, math, and public datasets to enhance the models' capabilities. Benchmark results shared in the paper suggest that both AFM-server and AFM-on-device excel in areas such as Instruction Following, Tool Use, and Writing, positioning Apple as a strong contender in the AI race despite its relatively late entry. However, Apple's route into the AI market is more complex than that of any other AI competitor. Given Apple's massive user base and the millions of devices compatible with Apple Intelligence, the AFMs have the potential to change how users interact with their devices for good, especially for everyday tasks; refining the models for those tasks is therefore critical before mass deployment. Another unexpected element is the transparency from Apple, a company typically known for its secrecy. The AI boom is changing some of Apple's ways, and this glimpse into its inner workings is always interesting.

Global AI Server Demand Surge Expected to Drive 2024 Market Value to US$187 Billion; Represents 65% of Server Market

TrendForce's latest industry report on AI servers reveals that high demand for advanced AI servers from major CSPs and brand clients is expected to continue in 2024. Meanwhile, TSMC, SK hynix, Samsung, and Micron's gradual production expansion has significantly eased shortages in 2Q24. Consequently, the lead time for NVIDIA's flagship H100 solution has decreased from the previous 40-50 weeks to less than 16 weeks.

TrendForce estimates that AI server shipments in the second quarter will increase by nearly 20% QoQ, and has revised the annual shipment forecast up to 1.67 million units—marking a 41.5% YoY growth.

AI Startup Etched Unveils Transformer ASIC Claiming 20x Speed-up Over NVIDIA H100

A new startup emerged from stealth mode today to power the next generation of generative AI. Etched is a company that makes an application-specific integrated circuit (ASIC) to process "Transformers." The transformer is a deep-learning model architecture developed at Google, now the powerhouse behind models like OpenAI's GPT-4o in ChatGPT, Anthropic Claude, Google Gemini, and Meta's Llama family. Etched set out to build an ASIC that processes only transformer models, a chip called Sohu, and claims it outperforms NVIDIA's latest and greatest by an entire order of magnitude. Where a server configuration with eight NVIDIA H100 GPUs pushes Llama-3 70B models at 25,000 tokens per second, and the latest eight-GPU B200 "Blackwell" cluster pushes 43,000 tokens/s, eight Sohu chips manage to output 500,000 tokens per second.
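Taking the claimed throughput figures at face value, the speedup ratios work out as follows (the "10x over Blackwell" quoted elsewhere is the ~11.6x ratio rounded down):

```python
tokens_per_second = {"8x H100": 25_000, "8x B200": 43_000, "8x Sohu": 500_000}

vs_h100 = tokens_per_second["8x Sohu"] / tokens_per_second["8x H100"]  # 20.0
vs_b200 = tokens_per_second["8x Sohu"] / tokens_per_second["8x B200"]  # ~11.6
```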

Why is this important? Not only does the ASIC outperform Hopper by 20x and Blackwell by 10x, but it also serves so many tokens per second that it enables an entirely new fleet of AI applications requiring real-time output. The Sohu architecture is so efficient that 90% of its FLOPS can be utilized, while traditional GPUs typically manage 30-40% FLOPS utilization. That gap translates into inefficiency and wasted power, which Etched hopes to solve with an accelerator dedicated to powering transformers (the "T" in GPT) at massive scale. Given that frontier model development costs more than one billion US dollars, and hardware costs run into the tens of billions, an accelerator dedicated to a specific application could help advance AI faster. AI researchers often say that "scale is all you need" (echoing the legendary "Attention Is All You Need" paper), and Etched wants to build on that.

NVIDIA MLPerf Training Results Showcase Unprecedented Performance and Elasticity

The full-stack NVIDIA accelerated computing platform has once again demonstrated exceptional performance in the latest MLPerf Training v4.0 benchmarks. NVIDIA more than tripled the performance on the large language model (LLM) benchmark, based on GPT-3 175B, compared to the record-setting NVIDIA submission made last year. Using an AI supercomputer featuring 11,616 NVIDIA H100 Tensor Core GPUs connected with NVIDIA Quantum-2 InfiniBand networking, NVIDIA achieved this remarkable feat through larger scale - more than triple that of the 3,584 H100 GPU submission a year ago - and extensive full-stack engineering.

Thanks to the scalability of the NVIDIA AI platform, Eos can now train massive AI models like GPT-3 175B even faster, and this great AI performance translates into significant business opportunities. For example, in NVIDIA's recent earnings call, we described how LLM service providers can turn a single dollar invested into seven dollars in just four years running the Llama 3 70B model on NVIDIA HGX H200 servers. This return assumes an LLM service provider serving Llama 3 70B at $0.60/M tokens, with an HGX H200 server throughput of 24,000 tokens/second.
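The 7:1 return claim can be checked with simple arithmetic. The sketch below assumes continuous full utilization over the four years (an idealization); the implied all-in server cost is a back-of-envelope inference, not a figure NVIDIA stated:

```python
price_per_token = 0.60 / 1e6  # $0.60 per million tokens served
server_tps = 24_000           # HGX H200 throughput, as quoted
seconds_in_4_years = 4 * 365 * 24 * 3600

# Gross token revenue over 4 years at full utilization (an assumption)
revenue = server_tps * seconds_in_4_years * price_per_token  # ~$1.82M

# The stated 7:1 return then implies an all-in cost of roughly $260k
implied_cost = revenue / 7
```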