News Posts matching #H100

Return to Keyword Browsing

Microsoft Acquired Nearly 500,000 NVIDIA "Hopper" GPUs This Year

Microsoft is heavily investing in enabling its company and cloud infrastructure to support the massive AI expansion. The Redmond giant has acquired nearly half a million of the NVIDIA "Hopper" family of GPUs to support this effort. According to market research company Omdia, Microsoft was the biggest hyperscaler, with data center CapEx and GPU expenditure reaching a record high. The company acquired precisely 485,000 NVIDIA "Hopper" GPUs, including H100, H200, and H20, resulting in more than $30 billion spent on servers alone. To put things into perspective, this is about double that of the next-biggest GPU purchaser, Chinese ByteDance, who acquired about 230,000 sanction-abiding H800 GPUs and regular H100s sources from third parties.

Regarding US-based companies, the only ones that have come close to the GPU acquisition rate are Meta, Tesla/xAI, Amazon, and Google. They have acquired around 200,000 GPUs on average while significantly boosting their in-house chip design efforts. "NVIDIA GPUs claimed a tremendously high share of the server capex," Vlad Galabov, director of cloud and data center research at Omdia, noted, adding, "We're close to the peak." Hyperscalers like Amazon, Google, and Meta have been working on their custom solutions for AI training and inference. For example, Google has its TPU, Amazon has its Trainium and Inferentia chips, and Meta has its MTIA. Hyperscalers are eager to develop their in-house solutions, but NVIDIA's grip on the software stack paired with timely product updates seems hard to break. The latest "Blackwell" chips are projected to get even bigger orders, so only the sky (and the local power plant) is the limit.

NVIDIA Announces Hopper H200 NVL PCIe GPU Availability at SC24, Promising 1.3x HPC Performance Over H100 NVL

Since its introduction, the NVIDIA Hopper architecture has transformed the AI and high-performance computing (HPC) landscape, helping enterprises, researchers and developers tackle the world's most complex challenges with higher performance and greater energy efficiency. During the Supercomputing 2024 conference, NVIDIA announced the availability of the NVIDIA H200 NVL PCIe GPU - the latest addition to the Hopper family. H200 NVL is ideal for organizations with data centers looking for lower-power, air-cooled enterprise rack designs with flexible configurations to deliver acceleration for every AI and HPC workload, regardless of size.

According to a recent survey, roughly 70% of enterprise racks are 20kW and below and use air cooling. This makes PCIe GPUs essential, as they provide granularity of node deployment, whether using one, two, four or eight GPUs - enabling data centers to pack more computing power into smaller spaces. Companies can then use their existing racks and select the number of GPUs that best suits their needs. Enterprises can use H200 NVL to accelerate AI and HPC applications, while also improving energy efficiency through reduced power consumption. With a 1.5x memory increase and 1.2x bandwidth increase over NVIDIA H100 NVL, companies can use H200 NVL to fine-tune LLMs within a few hours and deliver up to 1.7x faster inference performance. For HPC workloads, performance is boosted up to 1.3x over H100 NVL and 2.5x over the NVIDIA Ampere architecture generation.

Cisco Unveils Plug-and-Play AI Solutions Powered by NVIDIA H100 and H200 Tensor Core GPUs

Today, Cisco announced new additions to its data center infrastructure portfolio: an AI server family purpose-built for GPU-intensive AI workloads with NVIDIA accelerated computing, and AI PODs to simplify and de-risk AI infrastructure investment. They give organizations an adaptable and scalable path to AI, supported by Cisco's industry-leading networking capabilities.

"Enterprise customers are under pressure to deploy AI workloads, especially as we move toward agentic workflows and AI begins solving problems on its own," said Jeetu Patel, Chief Product Officer, Cisco. "Cisco innovations like AI PODs and the GPU server strengthen the security, compliance, and processing power of those workloads as customers navigate their AI journeys from inferencing to training."

Intel Won't Compete Against NVIDIA's High-End AI Dominance Soon, Starts Laying Off Over 2,200 Workers Across US

Intel's taking a different path with its Gaudi 3 accelerator chips. It's staying away from the high-demand market for training big AI models, which has made NVIDIA so successful. Instead, Intel wants to help businesses that need cheaper AI solutions to train and run smaller specific models and open-source options. At a recent event, Intel talked up Gaudi 3's "price performance advantage" over NVIDIA's H100 GPU for inference tasks. Intel says Gaudi 3 is faster and more cost-effective than the H100 when running Llama 3 and Llama 2 models of different sizes.

Intel also claims that Gaudi 3 is as power-efficient as the H100 for large language model (LLM) inference with small token outputs and does even better with larger outputs. The company even suggests Gaudi 3 beats NVIDIA's newer H200 in LLM inference throughput for large token outputs. However, Gaudi 3 doesn't match up to the H100 in overall floating-point operation throughput for 16-bit and 8-bit formats. For bfloat16 and 8-bit floating-point precision matrix math, Gaudi 3 hits 1,835 TFLOPS in each format, while the H100 reaches 1,979 TFLOPS for BF16 and 3,958 TFLOPS for FP8.

NVIDIA cuLitho Computational Lithography Platform is Moving to Production at TSMC

TSMC, the world leader in semiconductor manufacturing, is moving to production with NVIDIA's computational lithography platform, called cuLitho, to accelerate manufacturing and push the limits of physics for the next generation of advanced semiconductor chips. A critical step in the manufacture of computer chips, computational lithography is involved in the transfer of circuitry onto silicon. It requires complex computation - involving electromagnetic physics, photochemistry, computational geometry, iterative optimization and distributed computing. A typical foundry dedicates massive data centers for this computation, and yet this step has traditionally been a bottleneck in bringing new technology nodes and computer architectures to market.

Computational lithography is also the most compute-intensive workload in the entire semiconductor design and manufacturing process. It consumes tens of billions of hours per year on CPUs in the leading-edge foundries. A typical mask set for a chip can take 30 million or more hours of CPU compute time, necessitating large data centers within semiconductor foundries. With accelerated computing, 350 NVIDIA H100 Tensor Core GPU-based systems can now replace 40,000 CPU systems, accelerating production time, while reducing costs, space and power.

Huawei Starts Shipping "Ascend 910C" AI Accelerator Samples to Large NVIDIA Customers

Huawei has reportedly started shipping its Ascend 910C accelerator—the company's domestic alternative to NVIDIA's H100 accelerator for AI training and inference. As the report from China South Morning Post notes, Huawei is shipping samples of its accelerator to large NVIDIA customers. This includes companies like Alibaba, Baidu, and Tencent, which have ordered massive amounts of NVIDIA accelerators. However, Huawei is on track to deliver 70,000 chips, potentially worth $2 billion. With NVIDIA working on a B20 accelerator SKU that complies with US government export regulations, the Huawei Ascend 910C accelerator could potentially outperform NVIDIA's B20 processor, per some analyst expectations.

If the Ascend 910C receives positive results from Chinese tech giants, it could be the start of Huawei's expansion into data center accelerators, once hindered by the company's ability to manufacture advanced chips. Now, with foundries like SMIC printing 7 nm designs and possibly 5 nm coming soon, Huawei will leverage this technology to satisfy the domestic demand for more AI processing power. Competing on a global scale, though, remains a challenge. Companies like NVIDIA, AMD, and Intel have access to advanced nodes, which gives their AI accelerators more efficiency and performance.

Bang & Olufsen Presents the Beoplay H100 Headphones

Bang & Olufsen today announced its new flagship headphones Beoplay H100, reaching back into almost a century of history to redefine the listening future. Building on the success of Beoplay H95, Bang & Olufsen's most successful headphones to date, Beoplay H100 is improved in every discipline: unrivalled high-quality sound, outstanding digital noise cancellation, a new modular construction and a beautiful design that builds on Bang & Olufsen's design icons.

"Beoplay H100 elevates what we have accomplished over the past ten decades and defines our future: an era where beautiful sound is built to last. It represents the true potential of what a Bang & Olufsen audio wearable can be, and we cannot wait to bring the headphones to our customers," says Bang & Olufsen CEO Kristian Teär and continues: "Drawing inspiration from the unparalleled performance of our Beolab speakers, our iconic designs from the past and the modular construction that embraces material excellence and circularity, Beoplay H100 truly embodies our design and innovation capabilities."

NVIDIA Blackwell Sets New Standard for Generative AI in MLPerf Inference Benchmark

As enterprises race to adopt generative AI and bring new services to market, the demands on data center infrastructure have never been greater. Training large language models is one challenge, but delivering LLM-powered real-time services is another. In the latest round of MLPerf industry benchmarks, Inference v4.1, NVIDIA platforms delivered leading performance across all data center tests. The first-ever submission of the upcoming NVIDIA Blackwell platform revealed up to 4x more performance than the NVIDIA H100 Tensor Core GPU on MLPerf's biggest LLM workload, Llama 2 70B, thanks to its use of a second-generation Transformer Engine and FP4 Tensor Cores.

The NVIDIA H200 Tensor Core GPU delivered outstanding results on every benchmark in the data center category - including the latest addition to the benchmark, the Mixtral 8x7B mixture of experts (MoE) LLM, which features a total of 46.7 billion parameters, with 12.9 billion parameters active per token. MoE models have gained popularity as a way to bring more versatility to LLM deployments, as they're capable of answering a wide variety of questions and performing more diverse tasks in a single deployment. They're also more efficient since they only activate a few experts per inference - meaning they deliver results much faster than dense models of a similar size.

Huawei Reportedly Developing New Ascend 910C AI Chip to Rival NVIDIA's H100 GPU

Amidst escalating tensions in the U.S.-China semiconductor industry, Huawei is reportedly working on a new AI chip called the Ascend 910C. This development appears to be the Chinese tech giant's attempt to compete with NVIDIA's AI processors in the Chinese market. According to a Wall Street Journal report, Huawei has begun testing the Ascend 910C with various Chinese internet and telecom companies to evaluate its performance and capabilities. Notable firms such as ByteDance, Baidu, and China Mobile are said to have received samples of the chip.

Huawei has reportedly informed its clients that the Ascend 910C can match the performance of NVIDIA's H100 chip. The company has been conducting tests for several weeks, suggesting that the new processor is nearing completion. The Wall Street Journal indicates that Huawei could start shipping the chip as early as October 2024. The report also mentions that Huawei and potential customers have discussed orders for over 70,000 chips, potentially worth $2 billion.

NVIDIA's New B200A Targets OEM Customers; High-End GPU Shipments Expected to Grow 55% in 2025

Despite recent rumors speculating on NVIDIA's supposed cancellation of the B100 in favor of the B200A, TrendForce reports that NVIDIA is still on track to launch both the B100 and B200 in the 2H24 as it aims to target CSP customers. Additionally, a scaled-down B200A is planned for other enterprise clients, focusing on edge AI applications.

TrendForce reports that NVIDIA will prioritize the B100 and B200 for CSP customers with higher demand due to the tight production capacity of CoWoS-L. Shipments are expected to commence after 3Q24. In light of yield and mass production challenges with CoWoS-L, NVIDIA is also planning the B200A for other enterprise clients, utilizing CoWoS-S packaging technology.

Apple Trained its Apple Intelligence Models on Google TPUs, Not NVIDIA GPUs

Apple has disclosed that its newly announced Apple Intelligence features were developed using Google's Tensor Processing Units (TPUs) rather than NVIDIA's widely adopted hardware accelerators like H100. This unexpected choice was detailed in an official Apple research paper, shedding light on the company's approach to AI development. The paper outlines how systems equipped with Google's TPUv4 and TPUv5 chips played a crucial role in creating Apple Foundation Models (AFMs). These models, including AFM-server and AFM-on-device, are designed to power both online and offline Apple Intelligence features introduced at WWDC 2024. For the training of the 6.4 billion parameter AFM-server, Apple's largest language model, the company utilized an impressive array of 8,192 TPUv4 chips, provisioned as 8×1024 chip slices. The training process involved a three-stage approach, processing a total of 7.4 trillion tokens. Meanwhile, the more compact 3 billion parameter AFM-on-device model, optimized for on-device processing, was trained using 2,048 TPUv5p chips.

Apple's training data came from various sources, including the Applebot web crawler and licensed high-quality datasets. The company also incorporated carefully selected code, math, and public datasets to enhance the models' capabilities. Benchmark results shared in the paper suggest that both AFM-server and AFM-on-device excel in areas such as Instruction Following, Tool Use, and Writing, positioning Apple as a strong contender in the AI race despite its relatively late entry. However, Apple's penetration tactic into the AI market is much more complex than any other AI competitor. Given Apple's massive user base and millions of devices compatible with Apple Intelligence, the AFM has the potential to change user interaction with devices for good, especially for everyday tasks. Hence, refining AI models for these tasks is critical before massive deployment. Another unexpected feature is transparency from Apple, a company typically known for its secrecy. The AI boom is changing some of Apple's ways, and revealing these inner workings is always interesting.

Global AI Server Demand Surge Expected to Drive 2024 Market Value to US$187 Billion; Represents 65% of Server Market

TrendForce's latest industry report on AI servers reveals that high demand for advanced AI servers from major CSPs and brand clients is expected to continue in 2024. Meanwhile, TSMC, SK hynix, Samsung, and Micron's gradual production expansion has significantly eased shortages in 2Q24. Consequently, the lead time for NVIDIA's flagship H100 solution has decreased from the previous 40-50 weeks to less than 16 weeks.

TrendForce estimates that AI server shipments in the second quarter will increase by nearly 20% QoQ, and has revised the annual shipment forecast up to 1.67 million units—marking a 41.5% YoY growth.

AI Startup Etched Unveils Transformer ASIC Claiming 20x Speed-up Over NVIDIA H100

A new startup emerged out of stealth mode today to power the next generation of generative AI. Etched is a company that makes an application-specific integrated circuit (ASIC) to process "Transformers." The transformer is an architecture for designing deep learning models developed by Google and is now the powerhouse behind models like OpenAI's GPT-4o in ChatGPT, Anthropic Claude, Google Gemini, and Meta's Llama family. Etched wanted to create an ASIC for processing only the transformer models, making a chip called Sohu. The claim is Sohu outperforms NVIDIA's latest and greatest by an entire order of magnitude. Where a server configuration with eight NVIDIA H100 GPU clusters pushes Llama-3 70B models at 25,000 tokens per second, and the latest eight B200 "Blackwell" GPU cluster pushes 43,000 tokens/s, the eight Sohu clusters manage to output 500,000 tokens per second.

Why is this important? Not only does the ASIC outperform Hopper by 20x and Blackwell by 10x, but it also serves so many tokens per second that it enables an entirely new fleet of AI applications requiring real-time output. The Sohu architecture is so efficient that 90% of the FLOPS can be used, while traditional GPUs boast a 30-40% FLOP utilization rate. This translates into inefficiency and waste of power, which Etched hopes to solve by building an accelerator dedicated to power transformers (the "T" in GPT) at massive scales. Given that the frontier model development costs more than one billion US dollars, and hardware costs are measured in tens of billions of US Dollars, having an accelerator dedicated to powering a specific application can help advance AI faster. AI researchers often say that "scale is all you need" (resembling the legendary "attention is all you need" paper), and Etched wants to build on that.

NVIDIA MLPerf Training Results Showcase Unprecedented Performance and Elasticity

The full-stack NVIDIA accelerated computing platform has once again demonstrated exceptional performance in the latest MLPerf Training v4.0 benchmarks. NVIDIA more than tripled the performance on the large language model (LLM) benchmark, based on GPT-3 175B, compared to the record-setting NVIDIA submission made last year. Using an AI supercomputer featuring 11,616 NVIDIA H100 Tensor Core GPUs connected with NVIDIA Quantum-2 InfiniBand networking, NVIDIA achieved this remarkable feat through larger scale - more than triple that of the 3,584 H100 GPU submission a year ago - and extensive full-stack engineering.

Thanks to the scalability of the NVIDIA AI platform, Eos can now train massive AI models like GPT-3 175B even faster, and this great AI performance translates into significant business opportunities. For example, in NVIDIA's recent earnings call, we described how LLM service providers can turn a single dollar invested into seven dollars in just four years running the Llama 3 70B model on NVIDIA HGX H200 servers. This return assumes an LLM service provider serving Llama 3 70B at $0.60/M tokens, with an HGX H200 server throughput of 24,000 tokens/second.

Blackwell Shipments Imminent, Total CoWoS Capacity Expected to Surge by Over 70% in 2025

TrendForce reports that NVIDIA's Hopper H100 began to see a reduction in shortages in 1Q24. The new H200 from the same platform is expected to gradually ramp in Q2, with the Blackwell platform entering the market in Q3 and expanding to data center customers in Q4. However, this year will still primarily focus on the Hopper platform, which includes the H100 and H200 product lines. The Blackwell platform—based on how far supply chain integration has progressed—is expected to start ramping up in Q4, accounting for less than 10% of the total high-end GPU market.

The die size of Blackwell platform chips like the B100 is twice that of the H100. As Blackwell becomes mainstream in 2025, the total capacity of TSMC's CoWoS is projected to grow by 150% in 2024 and by over 70% in 2025, with NVIDIA's demand occupying nearly half of this capacity. For HBM, the NVIDIA GPU platform's evolution sees the H100 primarily using 80 GB of HBM3, while the 2025 B200 will feature 288 GB of HBM3e—a 3-4 fold increase in capacity per chip. The three major manufacturers' expansion plans indicate that HBM production volume will likely double by 2025.

NVIDIA Blackwell GB200 Superchip to Cost up to 70,000 US Dollars

According to analysts at HSBC, NVIDIA's upcoming Blackwell GPUs for AI workloads are expected to carry premium pricing significantly higher than the company's current Hopper-based processors. The analysts estimate that NVIDIA's "entry-level" Blackwell GPU, the B100, will have an average selling price between $30,000 and $35,000 per chip. That's already on par with the flagship H100 GPU from the previous Hopper generation. But the real premium lies with the top-end GB200 "superchip" that combines a Grace CPU with two enhanced B200 GPUs. HSBC analysts peg pricing for this monster chip at a staggering $60,000 to $70,000 per unit. NVIDIA may opt to primarily sell complete servers powered by Blackwell rather than individual chips. The estimates suggest a fully-loaded GB200 NVL72 server with 72 GB200 Superchips could fetch around $3 million.

The sky-high pricing continues NVIDIA's aggressive strategy of charging a premium for its leading AI and accelerator hardware. With rivals like AMD and Intel still lagging in this space, NVIDIA can essentially name its price for now. The premium pricing reflects the massive performance uplift promised by Blackwell. A single GB200 Superchip is rated for five PetaFLOPs at TF32 of AI compute power with sparsity, a 5x increase over the H100's one PetaFLOP. Of course, actual street pricing will depend on volume and negotiating power. Hyperscalers like Amazon and Microsoft may secure significant discounts, while smaller players could pay even more than these eye-watering analyst projections. NVIDIA is betting that the industry's insatiable demand for more AI compute power will make these premium price tags palatable, at least for a while. But it's also raising the stakes for competitors to catch up quickly before losing too much ground.

Intel Ponte Vecchio Waves Goodbye, Company Focuses on Falcon Shores for 2025 Release

According to ServeTheHome, Intel has decided to discontinue its high-performance computing (HPC) product line, Ponte Vecchio, and shift its focus towards developing its next-generation data center GPU, codenamed Falcon Shores. This decision comes as Intel aims to streamline its operations and concentrate its resources on the most promising and competitive offerings. The Ponte Vecchio GPU, released in January of 2023, was intended to be Intel's flagship product for the HPC market, competing against the likes of NVIDIA's H100 and AMD's Instinct MI series. However, despite its impressive specifications and features, Ponte Vecchio faced significant delays and challenges in its development and production cycle. Intel's decision to abandon Ponte Vecchio is pragmatic, recognizing the intense competition and rapidly evolving landscape of the data center GPU market.

By pivoting its attention to Falcon Shores, Intel aims to deliver a more competitive and cutting-edge solution that can effectively challenge the dominance of its rivals. Falcon Shores, slated for release in 2025, is expected to leverage Intel's latest process node and architectural innovations. Currently, Intel has Gaudi 2 and Gaudi 3 accelerators for AI. However, the HPC segment is left without a clear leader in the company's product offerings. Intel's Ponte Vecchio is powering Aurora exascale supercomputer, which is the latest submission to the TOP500 supercomputer lists. This is also coming after the Rialto Bridge cancellation, which was supposed to be an HPC-focused card. In the future, the company will focus only on the Falcon Shores accelerator, which will unify HPC and AI needs for high-precision FP64 and lower-precision FP16/INT8.

TOP500: Frontier Keeps Top Spot, Aurora Officially Becomes the Second Exascale Machine

The 63rd edition of the TOP500 reveals that Frontier has once again claimed the top spot, despite no longer being the only exascale machine on the list. Additionally, a new system has found its way into the Top 10.

The Frontier system at Oak Ridge National Laboratory in Tennessee, USA remains the most powerful system on the list with an HPL score of 1.206 EFlop/s. The system has a total of 8,699,904 combined CPU and GPU cores, an HPE Cray EX architecture that combines 3rd Gen AMD EPYC CPUs optimized for HPC and AI with AMD Instinct MI250X accelerators, and it relies on Cray's Slingshot 11 network for data transfer. On top of that, this machine has an impressive power efficiency rating of 52.93 GFlops/Watt - putting Frontier at the No. 13 spot on the GREEN500.

NVIDIA Blackwell Platform Pushes the Boundaries of Scientific Computing

Quantum computing. Drug discovery. Fusion energy. Scientific computing and physics-based simulations are poised to make giant steps across domains that benefit humanity as advances in accelerated computing and AI drive the world's next big breakthroughs. NVIDIA unveiled at GTC in March the NVIDIA Blackwell platform, which promises generative AI on trillion-parameter large language models (LLMs) at up to 25x less cost and energy consumption than the NVIDIA Hopper architecture.

Blackwell has powerful implications for AI workloads, and its technology capabilities can also help to deliver discoveries across all types of scientific computing applications, including traditional numerical simulation. By reducing energy costs, accelerated computing and AI drive sustainable computing. Many scientific computing applications already benefit. Weather can be simulated at 200x lower cost and with 300x less energy, while digital twin simulations have 65x lower cost and 58x less energy consumption versus traditional CPU-based systems and others.

Demand for NVIDIA's Blackwell Platform Expected to Boost TSMC's CoWoS Total Capacity by Over 150% in 2024

NVIDIA's next-gen Blackwell platform, which includes B-series GPUs and integrates NVIDIA's own Grace Arm CPU in models such as the GB200, represents a significant development. TrendForce points out that the GB200 and its predecessor, the GH200, both feature a combined CPU+GPU solution, primarily equipped with the NVIDIA Grace CPU and H200 GPU. However, the GH200 accounted for only approximately 5% of NVIDIA's high-end GPU shipments. The supply chain has high expectations for the GB200, with projections suggesting that its shipments could exceed millions of units by 2025, potentially making up nearly 40 to 50% of NVIDIA's high-end GPU market.

Although NVIDIA plans to launch products such as the GB200 and B100 in the second half of this year, upstream wafer packaging will need to adopt more complex and high-precision CoWoS-L technology, making the validation and testing process time-consuming. Additionally, more time will be required to optimize the B-series for AI server systems in aspects such as network communication and cooling performance. It is anticipated that the GB200 and B100 products will not see significant production volumes until 4Q24 or 1Q25.

U.S. Updates Advanced Semiconductor Ban, Actual Impact on the Industry Will Be Insignificant

On March 29th, the United States announced another round of updates to its export controls, targeting advanced computing, supercomputers, semiconductor end-uses, and semiconductor manufacturing products. These new regulations, which took effect on April 4th, are designed to prevent certain countries and businesses from circumventing U.S. restrictions to access sensitive chip technologies and equipment. Despite these tighter controls, TrendForce believes the practical impact on the industry will be minimal.

The latest updates aim to refine the language and parameters of previous regulations, tightening the criteria for exports to Macau and D:5 countries (China, North Korea, Russia, Iran, etc.). They require a detailed examination of all technology products' Total Processing Performance (TPP) and Performance Density (PD). If a product exceeds certain computing power thresholds, it must undergo a case-by-case review. Nevertheless, a new provision, Advanced Computing Authorized (ACA), allows for specific exports and re-exports among selected countries, including the transshipment of particular products between Macau and D:5 countries.

NVIDIA Hopper Leaps Ahead in Generative AI at MLPerf

It's official: NVIDIA delivered the world's fastest platform in industry-standard tests for inference on generative AI. In the latest MLPerf benchmarks, NVIDIA TensorRT-LLM—software that speeds and simplifies the complex job of inference on large language models—boosted the performance of NVIDIA Hopper architecture GPUs on the GPT-J LLM nearly 3x over their results just six months ago. The dramatic speedup demonstrates the power of NVIDIA's full-stack platform of chips, systems and software to handle the demanding requirements of running generative AI. Leading companies are using TensorRT-LLM to optimize their models. And NVIDIA NIM—a set of inference microservices that includes inferencing engines like TensorRT-LLM—makes it easier than ever for businesses to deploy NVIDIA's inference platform.

Raising the Bar in Generative AI
TensorRT-LLM running on NVIDIA H200 Tensor Core GPUs—the latest, memory-enhanced Hopper GPUs—delivered the fastest performance running inference in MLPerf's biggest test of generative AI to date. The new benchmark uses the largest version of Llama 2, a state-of-the-art large language model packing 70 billion parameters. The model is more than 10x larger than the GPT-J LLM first used in the September benchmarks. The memory-enhanced H200 GPUs, in their MLPerf debut, used TensorRT-LLM to produce up to 31,000 tokens/second, a record on MLPerf's Llama 2 benchmark. The H200 GPU results include up to 14% gains from a custom thermal solution. It's one example of innovations beyond standard air cooling that systems builders are applying to their NVIDIA MGX designs to take the performance of Hopper GPUs to new heights.

Intel Gaudi 2 Remains Only Benchmarked Alternative to NV H100 for Generative AI Performance

Today, MLCommons published results of the industry-standard MLPerf v4.0 benchmark for inference. Intel's results for Intel Gaudi 2 accelerators and 5th Gen Intel Xeon Scalable processors with Intel Advanced Matrix Extensions (Intel AMX) reinforce the company's commitment to bring "AI Everywhere" with a broad portfolio of competitive solutions. The Intel Gaudi 2 AI accelerator remains the only benchmarked alternative to Nvidia H100 for generative AI (GenAI) performance and provides strong performance-per-dollar. Further, Intel remains the only server CPU vendor to submit MLPerf results. Intel's 5th Gen Xeon results improved by an average of 1.42x compared with 4th Gen Intel Xeon processors' results in MLPerf Inference v3.1.

"We continue to improve AI performance on industry-standard benchmarks across our portfolio of accelerators and CPUs. Today's results demonstrate that we are delivering AI solutions that deliver to our customers' dynamic and wide-ranging AI requirements. Both Intel Gaudi and Xeon products provide our customers with options that are ready to deploy and offer strong price-to-performance advantages," said Zane Ball, Intel corporate vice president and general manager, DCAI Product Management.

Nvidia CEO Reiterates Solid Partnership with TSMC

One key takeaway from the ongoing GTC is that Nvidia's AI empire has taken shape with strong partnerships from TSMC and other Taiwanese makers, such as those major server ODMs.

According to the news report from the technology-focused media DIGITIMES Asia, during his keynote at GTC on March 18, Huang underscored his company's partnerships with TSMC, as well as the supply chain in Taiwan. Speaking to the press later, Huang said Nvidia will have a very strong demand for CoWoS, the advanced packaging services TSMC offers.

Samsung Prepares Mach-1 Chip to Rival NVIDIA in AI Inference

During its 55th annual shareholders' meeting, Samsung Electronics announced its entry into the AI processor market with the upcoming launch of its Mach-1 AI accelerator chips in early 2025. The South Korean tech giant revealed its plans to compete with established players like NVIDIA in the rapidly growing AI hardware sector. The Mach-1 generation of chips is an application-specific integrated circuit (ASIC) design equipped with LPDDR memory that is envisioned to excel in edge computing applications. While Samsung does not aim to directly rival NVIDIA's ultra-high-end AI solutions like the H100, B100, or B200, the company's strategy focuses on carving out a niche in the market by offering unique features and performance enhancements at the edge, where low power and efficient computing is what matters the most.

According to SeDaily, the Mach-1 chips boast a groundbreaking feature that significantly reduces memory bandwidth requirements for inference to approximately 0.125x compared to existing designs, which is an 87.5% reduction. This innovation could give Samsung a competitive edge in terms of efficiency and cost-effectiveness. As the demand for AI-powered devices and services continues to soar, Samsung's foray into the AI chip market is expected to intensify competition and drive innovation in the industry. While NVIDIA currently holds a dominant position, Samsung's cutting-edge technology and access to advanced semiconductor manufacturing nodes could make it a formidable contender. The Mach-1 has been field-verified on an FPGA, while the final design is currently going through a physical design for SoC, which includes placement, routing, and other layout optimizations.
Return to Keyword Browsing
Dec 20th, 2024 06:32 EST change timezone

New Forum Posts

Popular Reviews

Controversial News Posts