News Posts matching #Hopper

Return to Keyword Browsing

NVIDIA's GB200 "Blackwell" Racks Face Overheating Issues

NVIDIA's new GB200 "Blackwell" racks are running into trouble (again). Big cloud companies like Microsoft, Amazon, Google, and Meta Platforms are cutting back their orders because of heat problems, Reuters reports, quoting The Information. The first shipments of racks with Blackwell chips are getting too hot and have connection issues between chips, the report says. These tech hiccups have made some customers who ordered $10 billion or more worth of racks think twice about buying.

Some are putting off their orders until NVIDIA has better versions of the racks. Others are looking at buying older NVIDIA AI chips instead. For example, Microsoft planned to set up GB200 racks with no less than 50,000 Blackwell chips at one of its Phoenix sites. However, The Information reports that OpenAI has asked Microsoft to provide NVIDIA's older "Hopper" chips instead pointing to delays linked to the Blackwell racks. NVIDIA's problems with its Blackwell GPUs housed in high-density racks are not something new; in November 2024, Reuters, also referencing The Information, uncovered overheating issues in servers that housed 72 processors. NVIDIA has made several changes to its server rack designs to tackle these problems, however, it seems that the problem was not entirely solved.

Microsoft Acquired Nearly 500,000 NVIDIA "Hopper" GPUs This Year

Microsoft is heavily investing in enabling its company and cloud infrastructure to support the massive AI expansion. The Redmond giant has acquired nearly half a million of the NVIDIA "Hopper" family of GPUs to support this effort. According to market research company Omdia, Microsoft was the biggest hyperscaler, with data center CapEx and GPU expenditure reaching a record high. The company acquired precisely 485,000 NVIDIA "Hopper" GPUs, including H100, H200, and H20, resulting in more than $30 billion spent on servers alone. To put things into perspective, this is about double that of the next-biggest GPU purchaser, Chinese ByteDance, who acquired about 230,000 sanction-abiding H800 GPUs and regular H100s sources from third parties.

Regarding US-based companies, the only ones that have come close to the GPU acquisition rate are Meta, Tesla/xAI, Amazon, and Google. They have acquired around 200,000 GPUs on average while significantly boosting their in-house chip design efforts. "NVIDIA GPUs claimed a tremendously high share of the server capex," Vlad Galabov, director of cloud and data center research at Omdia, noted, adding, "We're close to the peak." Hyperscalers like Amazon, Google, and Meta have been working on their custom solutions for AI training and inference. For example, Google has its TPU, Amazon has its Trainium and Inferentia chips, and Meta has its MTIA. Hyperscalers are eager to develop their in-house solutions, but NVIDIA's grip on the software stack paired with timely product updates seems hard to break. The latest "Blackwell" chips are projected to get even bigger orders, so only the sky (and the local power plant) is the limit.

NVIDIA's Next-Gen "Rubin" AI GPU Development 6 Months Ahead of Schedule: Report

The "Rubin" architecture succeeds NVIDIA's current "Blackwell," which powers the company's AI GPUs, as well as the upcoming GeForce RTX 50-series gaming GPUs. NVIDIA will likely not build gaming GPUs with "Rubin," just like it didn't with "Hopper," and for the most part, "Volta." NVIDIA's AI GPU product roadmap put out at SC'24 puts "Blackwell" firmly in charge of the company's AI GPU product stack throughout 2025, with "Rubin" only succeeding it in the following year, for a two-year run in the market, being capped off with a "Rubin Ultra" larger GPU slated for 2027. A new report by United Daily News (UDN), a Taiwan-based publication, says that the development of "Rubin" is running 6 months ahead of schedule.

Being 6 months ahead of schedule doesn't necessarily mean that the product will launch sooner. It would give NVIDIA headroom to get "Rubin" better evaluated in the industry, and make last-minute changes to the product if needed; or even advance the launch if it wants to. The first AI GPU powered by "Rubin" will feature 8-high HBM4 memory stacks. The company will also introduce the "Vera" CPU, the long-awaited successor to "Grace." It will also introduce the X1600 InfiniBand/Ethernet network processor. According to the SC'24 roadmap by NVIDIA, these three would've seen a 2026 launch. Then in 2027, the company would follow up with an even larger AI GPU based on the same "Rubin" architecture, codenamed "Rubin Ultra." This features 12-high HBM4 stacks. NVIDIA's current GB200 "Blackwell" is a tile-based GPU, with two dies that have full cache-coherence. "Rubin" is rumored to feature four tiles.

NVIDIA Announces Hopper H200 NVL PCIe GPU Availability at SC24, Promising 1.3x HPC Performance Over H100 NVL

Since its introduction, the NVIDIA Hopper architecture has transformed the AI and high-performance computing (HPC) landscape, helping enterprises, researchers and developers tackle the world's most complex challenges with higher performance and greater energy efficiency. During the Supercomputing 2024 conference, NVIDIA announced the availability of the NVIDIA H200 NVL PCIe GPU - the latest addition to the Hopper family. H200 NVL is ideal for organizations with data centers looking for lower-power, air-cooled enterprise rack designs with flexible configurations to deliver acceleration for every AI and HPC workload, regardless of size.

According to a recent survey, roughly 70% of enterprise racks are 20kW and below and use air cooling. This makes PCIe GPUs essential, as they provide granularity of node deployment, whether using one, two, four or eight GPUs - enabling data centers to pack more computing power into smaller spaces. Companies can then use their existing racks and select the number of GPUs that best suits their needs. Enterprises can use H200 NVL to accelerate AI and HPC applications, while also improving energy efficiency through reduced power consumption. With a 1.5x memory increase and 1.2x bandwidth increase over NVIDIA H100 NVL, companies can use H200 NVL to fine-tune LLMs within a few hours and deliver up to 1.7x faster inference performance. For HPC workloads, performance is boosted up to 1.3x over H100 NVL and 2.5x over the NVIDIA Ampere architecture generation.

NVIDIA B200 "Blackwell" Records 2.2x Performance Improvement Over its "Hopper" Predecessor

We know that NVIDIA's latest "Blackwell" GPUs are fast, but how much faster are they over the previous generation "Hopper"? Thanks to the latest MLPerf Training v4.1 results, NVIDIA's HGX B200 Blackwell platform has demonstrated massive performance gains, measuring up to 2.2x improvement per GPU compared to its HGX H200 Hopper. The latest results, verified by MLCommons, reveal impressive achievements in large language model (LLM) training. The Blackwell architecture, featuring HBM3e high-bandwidth memory and fifth-generation NVLink interconnect technology, achieved double the performance per GPU for GPT-3 pre-training and a 2.2x boost for Llama 2 70B fine-tuning compared to the previous Hopper generation. Each benchmark system incorporated eight Blackwell GPUs operating at a 1,000 W TDP, connected via NVLink Switch for scale-up.

The network infrastructure utilized NVIDIA ConnectX-7 SuperNICs and Quantum-2 InfiniBand switches, enabling high-speed node-to-node communication for distributed training workloads. While previous Hopper-based systems required 256 GPUs to optimize performance for the GPT-3 175B benchmark, Blackwell accomplished the same task with just 64 GPUs, leveraging its larger HBM3e memory capacity and bandwidth. One thing to look out for is the upcoming GB200 NVL72 system, which promises even more significant gains past the 2.2x. It features expanded NVLink domains, higher memory bandwidth, and tight integration with NVIDIA Grace CPUs, complemented by ConnectX-8 SuperNIC and Quantum-X800 switch technologies. With faster switching and better data movement with Grace-Blackwell integration, we could see even more software optimization from NVIDIA to push the performance envelope.

NVIDIA Ethernet Networking Accelerates World's Largest AI Supercomputer, Built by xAI

NVIDIA today announced that xAI's Colossus supercomputer cluster comprising 100,000 NVIDIA Hopper GPUs in Memphis, Tennessee, achieved this massive scale by using the NVIDIA Spectrum-X Ethernet networking platform, which is designed to deliver superior performance to multi-tenant, hyperscale AI factories using standards-based Ethernet, for its Remote Direct Memory Access (RDMA) network.

Colossus, the world's largest AI supercomputer, is being used to train xAI's Grok family of large language models, with chatbots offered as a feature for X Premium subscribers. xAI is in the process of doubling the size of Colossus to a combined total of 200,000 NVIDIA Hopper GPUs.

NVIDIA "Blackwell" GPUs are Sold Out for 12 Months, Customers Ordering in 100K GPU Quantities

NVIDIA's "Blackwell" series of GPUs, including B100, B200, and GB200, are reportedly sold out for 12 months or an entire year. This directly means that if a new customer is willing to order a new Blackwell GPU now, there is a 12-month waitlist to get that GPU. Analyst from Morgan Stanley Joe Moore confirmed that in a meeting with NVIDIA and its investors, NVIDIA executives confirmed that the demand for "Blackwell" is so great that there is a 12-month backlog to fulfill first before shipping to anyone else. We expect that this includes customers like Amazon, META, Microsoft, Google, Oracle, and others, who are ordering GPUs in insane quantities to keep up with the demand from their customers.

The previous generation of "Hopper" GPUs was ordered in 10s of thousands of GPUs, while this "Blackwell" generation was ordered in 100s of thousands of GPUs simultaneously. For NVIDIA, that is excellent news, as that demand is expected to continue. The only one standing in the way of customers is TSMC, which manufactures these GPUs as fast as possible to meet demand. NVIDIA is one of TSMC's largest customers, so wafer allocation at TSMC's facilities is only expected to grow. We are now officially in the era of the million-GPU data centers, and we can only question at what point this massive growth stops or if it will stop at all in the near future.

NVIDIA's Jensen Huang to Lead CES 2025 Keynote

NVIDIA CEO Jensen Huang will be leading the keynote address at the coveted 2025 International CES in Las Vegas, which opens on January 7. The keynote address is slated for January 6, 6:30 am PT. There is of course no word from NVIDIA on what to expect, but we have some fairly easy guesswork. NVIDIA's refresh of the GeForce RTX product stack is due, and the company is expected to either debut or expand its next-generation GeForce RTX 50-series "Blackwell" gaming GPU stack, bringing in generational improvements in performance and performance-per-Watt, besides new technology.

The company could also make more announcements related to its "Blackwell" AI GPU lineup, which is expected to ramp through 2025, succeeding the current "Hopper" H100 and H200 series. The company could also tease "Rubin," which it referenced recently at GTC in May, "Rubin" succeeds "Blackwell," and will debut as an AI GPU toward the end of 2025, with a 2026 ramp toward customers. It's unclear if NVIDIA will make gaming GPUs on "Rubin," since GeForce RTX generations tend to have a 2-year cadence, and there was no gaming GPU based on "Hopper."

AMD MI300X Accelerators are Competitive with NVIDIA H100, Crunch MLPerf Inference v4.1

The MLCommons consortium on Wednesday posted MLPerf Inference v4.1 benchmark results for popular AI inferencing accelerators available in the market, across brands that include NVIDIA, AMD, and Intel. AMD's Instinct MI300X accelerators emerged competitive to NVIDIA's "Hopper" H100 series AI GPUs. AMD also used the opportunity to showcase the kind of AI inferencing performance uplifts customers can expect from its next-generation EPYC "Turin" server processors powering these MI300X machines. "Turin" features "Zen 5" CPU cores, sporting a 512-bit FPU datapath, and improved performance in AI-relevant 512-bit SIMD instruction-sets, such as AVX-512, and VNNI. The MI300X, on the other hand, banks on the strengths of its memory sub-system, FP8 data format support, and efficient KV cache management.

The MLPerf Inference v4.1 benchmark focused on the 70 billion-parameter LLaMA2-70B model. AMD's submissions included machines featuring the Instinct MI300X, powered by the current EPYC "Genoa" (Zen 4), and next-gen EPYC "Turin" (Zen 5). The GPUs are backed by AMD's ROCm open-source software stack. The benchmark evaluated inference performance using 24,576 Q&A samples from the OpenORCA dataset, with each sample containing up to 1024 input and output tokens. Two scenarios were assessed: the offline scenario, focusing on batch processing to maximize throughput in tokens per second, and the server scenario, which simulates real-time queries with strict latency limits (TTFT ≤ 2 seconds, TPOT ≤ 200 ms). This lets you see the chip's mettle in both high-throughput and low-latency queries.

NVIDIA Blackwell Sets New Standard for Generative AI in MLPerf Inference Benchmark

As enterprises race to adopt generative AI and bring new services to market, the demands on data center infrastructure have never been greater. Training large language models is one challenge, but delivering LLM-powered real-time services is another. In the latest round of MLPerf industry benchmarks, Inference v4.1, NVIDIA platforms delivered leading performance across all data center tests. The first-ever submission of the upcoming NVIDIA Blackwell platform revealed up to 4x more performance than the NVIDIA H100 Tensor Core GPU on MLPerf's biggest LLM workload, Llama 2 70B, thanks to its use of a second-generation Transformer Engine and FP4 Tensor Cores.

The NVIDIA H200 Tensor Core GPU delivered outstanding results on every benchmark in the data center category - including the latest addition to the benchmark, the Mixtral 8x7B mixture of experts (MoE) LLM, which features a total of 46.7 billion parameters, with 12.9 billion parameters active per token. MoE models have gained popularity as a way to bring more versatility to LLM deployments, as they're capable of answering a wide variety of questions and performing more diverse tasks in a single deployment. They're also more efficient since they only activate a few experts per inference - meaning they deliver results much faster than dense models of a similar size.

NVIDIA's New B200A Targets OEM Customers; High-End GPU Shipments Expected to Grow 55% in 2025

Despite recent rumors speculating on NVIDIA's supposed cancellation of the B100 in favor of the B200A, TrendForce reports that NVIDIA is still on track to launch both the B100 and B200 in the 2H24 as it aims to target CSP customers. Additionally, a scaled-down B200A is planned for other enterprise clients, focusing on edge AI applications.

TrendForce reports that NVIDIA will prioritize the B100 and B200 for CSP customers with higher demand due to the tight production capacity of CoWoS-L. Shipments are expected to commence after 3Q24. In light of yield and mass production challenges with CoWoS-L, NVIDIA is also planning the B200A for other enterprise clients, utilizing CoWoS-S packaging technology.

NVIDIA Hit with DOJ Antitrust Probe over AI GPUs, Unfair Sales Tactics and Pricing Alleged

NVIDIA has reportedly been hit with a US Department of Justice (DOJ) antitrust probe over the tactics the company allegedly employs to sell or lease its AI GPUs and data-center networking equipment, "The Information" reported. Shares of the NVIDIA stock fell 3.6% in the pre-market trading on Friday (08/02). The main complainants behind the probe appear to be a special interest group among the customers of AI GPUs, and not NVIDIA's competitors in the AI GPU industry per se. US Senator Elizabeth Warren and US progressives have been most vocal about calling upon the DOJ to investigate antitrust allegations against NVIDIA.

Meanwhile, US officials are reportedly reaching out to NVIDIA's competitors, including AMD and Intel, to gather information about the complaints. NVIDIA holds 80% of the AI GPU market, while AMD, and to a much lesser extent, Intel, have received spillover demand for AI GPUs. "The Information" report says that the complaint alleges NVIDIA pressured cloud customers to buy "multiple products". We don't know what this means, one theory holds that NVIDIA is getting them to commit to buying multiple generations of products (eg: Ampere, Hopper, and over to Blackwell); while another holds that it's getting them to buy multiple kinds of products, which include not just the AI GPUs, but also NVIDIA's first-party server systems and networking equipment. Yet another theory holds that it is bundle first-party software and services to go with the hardware, far beyond the basic software needed to get the hardware to work.

NVIDIA Blackwell's High Power Consumption Drives Cooling Demands; Liquid Cooling Penetration Expected to Reach 10% by Late 2024

With the growing demand for high-speed computing, more effective cooling solutions for AI servers are gaining significant attention. TrendForce's latest report on AI servers reveals that NVIDIA is set to launch its next-generation Blackwell platform by the end of 2024. Major CSPs are expected to start building AI server data centers based on this new platform, potentially driving the penetration rate of liquid cooling solutions to 10%.

Air and liquid cooling systems to meet higher cooling demands
TrendForce reports that the NVIDIA Blackwell platform will officially launch in 2025, replacing the current Hopper platform and becoming the dominant solution for NVIDIA's high-end GPUs, accounting for nearly 83% of all high-end products. High-performance AI server models like the B200 and GB200 are designed for maximum efficiency, with individual GPUs consuming over 1,000 W. HGX models will house 8 GPUs each, while NVL models will support 36 or 72 GPUs per rack, significantly boosting the growth of the liquid cooling supply chain for AI servers.

Global AI Server Demand Surge Expected to Drive 2024 Market Value to US$187 Billion; Represents 65% of Server Market

TrendForce's latest industry report on AI servers reveals that high demand for advanced AI servers from major CSPs and brand clients is expected to continue in 2024. Meanwhile, TSMC, SK hynix, Samsung, and Micron's gradual production expansion has significantly eased shortages in 2Q24. Consequently, the lead time for NVIDIA's flagship H100 solution has decreased from the previous 40-50 weeks to less than 16 weeks.

TrendForce estimates that AI server shipments in the second quarter will increase by nearly 20% QoQ, and has revised the annual shipment forecast up to 1.67 million units—marking a 41.5% YoY growth.

AI Startup Etched Unveils Transformer ASIC Claiming 20x Speed-up Over NVIDIA H100

A new startup emerged out of stealth mode today to power the next generation of generative AI. Etched is a company that makes an application-specific integrated circuit (ASIC) to process "Transformers." The transformer is an architecture for designing deep learning models developed by Google and is now the powerhouse behind models like OpenAI's GPT-4o in ChatGPT, Anthropic Claude, Google Gemini, and Meta's Llama family. Etched wanted to create an ASIC for processing only the transformer models, making a chip called Sohu. The claim is Sohu outperforms NVIDIA's latest and greatest by an entire order of magnitude. Where a server configuration with eight NVIDIA H100 GPU clusters pushes Llama-3 70B models at 25,000 tokens per second, and the latest eight B200 "Blackwell" GPU cluster pushes 43,000 tokens/s, the eight Sohu clusters manage to output 500,000 tokens per second.

Why is this important? Not only does the ASIC outperform Hopper by 20x and Blackwell by 10x, but it also serves so many tokens per second that it enables an entirely new fleet of AI applications requiring real-time output. The Sohu architecture is so efficient that 90% of the FLOPS can be used, while traditional GPUs boast a 30-40% FLOP utilization rate. This translates into inefficiency and waste of power, which Etched hopes to solve by building an accelerator dedicated to power transformers (the "T" in GPT) at massive scales. Given that the frontier model development costs more than one billion US dollars, and hardware costs are measured in tens of billions of US Dollars, having an accelerator dedicated to powering a specific application can help advance AI faster. AI researchers often say that "scale is all you need" (resembling the legendary "attention is all you need" paper), and Etched wants to build on that.

Noctua Shows Ampere Altra and NVIDIA GH200 CPU Coolers at Computex 2024

Noctua unveiled its new Ampere Altra family of CPU coolers for Ampere Altra and Altra Max Arm processors at the Computex 2024 show, as well as the upcoming NVIDIA GH200 Grace Hopper superchip cooler. In addition, it also showcased its new cooperation with Seasonic with PRIME TX-1600 Noctua Edition power supply and a rather unique Kaelo wine cooler.

In addition to the new and upcoming standard CPU coolers and fans, Noctua also unveiled the new Ampere Altra family of CPU coolers at the Computex 2024 show, aimed to be used with recently launched Ampere Altra and Altra Max Arm processors with up to 128 cores. The new Noctua Ampere Altra CPU coolers are based on the proven models for Intel Xeon and AMD Threadripper or EPYC platforms. The Noctua Ampere Altra family of CPU coolers use Noctua's SecuFirm2 mounting system for LGA4926 socket and come with pre-applied NT-H2 thermal paste. According to Noctua, these provide exceptional performance and whisper-quiet operation which are ideal for Arm based workstations in noise-sensitive environments. The Ampere Altra lineup should be already available over at Newegg. In addition, Nocuta has unveiled its new prototype of NVIDIA GH200 Grace Hopper superchip cooler, which integrates two custom NH-U12A heatsinks in order to cool both the Grace CPU and Hopper GPU. It supports up to 1,000 W of heat emissions, and aimed at noise-sensitive environments like local HPC applications and self-hosted open source LLMs. The NVIDIA GH200 cooler is expected in Q4 this year and offered to clients on pre-order basis.

TOP500: Frontier Keeps Top Spot, Aurora Officially Becomes the Second Exascale Machine

The 63rd edition of the TOP500 reveals that Frontier has once again claimed the top spot, despite no longer being the only exascale machine on the list. Additionally, a new system has found its way into the Top 10.

The Frontier system at Oak Ridge National Laboratory in Tennessee, USA remains the most powerful system on the list with an HPL score of 1.206 EFlop/s. The system has a total of 8,699,904 combined CPU and GPU cores, an HPE Cray EX architecture that combines 3rd Gen AMD EPYC CPUs optimized for HPC and AI with AMD Instinct MI250X accelerators, and it relies on Cray's Slingshot 11 network for data transfer. On top of that, this machine has an impressive power efficiency rating of 52.93 GFlops/Watt - putting Frontier at the No. 13 spot on the GREEN500.

NVIDIA Blackwell Platform Pushes the Boundaries of Scientific Computing

Quantum computing. Drug discovery. Fusion energy. Scientific computing and physics-based simulations are poised to make giant steps across domains that benefit humanity as advances in accelerated computing and AI drive the world's next big breakthroughs. NVIDIA unveiled at GTC in March the NVIDIA Blackwell platform, which promises generative AI on trillion-parameter large language models (LLMs) at up to 25x less cost and energy consumption than the NVIDIA Hopper architecture.

Blackwell has powerful implications for AI workloads, and its technology capabilities can also help to deliver discoveries across all types of scientific computing applications, including traditional numerical simulation. By reducing energy costs, accelerated computing and AI drive sustainable computing. Many scientific computing applications already benefit. Weather can be simulated at 200x lower cost and with 300x less energy, while digital twin simulations have 65x lower cost and 58x less energy consumption versus traditional CPU-based systems and others.

NVIDIA Grace Hopper Ignites New Era of AI Supercomputing

Driving a fundamental shift in the high-performance computing industry toward AI-powered systems, NVIDIA today announced nine new supercomputers worldwide are using NVIDIA Grace Hopper Superchips to speed scientific research and discovery. Combined, the systems deliver 200 exaflops, or 200 quintillion calculations per second, of energy-efficient AI processing power.

New Grace Hopper-based supercomputers coming online include EXA1-HE, in France, from CEA and Eviden; Helios at Academic Computer Centre Cyfronet, in Poland, from Hewlett Packard Enterprise (HPE); Alps at the Swiss National Supercomputing Centre, from HPE; JUPITER at the Jülich Supercomputing Centre, in Germany; DeltaAI at the National Center for Supercomputing Applications at the University of Illinois Urbana-Champaign; and Miyabi at Japan's Joint Center for Advanced High Performance Computing - established between the Center for Computational Sciences at the University of Tsukuba and the Information Technology Center at the University of Tokyo.

NVIDIA Hopper Leaps Ahead in Generative AI at MLPerf

It's official: NVIDIA delivered the world's fastest platform in industry-standard tests for inference on generative AI. In the latest MLPerf benchmarks, NVIDIA TensorRT-LLM—software that speeds and simplifies the complex job of inference on large language models—boosted the performance of NVIDIA Hopper architecture GPUs on the GPT-J LLM nearly 3x over their results just six months ago. The dramatic speedup demonstrates the power of NVIDIA's full-stack platform of chips, systems and software to handle the demanding requirements of running generative AI. Leading companies are using TensorRT-LLM to optimize their models. And NVIDIA NIM—a set of inference microservices that includes inferencing engines like TensorRT-LLM—makes it easier than ever for businesses to deploy NVIDIA's inference platform.

Raising the Bar in Generative AI
TensorRT-LLM running on NVIDIA H200 Tensor Core GPUs—the latest, memory-enhanced Hopper GPUs—delivered the fastest performance running inference in MLPerf's biggest test of generative AI to date. The new benchmark uses the largest version of Llama 2, a state-of-the-art large language model packing 70 billion parameters. The model is more than 10x larger than the GPT-J LLM first used in the September benchmarks. The memory-enhanced H200 GPUs, in their MLPerf debut, used TensorRT-LLM to produce up to 31,000 tokens/second, a record on MLPerf's Llama 2 benchmark. The H200 GPU results include up to 14% gains from a custom thermal solution. It's one example of innovations beyond standard air cooling that systems builders are applying to their NVIDIA MGX designs to take the performance of Hopper GPUs to new heights.

Chinese Research Institute Utilizing "Banned" NVIDIA H100 AI GPUs

NVIDIA's freshly unveiled "Blackwell" B200 and GB200 AI GPUs will be getting plenty of coverage this year, but many organizations will be sticking with current or prior generation hardware. Team Green is in the process of shipping out compromised "Hopper" designs to customers in China, but the region's appetite for powerful AI-crunching hardware is growing. Last year's China-specific H800 design, and the older "Ampere" A800 chip were deemed too potent—new regulations prevented further sales. Recently, AMD's Instinct MI309 AI accelerator was considered "too powerful to gain unconditional approval from the US Department of Commerce." Natively-developed solutions are catching up with Western designs, but some institutions are not prepared to queue up for emerging technologies.

NVIDIA's new H20 AI GPU as well as Ada Lovelace-based L20 PCIe and L2 PCIe models are weakened enough to get a thumbs up from trade regulators, but likely not compelling enough for discerning clients. The Telegraph believes that NVIDIA's uncompromised H100 AI GPU is currently in use at several Chinese establishments—the report cites information presented within four academic papers published on ArXiv, an open access science website. The Telegraph's news piece highlights one of the studies—it was: "co-authored by a researcher at 4paradigm, an AI company that was last year placed on an export control list by the US Commerce Department for attempting to acquire US technology to support China's military." Additionally, the Chinese Academy of Sciences appears to have conducted several AI-accelerated experiments, involving the solving of complex mathematical and logical problems. The article suggests that this research organization has acquired a very small batch of NVIDIA H100 GPUs (up to eight units). A "thriving black market" for high-end NVIDIA processors has emerged in the region—last Autumn, the Center for a New American Security (CNAS) published an in-depth article about ongoing smuggling activities.

Unwrapping the NVIDIA B200 and GB200 AI GPU Announcements

NVIDIA on Monday, at the 2024 GTC conference, unveiled the "Blackwell" B200 and GB200 AI GPUs. These are designed to offer an incredible 5X the AI inferencing performance gain over the current-gen "Hopper" H100, and come with four times the on-package memory. The B200 "Blackwell" is the largest chip physically possible using existing foundry tech, according to its makers. The chip is an astonishing 208 billion transistors, and is made up of two chiplets, which by themselves are the largest possible chips.

Each chiplet is built on the TSMC N4P foundry node, which is the most advanced 4 nm-class node by the Taiwanese foundry. Each chiplet has 104 billion transistors. The two chiplets have a high degree of connectivity with each other, thanks to a 10 TB/s custom interconnect. This is enough bandwidth and latency for the two to maintain cache coherency (i.e. address each other's memory as if they're their own). Each of the two "Blackwell" chiplets has a 4096-bit memory bus, and is wired to 96 GB of HBM3E spread across four 24 GB stacks; which totals to 192 GB for the B200 package. The GPU has a staggering 8 TB/s of memory bandwidth on tap. The B200 package features a 1.8 TB/s NVLink interface for host connectivity, and connectivity to another B200 chip.

NVIDIA Blackwell Platform Arrives to Power a New Era of Computing

Powering a new era of computing, NVIDIA today announced that the NVIDIA Blackwell platform has arrived—enabling organizations everywhere to build and run real-time generative AI on trillion-parameter large language models at up to 25x less cost and energy consumption than its predecessor.

The Blackwell GPU architecture features six transformative technologies for accelerated computing, which will help unlock breakthroughs in data processing, engineering simulation, electronic design automation, computer-aided drug design, quantum computing and generative AI—all emerging industry opportunities for NVIDIA.

NVIDIA's Selection of Micron HBM3E Supposedly Surprises Competing Memory Makers

SK Hynix believes that it leads the industry with the development and production of High Bandwidth Memory (HBM) solutions, but rival memory manufacturers are working hard on equivalent fifth generation packages. NVIDIA was expected to select SK Hynix as the main supplier of HBM3E parts for utilization on H200 "Hopper" AI GPUs, but a surprise announcement was issued by Micron's press team last month. The American firm revealed that HBM3E volume production had commenced: ""(our) 24 GB 8H HBM3E will be part of NVIDIA H200 Tensor Core GPUs, which will begin shipping in the second calendar quarter of 2024. This milestone positions Micron at the forefront of the industry, empowering artificial intelligence (AI) solutions with HBM3E's industry-leading performance and energy efficiency."

According to a Korea JoongAng Daily report, this boast has reportedly "shocked" the likes of SK Hynix and Samsung Electronics. They believe that Micron's: "announcement was a revolt from an underdog, as the US company barely held 10 percent of the global market last year." The article also points out some behind-the-scenes legal wrangling: "the cutthroat competition became more evident when the Seoul court sided with SK Hynix on Thursday (March 7) by granting a non-compete injunction to prevent its former researcher, who specialized in HBM, from working at Micron. He would be fined 10 million won for each day in violation." SK Hynix is likely pinning its next-gen AI GPU hopes on a 12-layer DRAM stacked HBM3E product—industry insiders posit that evaluation samples were submitted to NVIDIA last month. The outlook for these units is said to be very positive—mass production could start as early as this month.

Intel Gaudi2 Accelerator Beats NVIDIA H100 at Stable Diffusion 3 by 55%

Stability AI, the developers behind the popular Stable Diffusion generative AI model, have run some first-party performance benchmarks for Stable Diffusion 3 using popular data-center AI GPUs, including the NVIDIA H100 "Hopper" 80 GB, A100 "Ampere" 80 GB, and Intel's Gaudi2 96 GB accelerator. Unlike the H100, which is a super-scalar CUDA+Tensor core GPU; the Gaudi2 is purpose-built to accelerate generative AI and LLMs. Stability AI published its performance findings in a blog post, which reveals that the Intel Gaudi2 96 GB is posting a roughly 56% higher performance than the H100 80 GB.

With 2 nodes, 16 accelerators, and a constant batch size of 16 per accelerator (256 in all), the Intel Gaudi2 array is able to generate 927 images per second, compared to 595 images for the H100 array, and 381 images per second for the A100 array, keeping accelerator and node counts constant. Scaling things up a notch to 32 nodes, and 256 accelerators or a batch size of 16 per accelerator (total batch size of 4,096), the Gaudi2 array is posting 12,654 images per second; or 49.4 images per-second per-device; compared to 3,992 images per second or 15.6 images per-second per-device for the older-gen A100 "Ampere" array.
Return to Keyword Browsing
Jan 18th, 2025 19:34 EST change timezone

New Forum Posts

Popular Reviews

Controversial News Posts