News Posts matching #B200

Supermicro Adds Portfolio for Next Wave of AI with NVIDIA Blackwell Ultra Solutions

Supermicro, Inc., a Total IT Solution Provider for AI, Cloud, Storage, and 5G/Edge, is announcing new systems and rack solutions powered by the NVIDIA Blackwell Ultra platform, featuring the NVIDIA HGX B300 NVL16 and NVIDIA GB300 NVL72 platforms. These new AI solutions from Supermicro and NVIDIA strengthen the companies' leadership in AI by delivering breakthrough performance for the most compute-intensive AI workloads, including AI reasoning, agentic AI, and video inference applications.

"At Supermicro, we are excited to continue our long-standing partnership with NVIDIA to bring the latest AI technology to market with the NVIDIA Blackwell Ultra Platforms," said Charles Liang, president and CEO, Supermicro. "Our Data Center Building Block Solutions approach has streamlined the development of new air and liquid-cooled systems, optimized to the thermals and internal topology of the NVIDIA HGX B300 NVL16 and GB300 NVL72. Our advanced liquid-cooling solution delivers exceptional thermal efficiency, operating with 40℃ warm water in our 8-node rack configuration, or 35℃ warm water in double-density 16-node rack configuration, leveraging our latest CDUs. This innovative solution reduces power consumption by up to 40% while conserving water resources, providing both environmental and operational cost benefits for enterprise data centers."

Dell Technologies Accelerates Enterprise AI Innovation from PC to Data Center with NVIDIA 

Marking one year since the launch of the Dell AI Factory with NVIDIA, Dell Technologies (NYSE: DELL) announces new AI PCs, infrastructure, software and services advancements to accelerate enterprise AI innovation at any scale. Successful AI deployments are vital for enterprises to remain competitive, but challenges like system integration and skill gaps can delay the value enterprises realize from AI. More than 75% of organizations want their infrastructure providers to deliver capabilities across all aspects of the AI adoption journey, driving customer demand for simplified AI deployments that can scale.

As the top provider of AI-centric infrastructure, Dell Technologies - in collaboration with NVIDIA - provides a consistent experience across AI infrastructure, software and services, offering customers a one-stop shop to scale AI initiatives from deskside to large-scale data center deployments.

NVIDIA to Build Accelerated Quantum Computing Research Center

NVIDIA today announced it is building a Boston-based research center to provide cutting-edge technologies to advance quantum computing. The NVIDIA Accelerated Quantum Research Center, or NVAQC, will integrate leading quantum hardware with AI supercomputers, enabling what is known as accelerated quantum supercomputing. The NVAQC will help solve quantum computing's most challenging problems, ranging from qubit noise to transforming experimental quantum processors into practical devices.

Leading quantum computing innovators, including Quantinuum, Quantum Machines and QuEra Computing, will tap into the NVAQC to drive advancements through collaborations with researchers from leading universities, such as the Harvard Quantum Initiative in Science and Engineering (HQI) and the Engineering Quantum Systems (EQuS) group at the Massachusetts Institute of Technology (MIT).

Micron Innovates From the Data Center to the Edge With NVIDIA

The secular growth of AI is built on a foundation of high-performance, high-bandwidth memory solutions, which are critical to unlocking the capabilities of GPUs and processors. Micron Technology, Inc., today announced it is the world's first and only memory company shipping both HBM3E and SOCAMM (small outline compression attached memory module) products for AI servers in the data center. This extends Micron's industry leadership in designing and delivering low-power DDR (LPDDR) for data center applications.

Micron's SOCAMM, a modular LPDDR5X memory solution, was developed in collaboration with NVIDIA to support the NVIDIA GB300 Grace Blackwell Ultra Superchip. The Micron HBM3E 12H 36 GB is also designed into the NVIDIA HGX B300 NVL16 and GB300 NVL72 platforms, while the HBM3E 8H 24 GB is available for the NVIDIA HGX B200 and GB200 NVL72 platforms. The deployment of Micron HBM3E products in NVIDIA Hopper and NVIDIA Blackwell systems underscores Micron's critical role in accelerating AI workloads.

NVIDIA Accelerates Science and Engineering With CUDA-X Libraries Powered by GH200 and GB200 Superchips

Scientists and engineers of all kinds can now solve tough problems much faster with NVIDIA CUDA-X libraries powered by NVIDIA GB200 and GH200 superchips. As announced today at the NVIDIA GTC global AI conference, developers can now take advantage of tighter automatic integration and coordination between CPU and GPU resources, enabled by CUDA-X working with these latest superchip architectures, resulting in up to 11x speedups for computational engineering tools and 5x larger calculations compared with traditional accelerated computing architectures.

This greatly accelerates and improves workflows in engineering simulation, design optimization and more, helping scientists and researchers reach groundbreaking results faster. NVIDIA released CUDA in 2006, opening up a world of applications to the power of accelerated computing. Since then, NVIDIA has built more than 900 domain-specific NVIDIA CUDA-X libraries and AI models, making it easier to adopt accelerated computing and driving incredible scientific breakthroughs. Now, CUDA-X brings accelerated computing to a broad new set of engineering disciplines, including astronomy, particle physics, quantum physics, automotive, aerospace and semiconductor design.
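The practical appeal of CUDA-X is that existing CPU code paths can be swapped for GPU-backed equivalents with minimal changes. Below is a minimal sketch of that drop-in pattern using CuPy, which wraps CUDA-X math libraries such as cuBLAS and cuSOLVER; CuPy is an illustrative stand-in not named in the announcement, and the automatic CPU-GPU memory coordination described above is a property of the Grace superchips' coherent NVLink-C2C link rather than of this code.

```python
# A minimal sketch of drop-in GPU acceleration in the CUDA-X spirit.
# CuPy (a NumPy-compatible wrapper over CUDA-X libraries such as cuBLAS
# and cuSOLVER) is used as an illustrative stand-in; it is not named in
# the announcement.
import numpy as np
import cupy as cp

# A toy computational-engineering kernel: solve a dense linear system.
n = 2048
a_host = np.random.rand(n, n)
b_host = np.random.rand(n)

# CPU baseline (NumPy, LAPACK under the hood).
x_cpu = np.linalg.solve(a_host, b_host)

# GPU path: same API surface, backed by CUDA-X math libraries. On a
# GH200/GB200 superchip, the coherent NVLink-C2C link lets the runtime
# migrate data between CPU and GPU memory without explicit staging.
x_gpu = cp.linalg.solve(cp.asarray(a_host), cp.asarray(b_host))

# Verify both paths agree.
print("max abs difference:", float(cp.max(cp.abs(cp.asarray(x_cpu) - x_gpu))))
```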

Global Top 10 IC Design Houses See 49% YoY Growth in 2024, NVIDIA Commands Half the Market

TrendForce reveals that the combined revenue of the world's top 10 IC design houses reached approximately US$249.8 billion in 2024, marking a 49% YoY increase. The booming AI industry has fueled growth across the semiconductor sector, with NVIDIA leading the charge, posting an astonishing 125% revenue growth, widening its lead over competitors, and solidifying its dominance in the IC industry.

Looking ahead to 2025, advancements in semiconductor manufacturing will further enhance AI computing power, with LLMs continuing to emerge. Open-source models like DeepSeek could lower AI adoption costs, accelerating AI penetration from servers to personal devices. This shift positions edge AI devices as the next major growth driver for the semiconductor industry.

NVIDIA Confirms: "Blackwell Ultra" Coming This Year, "Vera Rubin" in 2026

During its latest FY2024 earnings call, NVIDIA CEO Jensen Huang gave a few predictions about future products. The upcoming Blackwell B300 series, codenamed "Blackwell Ultra," is scheduled for release in the second half of 2025 and will feature significant performance enhancements over the B200 series. These GPUs will incorporate eight stacks of 12-Hi HBM3E memory, providing up to 288 GB of onboard memory, paired with the Mellanox Spectrum Ultra X800 Ethernet switch, which offers 512 ports. Earlier rumors suggested that this is a 1,400 W TBP chip, meaning that NVIDIA is packing a lot of compute in there, with a potential 50% performance increase over current-generation products. NVIDIA has not officially confirmed these figures, but rough estimates of core-count and memory-bandwidth increases make them plausible.
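The 288 GB figure is consistent with simple stack arithmetic, assuming 24 Gb (3 GB) HBM3E dies, the capacity commonly used in 12-Hi stacks; a quick sanity check:

```python
# Back-of-the-envelope check of the rumored Blackwell Ultra memory capacity.
# Assumes 24 Gb (3 GB) HBM3E dies, the capacity commonly used in 12-Hi stacks.
GB_PER_DIE = 3        # one 24 Gb HBM3E die
DIES_PER_STACK = 12   # 12-Hi stack -> 36 GB per stack
STACKS = 8            # eight stacks per GPU

per_stack_gb = GB_PER_DIE * DIES_PER_STACK
total_gb = per_stack_gb * STACKS
print(f"{STACKS} stacks x {per_stack_gb} GB = {total_gb} GB")  # 288 GB
```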

Looking beyond Blackwell, NVIDIA is preparing to unveil its next-generation "Rubin" architecture, which promises to deliver what Huang described as a "big, big, huge step up" in AI compute capabilities. The Rubin platform, targeted for 2026, will integrate eight stacks of HBM4(E) memory, "Vera" CPUs, NVLink 6 switches delivering 3600 GB/s bandwidth, CX9 network cards supporting 1600 Gb/s, and X1600 switches—creating a comprehensive ecosystem for advanced AI workloads. More surprisingly, Huang indicated that NVIDIA will discuss post-Rubin developments at the upcoming GPU Technology Conference in March. This could include details on Rubin Ultra, projected for 2027, which may incorporate 12 stacks of HBM4E using 5.5-reticle-size CoWoS interposers and 100 mm × 100 mm TSMC substrates, representing another significant architectural leap forward in the company's accelerating AI infrastructure roadmap. While these may seem distant, NVIDIA is battling supply chain constraints to deliver these GPUs to its customers due to the massive demand for its solutions.

GIGABYTE Showcases Comprehensive AI Computing Portfolio at MWC 2025

GIGABYTE, a global leader in computing innovation and technology, will showcase its full-spectrum AI computing solutions that bridge development to deployment at MWC 2025, taking place from March 3-6.

"AI+" and "Enterprise-Reinvented" are two of the themes for MWC. As enterprises accelerate their digital transformation and intelligent upgrades, the transition of AI applications from experimental development to democratized commercial deployment has become a critical turning point in the industry. Continuing its "ACCEVOLUTION" initiative, GIGABYTE provides the comprehensive infrastructure products and solutions spanning cloud-based supercomputing centers to edge computing terminals, aiming to accelerate the next evolution and empower industries to scale AI applications efficiently.

HPE Announces First Shipment of NVIDIA "Grace Blackwell" System

Hewlett Packard Enterprise announced today that it has shipped its first NVIDIA Blackwell family-based solution, the NVIDIA GB200 NVL72. This rack-scale system by HPE is designed to help service providers and large enterprises quickly deploy very large, complex AI clusters with advanced, direct liquid cooling solutions to optimize efficiency and performance. "AI service providers and large enterprise model builders are under tremendous pressure to offer scalability, extreme performance, and fast time-to-deployment," said Trish Damkroger, senior vice president and general manager of HPC & AI Infrastructure Solutions, HPE. "As builders of the world's top three fastest systems with direct liquid cooling, HPE offers customers lower cost per token training and best-in-class performance with industry-leading services expertise."

The NVIDIA GB200 NVL72 features a shared-memory, low-latency architecture with the latest GPU technology, designed for extremely large AI models of over a trillion parameters in one memory space. GB200 NVL72 offers seamless integration of NVIDIA CPUs, GPUs, compute and switch trays, networking, and software, bringing together extreme performance to address heavily parallelizable workloads, like generative AI (GenAI) model training and inferencing, along with NVIDIA software applications. "Engineers, scientists and researchers need cutting-edge liquid cooling technology to keep up with increasing power and compute requirements," said Bob Pette, vice president of enterprise platforms at NVIDIA. "Building on continued collaboration between HPE and NVIDIA, HPE's first shipment of NVIDIA GB200 NVL72 will help service providers and large enterprises efficiently build, deploy and scale large AI clusters."

CoreWeave Launches Debut Wave of NVIDIA GB200 NVL72-based Cloud Instances

AI reasoning models and agents are set to transform industries, but delivering their full potential at scale requires massive compute and optimized software. The "reasoning" process involves multiple models, generating many additional tokens, and demands infrastructure with a combination of high-speed communication, memory and compute to ensure real-time, high-quality results. To meet this demand, CoreWeave has launched NVIDIA GB200 NVL72-based instances, becoming the first cloud service provider to make the NVIDIA Blackwell platform generally available. With rack-scale NVIDIA NVLink across 72 NVIDIA Blackwell GPUs and 36 NVIDIA Grace CPUs, scaling to up to 110,000 GPUs with NVIDIA Quantum-2 InfiniBand networking, these instances provide the scale and performance needed to build and deploy the next generation of AI reasoning models and agents.

NVIDIA GB200 NVL72 on CoreWeave
NVIDIA GB200 NVL72 is a liquid-cooled, rack-scale solution with a 72-GPU NVLink domain, which enables the six dozen GPUs to act as a single massive GPU. NVIDIA Blackwell features many technological breakthroughs that accelerate inference token generation, boosting performance while reducing service costs. For example, fifth-generation NVLink enables 130 TB/s of GPU bandwidth in one 72-GPU NVLink domain, and the second-generation Transformer Engine enables FP4 for faster AI performance while maintaining high accuracy. CoreWeave's portfolio of managed cloud services is purpose-built for Blackwell. CoreWeave Kubernetes Service optimizes workload orchestration by exposing NVLink domain IDs, ensuring efficient scheduling within the same rack. Slurm on Kubernetes (SUNK) supports the topology block plug-in, enabling intelligent workload distribution across GB200 NVL72 racks. In addition, CoreWeave's Observability Platform provides real-time insights into NVLink performance, GPU utilization and temperatures.
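To illustrate why exposing NVLink domain IDs matters for scheduling, here is a minimal, hypothetical sketch of rack-aware placement. The label key and node records are invented for illustration and do not reflect CoreWeave's actual Kubernetes integration; the point is simply that a job should land entirely within one 72-GPU NVLink domain to stay on the 130 TB/s fabric.

```python
# Hypothetical sketch of rack-aware scheduling on NVLink domain IDs.
# The "nvlink.domain/id" label key and the node records are invented for
# illustration; CoreWeave's actual implementation is not documented here.
from collections import defaultdict

nodes = [
    {"name": "gb200-a1", "labels": {"nvlink.domain/id": "rack-07"}, "free_gpus": 4},
    {"name": "gb200-a2", "labels": {"nvlink.domain/id": "rack-07"}, "free_gpus": 4},
    {"name": "gb200-b1", "labels": {"nvlink.domain/id": "rack-12"}, "free_gpus": 2},
]

def pick_nvlink_domain(nodes, gpus_needed):
    """Pick a single NVLink domain (one GB200 NVL72 rack) that can satisfy
    the whole request, so every GPU shares the rack's NVLink fabric."""
    by_domain = defaultdict(list)
    for node in nodes:
        by_domain[node["labels"]["nvlink.domain/id"]].append(node)
    for domain, members in by_domain.items():
        if sum(n["free_gpus"] for n in members) >= gpus_needed:
            return domain, [n["name"] for n in members]
    return None, []  # no single domain can host the job

domain, members = pick_nvlink_domain(nodes, gpus_needed=8)
print(domain, members)  # rack-07 ['gb200-a1', 'gb200-a2']
```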

Supermicro Ramps Full Production of NVIDIA Blackwell Rack-Scale Solutions With NVIDIA HGX B200

Supermicro, Inc., a Total IT Solution Provider for AI/ML, HPC, Cloud, Storage, and 5G/Edge, is announcing full production availability of its end-to-end AI data center Building Block Solutions accelerated by the NVIDIA Blackwell platform. The Supermicro Building Block portfolio provides the core infrastructure elements necessary to scale Blackwell solutions with exceptional time to deployment. The portfolio includes a broad range of air-cooled and liquid-cooled systems with multiple CPU options, with superior thermal designs supporting traditional air cooling, liquid-to-liquid (L2L), and liquid-to-air (L2A) cooling. In addition, a full data center management software suite, rack-level integration (including full network switching and cabling), and cluster-level L12 solution validation can be delivered as a turn-key offering with global delivery, professional support, and service.

"In this transformative moment of AI, where scaling laws are pushing the limits of data center capabilities, our latest NVIDIA Blackwell-powered solutions, developed through close collaboration with NVIDIA, deliver outstanding computational power," said Charles Liang, president and CEO of Supermicro. "Supermicro's NVIDIA Blackwell GPU offerings in plug-and-play scalable units with advanced liquid cooling and air cooling are empowering customers to deploy an infrastructure that supports increasingly complex AI workloads while maintaining exceptional efficiency. This reinforces our commitment to providing sustainable, cutting-edge solutions that accelerate AI innovation."

ASUS AI POD With NVIDIA GB200 NVL72 Platform Ready to Ramp-Up Production for Scheduled Shipment in March

ASUS is proud to announce that ASUS AI POD, featuring the NVIDIA GB200 NVL72 platform, is ready to ramp up production for a scheduled shipping date of March 2025. ASUS remains dedicated to providing comprehensive end-to-end solutions and software services, encompassing everything from AI supercomputing to cloud services. With a strong focus on fostering AI adoption across industries, ASUS is positioned to empower clients in accelerating their time to market by offering a full spectrum of solutions.

Proof of concept, funded by ASUS
Honoring its commitment to delivering exceptional value to clients, ASUS is set to launch a proof of concept (POC) for the groundbreaking ASUS AI POD, powered by the NVIDIA Blackwell platform. This exclusive opportunity is now open to a select group of innovators who are eager to harness the full potential of AI computing. Innovators and enterprises can experience firsthand AI and deep learning solutions at exceptional scale. To take advantage of this limited-time offer, please complete the survey at: forms.office.com/r/FrAbm5BfH2. The expert ASUS team of NVIDIA GB200 specialists will guide users through the next steps.

NVIDIA Revises "Blackwell" Architecture Production Roadmap for More Complex Packaging

According to a well-known industry analyst, Ming-Chi Kuo, NVIDIA has restructured its "Blackwell" architecture roadmap, emphasizing dual-die designs using CoWoS-L packaging technology. The new roadmap eliminates several single-die products that would have used CoWoS-S packaging, changing NVIDIA's manufacturing strategy. The 200 Series will exclusively use dual-die designs with CoWoS-L packaging, featuring the GB200 NVL72 and HGX B200 systems. Notably absent is the previously expected B200A single-die variant. The 300 Series will include both dual-die and single-die options, though NVIDIA and cloud providers are prioritizing the GB200 NVL72 dual-die system. Starting Q1 2025, NVIDIA will reduce H series production, which uses CoWoS-S packaging, while ramping up 200 Series production. This transition indicates significantly decreased demand for CoWoS-S capacity through 2025.

While B300 systems using single-die CoWoS-S are planned for 2026 mass production, the current focus remains on dual-die CoWoS-L products. From TSMC's perspective, the transition between Blackwell generations requires minimal process adjustments, as both use similar front-end-of-line processes with only back-end-of-line modifications needed. Supply chain partners heavily dependent on CoWoS-S production face significant impact, reflected in recent stock price corrections. However, NVIDIA maintains this change reflects product strategy evolution rather than market demand weakness. TSMC continues expanding CoWoS-R capacity while slowing CoWoS-S expansion, viewing AI and high-performance computing as sustained growth drivers despite these packaging technology transitions.

Gigabyte Expands Its Accelerated Computing Portfolio with New Servers Using the NVIDIA HGX B200 Platform

Giga Computing, a subsidiary of GIGABYTE and an industry leader in generative AI servers and advanced cooling technologies, announced new GIGABYTE G893 series servers using the NVIDIA HGX B200 platform. The launch of these flagship 8U air-cooled servers, the G893-SD1-AAX5 and G893-ZD1-AAX5, signifies a new architecture and platform change for GIGABYTE in the demanding world of high-performance computing and AI, setting new standards for speed, scalability, and versatility.

These servers join GIGABYTE's accelerated computing portfolio alongside the NVIDIA GB200 NVL72 platform, which is a rack-scale design that connects 36 NVIDIA Grace CPUs and 72 NVIDIA Blackwell GPUs. At CES 2025 (January 7-10), the GIGABYTE booth will display the NVIDIA GB200 NVL72, and attendees can engage in discussions about the benefits of GIGABYTE platforms with the NVIDIA Blackwell architecture.

NVIDIA GB300 "Blackwell Ultra" Will Feature 288 GB HBM3E Memory, 1400 W TDP

NVIDIA "Blackwell" series is barely out with B100, B200, and GB200 chips shipping to OEMs and hyperscalers, but the company is already setting in its upgraded "Blackwell Ultra" plans with its upcoming GB300 AI server. According to UDN, the next generation NVIDIA system will be powered by the B300 GPU chip, operating at 1400 W and delivering a remarkable 1.5x improvement in FP4 performance per card compared to its B200 predecessor. One of the most notable upgrades is the memory configuration, with each GPU now sporting 288 GB of HBM3e memory, a substantial increase from the previous 192 GB of GB200. The new design implements a 12-layer stack architecture, advancing from the GB200's 8-layer configuration. The system's cooling infrastructure has been completely reimagined, incorporating advanced water cooling plates and enhanced quick disconnects in the liquid cooling system.

Networking capabilities have also seen a substantial upgrade, with ConnectX-8 network cards replacing the previous ConnectX-7 generation, while optical modules have been upgraded from 800G to 1.6T, ensuring faster data transmission. Regarding power management and reliability, the GB300 NVL72 cabinet will standardize capacitor tray implementation, with an optional Battery Backup Unit (BBU) system. Each BBU module costs approximately $300 to manufacture, with a complete GB300 system's BBU configuration totaling around $1,500. The system's supercapacitor requirements are equally substantial, with each NVL72 rack requiring over 300 units, priced between $20 and $25 per unit during production due to their high-power nature. The GB300, carrying a Grace CPU and a Blackwell Ultra GPU, also introduces LPCAMM on its computing boards, indicating that the LPCAMM memory standard is about to take over servers, not just laptops and desktops. We will have to wait for the official launch to see the LPCAMM memory configurations.
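Taken at face value, the report's BBU and supercapacitor figures imply about five BBU modules per system and a supercapacitor bill of roughly $6,000-7,500 per rack; a quick sketch of that arithmetic (all figures are the report's estimates, not official pricing):

```python
# Rough cost arithmetic implied by the report's figures (estimates only,
# not official pricing).
BBU_MODULE_COST = 300           # USD per Battery Backup Unit module
BBU_SYSTEM_COST = 1500          # USD for a complete GB300 system's BBUs
SUPERCAPS_PER_RACK = 300        # "over 300 units" per NVL72 rack
SUPERCAP_COST_RANGE = (20, 25)  # USD per unit during production

bbu_modules = BBU_SYSTEM_COST // BBU_MODULE_COST
cap_low = SUPERCAPS_PER_RACK * SUPERCAP_COST_RANGE[0]
cap_high = SUPERCAPS_PER_RACK * SUPERCAP_COST_RANGE[1]

print(f"Implied BBU modules per system: {bbu_modules}")  # 5
print(f"Supercapacitor cost per rack: ${cap_low:,}-${cap_high:,} or more")
```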

GIGABYTE Sets the Benchmark for HPC at CES 2025

Renowned for its groundbreaking innovations in computing technology, GIGABYTE Technology continues to spearhead advancements that redefine the industry. From January 7 to 10, 2025, GIGABYTE will participate in CES, showcasing its latest breakthroughs in high-performance computing and personal computing solutions. Under the theme "ACCEVOLUTION," GIGABYTE's booth will present a diverse range of cutting-edge technologies. Highlights include:
  • Rack-scale cluster computing solution for large-scale data centers
  • AI servers supporting next-generation "superchips"
  • Advanced cooling technologies that optimize energy efficiency and performance
  • Workstations tailored for smaller-scale workloads
  • Mini-PCs, embedded systems, and in-vehicle platforms designed for endpoint deployment and commercial applications

NVIDIA and Microsoft Showcase Blackwell Preview, Omniverse Industrial AI and RTX AI PCs at Microsoft Ignite

NVIDIA and Microsoft today unveiled product integrations designed to advance full-stack NVIDIA AI development on Microsoft platforms and applications. At Microsoft Ignite, Microsoft announced the launch of the first cloud private preview of the Azure ND GB200 V6 VM series, based on the NVIDIA Blackwell platform. The Azure ND GB200 V6 will be a new AI-optimized virtual machine (VM) series, combining the NVIDIA GB200 NVL72 rack design with NVIDIA Quantum InfiniBand networking.

In addition, Microsoft revealed that Azure Container Apps now supports NVIDIA GPUs, enabling simplified and scalable AI deployment. Plus, the NVIDIA AI platform on Azure includes new reference workflows for industrial AI and an NVIDIA Omniverse Blueprint for creating immersive, AI-powered visuals. At Ignite, NVIDIA also announced multimodal small language models (SLMs) for RTX AI PCs and workstations, enhancing digital human interactions and virtual assistants with greater realism.

ASRock Rack Brings End-to-End AI and HPC Server Portfolio to SC24

ASRock Rack Inc., a leading innovative server company, today announces its presence at SC24, held at the Georgia World Congress Center in Atlanta from November 18-21. At booth #3609, ASRock Rack will showcase a comprehensive high-performance portfolio of server boards, systems, and rack solutions with NVIDIA accelerated computing platforms, helping address the needs of enterprises, organizations, and data centers.

Artificial intelligence (AI) and high-performance computing (HPC) continue to reshape technology. ASRock Rack is presenting a complete suite of solutions spanning edge, on-premise, and cloud environments, engineered to meet the demands of AI and HPC. The 2U short-depth MECAI, incorporating the NVIDIA GH200 Grace Hopper Superchip, is developed to supercharge accelerated computing and generative AI in space-constrained environments. The 4U10G-TURIN2 and 4UXGM-GNR2, supporting ten and eight NVIDIA H200 NVL PCIe GPUs respectively, aim to help enterprises and researchers tackle every AI and HPC challenge with enhanced performance and greater energy efficiency. NVIDIA H200 NVL is ideal for lower-power, air-cooled enterprise rack designs that require flexible configurations, delivering acceleration for AI and HPC workloads regardless of size.

CoolIT Announces the World's Highest Density Liquid-to-Liquid Coolant Distribution Unit

CoolIT Systems (CoolIT), the world leader in liquid cooling systems for AI and high-performance computing, introduces the CHx1000, the world's highest-density liquid-to-liquid coolant distribution unit (CDU). Designed for mission-critical applications, the CHx1000 is purpose-built to cool the NVIDIA Blackwell platform and other demanding AI workloads where liquid cooling is now necessary.

"CoolIT created the CHx1000 to provide the high capacity and pressure delivery required to direct liquid cool NVIDIA Blackwell and future generations of high-performance AI accelerators," said Patrick McGinn, CoolIT's COO. "Besides exceptional performance, serviceability and reliability are central to the CHx1000's design. The single rack-sized unit is fully front and back serviceable with hot-swappable critical components. Precision coolant controls and multiple levels of redundancy provide for steady, uninterrupted operation."

NVIDIA "Blackwell" NVL72 Servers Reportedly Require Redesign Amid Overheating Problems

According to The Information, NVIDIA's latest "Blackwell" processors are reportedly encountering significant thermal management issues in high-density server configurations, potentially affecting deployment timelines for major tech companies. The challenges emerge specifically in NVL72 GB200 racks housing 72 GB200 processors, which can consume up to 120 kilowatts of power per rack while weighing a "mere" 3,000 pounds (about 1.5 tons). These thermal concerns have prompted NVIDIA to revisit and modify its server rack designs multiple times to prevent performance degradation and potential hardware damage. Hyperscalers like Google, Meta, and Microsoft, who rely heavily on NVIDIA GPUs for training their advanced language models, have allegedly expressed concerns about possible delays in their data center deployment schedules.
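The power-density arithmetic makes the thermal challenge concrete; a rough sketch (only the 120 kW rack figure and the 72-GPU count come from the report, while the conventional-server baseline is an assumption for comparison):

```python
# Rough power-density arithmetic behind the reported thermal challenge.
# Only the 120 kW rack figure and the 72-GPU count come from the report;
# the ~1 kW conventional-server baseline is an assumption for comparison.
RACK_POWER_KW = 120
GPUS_PER_RACK = 72
TYPICAL_SERVER_KW = 1.0  # assumed draw of a conventional dual-socket server

watts_per_gpu_position = RACK_POWER_KW * 1000 / GPUS_PER_RACK
equivalent_servers = RACK_POWER_KW / TYPICAL_SERVER_KW

print(f"Power per GPU position: ~{watts_per_gpu_position:.0f} W")  # ~1667 W
print(f"Heat of ~{equivalent_servers:.0f} conventional servers in one cabinet")
```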

The thermal management issues follow earlier setbacks related to a design flaw in the Blackwell production process. The problem stemmed from the complex CoWoS-L packaging technology, which connects dual chiplets using an RDL interposer and LSI bridges. Thermal expansion mismatches between various components led to warping issues, requiring modifications to the GPU's metal layers and bump structures. A company spokesperson characterized these modifications as part of the standard development process, noting that a new photomask resolved the issue. The Information states that mass production of the revised Blackwell GPUs began in late October, with shipments expected to commence in late January. However, NVIDIA has not confirmed these timelines, and some server makers like Dell say that GB200 NVL72 liquid-cooled systems are shipping now, not in January, with GPU cloud provider CoreWeave as a customer. The original report could be based on older information, as Dell is one of NVIDIA's most significant partners and among the first in the supply chain to gain access to new GPU batches.

NVIDIA B200 "Blackwell" Records 2.2x Performance Improvement Over its "Hopper" Predecessor

We know that NVIDIA's latest "Blackwell" GPUs are fast, but how much faster are they than the previous-generation "Hopper"? Thanks to the latest MLPerf Training v4.1 results, NVIDIA's HGX B200 Blackwell platform has demonstrated massive performance gains, measuring up to a 2.2x improvement per GPU compared to the Hopper-based HGX H200. The latest results, verified by MLCommons, reveal impressive achievements in large language model (LLM) training. The Blackwell architecture, featuring HBM3E high-bandwidth memory and fifth-generation NVLink interconnect technology, achieved double the performance per GPU for GPT-3 pre-training and a 2.2x boost for Llama 2 70B fine-tuning compared to the previous Hopper generation. Each benchmark system incorporated eight Blackwell GPUs operating at a 1,000 W TDP, connected via NVLink Switch for scale-up.

The network infrastructure utilized NVIDIA ConnectX-7 SuperNICs and Quantum-2 InfiniBand switches, enabling high-speed node-to-node communication for distributed training workloads. While previous Hopper-based systems required 256 GPUs to optimize performance for the GPT-3 175B benchmark, Blackwell accomplished the same task with just 64 GPUs, leveraging its larger HBM3E memory capacity and bandwidth. One thing to look out for is the upcoming GB200 NVL72 system, which promises even more significant gains beyond the 2.2x mark. It features expanded NVLink domains, higher memory bandwidth, and tight integration with NVIDIA Grace CPUs, complemented by ConnectX-8 SuperNIC and Quantum-X800 switch technologies. With faster switching and better data movement from Grace-Blackwell integration, we could see even more software optimization from NVIDIA to push the performance envelope.
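A per-GPU comparison across different system sizes boils down to GPU-time: how many GPU-minutes each platform spends on the same benchmark. The sketch below illustrates that arithmetic; the times-to-train are made-up placeholders, and only the 256-vs-64 GPU counts come from the results above.

```python
# Illustrative per-GPU speedup arithmetic for MLPerf-style comparisons.
# The times-to-train below are made-up placeholders; only the GPU counts
# (256 Hopper vs. 64 Blackwell) come from the published results.
def per_gpu_speedup(base_gpus, base_minutes, new_gpus, new_minutes):
    """Ratio of GPU-time budgets (GPUs x minutes) for the same benchmark:
    how much less GPU-time the new system needs per unit of work."""
    return (base_gpus * base_minutes) / (new_gpus * new_minutes)

# Hypothetical case: Blackwell takes twice as long on a quarter of the GPUs,
# which still works out to double the performance per GPU.
print(per_gpu_speedup(base_gpus=256, base_minutes=60,
                      new_gpus=64, new_minutes=120))  # 2.0
```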

Google Shows Production NVIDIA "Blackwell" GB200 NVL System for Cloud

Last week, we got a preview of Microsoft's production-ready Azure NVIDIA "Blackwell" GB200 system, showing that only a third of the rack that goes into the data center actually holds the compute elements, with the other two-thirds housing the cooling compartment that tames the immense heat output of dozens of GB200 GPUs. Today, Google is showing off part of its own infrastructure ahead of the Google Cloud App Dev & Infrastructure Summit, taking place digitally on October 30. Shown below are two racks standing side by side, connecting NVIDIA "Blackwell" GB200 NVL cards with the rest of the Google infrastructure. Unlike Microsoft Azure, Google Cloud uses a different data center design in its facilities.

There is one rack with power distribution units, networking switches, and cooling distribution units, all connected to the compute rack, which houses power supplies, GPUs, and CPU servers. Networking equipment is present, and it connects to Google's "global" data center network, Google's own data center fabric. We are not sure which fabric connects these racks; for optimal performance, NVIDIA recommends InfiniBand (from its Mellanox acquisition), but given that Google's infrastructure is set up differently, Ethernet switches may be present instead. Interestingly, Google's GB200 rack design differs from Azure's, as it uses additional rack space to distribute the coolant to its local heat exchangers, i.e., coolers. We are curious to see whether Google releases more information on this infrastructure, as the company has been known as the infrastructure king for its ability to scale while keeping everything organized.

NVIDIA Contributes Blackwell Platform Design to Open Hardware Ecosystem, Accelerating AI Infrastructure Innovation

To drive the development of open, efficient and scalable data center technologies, NVIDIA today announced that it has contributed foundational elements of its NVIDIA Blackwell accelerated computing platform design to the Open Compute Project (OCP) and broadened NVIDIA Spectrum-X support for OCP standards.

At this year's OCP Global Summit, NVIDIA will be sharing key portions of the NVIDIA GB200 NVL72 system electro-mechanical design with the OCP community — including the rack architecture, compute and switch tray mechanicals, liquid-cooling and thermal environment specifications, and NVIDIA NVLink cable cartridge volumetrics — to support higher compute density and networking bandwidth.

NVIDIA "Blackwell" GPUs are Sold Out for 12 Months, Customers Ordering in 100K GPU Quantities

NVIDIA's "Blackwell" series of GPUs, including B100, B200, and GB200, are reportedly sold out for 12 months or an entire year. This directly means that if a new customer is willing to order a new Blackwell GPU now, there is a 12-month waitlist to get that GPU. Analyst from Morgan Stanley Joe Moore confirmed that in a meeting with NVIDIA and its investors, NVIDIA executives confirmed that the demand for "Blackwell" is so great that there is a 12-month backlog to fulfill first before shipping to anyone else. We expect that this includes customers like Amazon, META, Microsoft, Google, Oracle, and others, who are ordering GPUs in insane quantities to keep up with the demand from their customers.

The previous-generation "Hopper" GPUs were ordered in tens of thousands of units, while this "Blackwell" generation is being ordered in hundreds of thousands of units at a time. For NVIDIA, that is excellent news, as that demand is expected to continue. The only thing standing between customers and their GPUs is TSMC, which is manufacturing them as fast as possible to meet demand. NVIDIA is one of TSMC's largest customers, so its wafer allocation at TSMC's facilities is only expected to grow. We are now officially in the era of million-GPU data centers, and the only question is when this massive growth slows down, if it does at all in the near future.

NVIDIA Might Consider Major Design Shift for Future 300 GPU Series

NVIDIA is reportedly considering a significant design change for its GPU products, shifting from the current on-board solution to an independent GPU socket design following the GB200 shipment in Q4, according to reports from MoneyDJ and the Economic Daily News quoted by TrendForce. This move is not new to the industry; AMD already introduced a socketed design in 2023 with its MI300A series via dedicated Supermicro servers. The B300 series, expected to become NVIDIA's mainstream product in the second half of 2025, is rumored to be the main beneficiary of this design change, which could improve yield rates, though it may come with some performance trade-offs.

According to the Economic Daily News, the socket design will simplify after-sales service and server board maintenance, allowing users to replace or upgrade GPUs quickly. The report also pointed out that, based on the socket design, boards will contain up to four NVIDIA GPUs and a CPU, with each GPU having its own dedicated socket. This will benefit Taiwanese manufacturers like Foxconn and LOTES, which will supply the various components and connectors. The move seems logical: with the current on-board design, once a GPU fails, the entire motherboard needs to be replaced, leading to significant downtime and high operating and maintenance costs.