Mar 25th, 2025 10:52 EDT change timezone

New Forum Posts

Popular Reviews

Controversial News Posts

News Posts matching #GB200

Return to Keyword Browsing

NVIDIA to Build Accelerated Quantum Computing Research Center

NVIDIA today announced it is building a Boston-based research center to provide cutting-edge technologies to advance quantum computing. The NVIDIA Accelerated Quantum Research Center, or NVAQC, will integrate leading quantum hardware with AI supercomputers, enabling what is known as accelerated quantum supercomputing. The NVAQC will help solve quantum computing's most challenging problems, ranging from qubit noise to transforming experimental quantum processors into practical devices.

Leading quantum computing innovators, including Quantinuum, Quantum Machines and QuEra Computing, will tap into the NVAQC to drive advancements through collaborations with researchers from leading universities, such as the Harvard Quantum Initiative in Science and Engineering (HQI) and the Engineering Quantum Systems (EQuS) group at the Massachusetts Institute of Technology (MIT).

Micron Innovates From the Data Center to the Edge With NVIDIA

Secular growth of AI is built on the foundation of high-performance, high-bandwidth memory solutions. These high-performing memory solutions are critical to unlock the capabilities of GPUs and processors. Micron Technology, Inc., today announced it is the world's first and only memory company shipping both HBM3E and SOCAMM (small outline compression attached memory module) products for AI servers in the data center. This extends Micron's industry leadership in designing and delivering low-power DDR (LPDDR) for data center applications.

Micron's SOCAMM, a modular LPDDR5X memory solution, was developed in collaboration with NVIDIA to support the NVIDIA GB300 Grace Blackwell Ultra Superchip. The Micron HBM3E 12H 36 GB is also designed into the NVIDIA HGX B300 NVL16 and GB300 NVL72 platforms, while the HBM3E 8H 24 GB is available for the NVIDIA HGX B200 and GB200 NVL72 platforms. The deployment of Micron HBM3E products in NVIDIA Hopper and NVIDIA Blackwell systems underscores Micron's critical role in accelerating AI workloads.

NVIDIA Accelerates Science and Engineering With CUDA-X Libraries Powered by GH200 and GB200 Superchips

Scientists and engineers of all kinds are equipped to solve tough problems a lot faster with NVIDIA CUDA-X libraries powered by NVIDIA GB200 and GH200 superchips. Announced today at the NVIDIA GTC global AI conference, developers can now take advantage of tighter automatic integration and coordination between CPU and GPU resources - enabled by CUDA-X working with these latest superchip architectures - resulting in up to 11x speedups for computational engineering tools and 5x larger calculations compared with using traditional accelerated computing architectures.

This greatly accelerates and improves workflows in engineering simulation, design optimization and more, helping scientists and researchers reach groundbreaking results faster. NVIDIA released CUDA in 2006, opening up a world of applications to the power of accelerated computing. Since then, NVIDIA has built more than 900 domain-specific NVIDIA CUDA-X libraries and AI models, making it easier to adopt accelerated computing and driving incredible scientific breakthroughs. Now, CUDA-X brings accelerated computing to a broad new set of engineering disciplines, including astronomy, particle physics, quantum physics, automotive, aerospace and semiconductor design.

Global Top 10 IC Design Houses See 49% YoY Growth in 2024, NVIDIA Commands Half the Market

TrendForce reveals that the combined revenue of the world's top 10 IC design houses reached approximately US$249.8 billion in 2024, marking a 49% YoY increase. The booming AI industry has fueled growth across the semiconductor sector, with NVIDIA leading the charge, posting an astonishing 125% revenue growth, widening its lead over competitors, and solidifying its dominance in the IC industry.

Looking ahead to 2025, advancements in semiconductor manufacturing will further enhance AI computing power, with LLMs continuing to emerge. Open-source models like DeepSeek could lower AI adoption costs, accelerating AI penetration from servers to personal devices. This shift positions edge AI devices as the next major growth driver for the semiconductor industry.

Meta Reportedly Reaches Test Phase with First In-house AI Training Chip

According to a Reuters technology report, Meta's engineering department is engaged in the testing of their "first in-house chip for training artificial intelligence systems." Two inside sources have declared this significant development milestone; involving a small-scale deployment of early samples. The owner of Facebook could ramp up production, upon initial batches passing muster. Despite a recent-ish showcasing of an open-architecture NVIDIA "Blackwell" GB200 system for enterprise, Meta leadership is reported to be pursuing proprietary solutions. Multiple big players—in the field of artificial intelligence—are attempting to breakaway from a total reliance on Team Green. Last month, press outlets concentrated on OpenAI's alleged finalization of an in-house design, with rumored involvement coming from Broadcom and TSMC.

One of the Reuters industry moles believes that Meta has signed up with TSMC—supposedly, the Taiwanese foundry was responsible for the production of test batches. Tom's Hardware reckons that Meta and Broadcom were working together with the tape out of the social media giant's "first AI training accelerator." Development of the company's "Meta Training and Inference Accelerator" (MTIA) series has stretched back a couple of years—according to Reuters, this multi-part project: "had a wobbly start for years, and at one point scrapped a chip at a similar phase of development...Meta last year, started using an MTIA chip to perform inference, or the process involved in running an AI system as users interact with it, for the recommendation systems that determine which content shows up on Facebook and Instagram news feeds." Leadership is reportedly aiming to get custom silicon solutions up and running for AI training by next year. Past examples of MTIA hardware were deployed with open-source RISC-V cores (for inference tasks), but is not clear whether this architecture will form the basis of Meta's latest AI chip design.

Insiders Predict Introduction of NVIDIA "Blackwell Ultra" GB300 AI Series at GTC, with Fully Liquid-cooled Clusters

Supply chain insiders believe that NVIDIA's "Blackwell Ultra" GB300 AI chip design will get a formal introduction at next week's GTC 2025 conference. Jensen Huang's keynote presentation is scheduled—the company's calendar is marked with a very important date: Tuesday, March 18. Team Green's chief has already revealed a couple of Blackwell B300 series details to investors; a recent earnings call touched upon the subject of a second half (of 2025) launch window. Industry moles have put spotlights on the GB300 GPU's alleged energy hungry nature. According to inside tracks, power consumption has "significantly" increased when compared to a slightly older equivalent; NVIDIA's less refined "Blackwell" GB200 design.

A Taiwan Economic Daily news article predicts an upcoming "second cooling revolution," due to reports of "Blackwell Ultra" parts demanding greater heat dissipation solutions. Supply chain leakers have suggested effective countermeasures—in the form of fully liquid-cooled systems: "not only will more water cooling plates be introduced, but the use of water cooling quick connectors will increase four times compared to GB200." The pre-Christmas 2024 news cycle proposed a 1400 W TDP rating. Involved "Taiwanese cooling giants" are expected to pull in tidy sums of money from the supply of optimal heat dissipating gear, with local "water-cooling quick-connector" manufacturers also tipped to benefit greatly. The UDN report pulled quotes from a variety of regional cooling specialists; the consensus being that involved partners are struggling to keep up with demand across GB200 and GB300 product lines.

Finally, Some Good News: GeForce RTX 5090 Supply to Increase in Coming Months

It would be safe to state that the NVIDIA GeForce RTX 5090 and RTX 5080 launch was anything but ideal. Gamers had to deal with whacky NVIDIA marketing material with absurd performance claims, followed by disappointing generational improvement for the RTX 5080, only to be left dealing with abysmal supply leading to obscene shortages and scalper-induced price inflation. However, it does seem like things are about to take a positive turn - NVIDIA is rumored to have ramped up production for its GB202 GPU, which the RTX 5090 is based on, according to a reliable source.

Spotted by VideoCardz, MEGAsizeGPU has claimed that the supply for the GeForce RTX 5090 GPU will soon be "stupidly high", which is absolute music to our ears. In a reply thread, the source further claimed that at least one AIB partner already has "tons of cards", which sure does paint a promising picture for the future. As such, the source expects that the supply will reach customers in about a month, which is to be expected since production has been cranked only recently. Apparently, demand for the GB200 GPU has been lower than usual, forcing NVIDIA to switch to producing GeForce GPUs instead. Of course, the margins for the gaming GPUs are lower, but the production capacity has to go somewhere.

Supermicro Ramps Full Production of NVIDIA Blackwell Rack-Scale Solutions With NVIDIA HGX B200

Supermicro, Inc., a Total IT Solution Provider for AI/ML, HPC, Cloud, Storage, and 5G/Edge, is announcing full production availability of its end-to-end AI data center Building Block Solutions accelerated by the NVIDIA Blackwell platform. The Supermicro Building Block portfolio provides the core infrastructure elements necessary to scale Blackwell solutions with exceptional time to deployment. The portfolio includes a broad range of air-cooled and liquid-cooled systems with multiple CPU options. These include superior thermal design supporting traditional air cooling, liquid-to-liquid (L2L) and liquid-to-air (L2A) cooling. In addition, a full data center management software suite, rack-level integration, including full network switching and cabling and cluster-level L12 solution validation can be delivered as turn-key offering with global delivery, professional support, and service.

"In this transformative moment of AI, where scaling laws are pushing the limits of data center capabilities, our latest NVIDIA Blackwell-powered solutions, developed through close collaboration with NVIDIA, deliver outstanding computational power," said Charles Liang, president and CEO of Supermicro. "Supermicro's NVIDIA Blackwell GPU offerings in plug-and-play scalable units with advanced liquid cooling and air cooling are empowering customers to deploy an infrastructure that supports increasingly complex AI workloads while maintaining exceptional efficiency. This reinforces our commitment to providing sustainable, cutting-edge solutions that accelerate AI innovation."

ASUS AI POD With NVIDIA GB200 NVL72 Platform Ready to Ramp-Up Production for Scheduled Shipment in March

ASUS is proud to announce that ASUS AI POD, featuring the NVIDIA GB200 NVL72 platform, is ready to ramp-up production for a scheduled shipping date of March 2025. ASUS remains dedicated to providing comprehensive end-to-end solutions and software services, encompassing everything from AI supercomputing to cloud services. With a strong focus on fostering AI adoption across industries, ASUS is positioned to empower clients in accelerating their time to market by offering a full spectrum of solutions.

Proof of concept, funded by ASUS
Honoring the commitment to delivering exceptional value to clients, ASUS is set to launch a proof of concept (POC) for the groundbreaking ASUS AI POD, powered by the NVIDIA Blackwell platform. This exclusive opportunity is now open to a select group of innovators who are eager to harness the full potential of AI computing. Innovators and enterprises can experience firsthand the full potential of AI and deep learning solutions at exceptional scale. To take advantage of this limited-time offer, please complete this surveyi at: forms.office.com/r/FrAbm5BfH2. The expert ASUS team of NVIDIA GB200 specialists will guide users through the next steps.

NVIDIA Revises "Blackwell" Architecture Production Roadmap for More Complex Packaging

According to a well-known industry analyst, Ming-Chi Kuo, NVIDIA has restructured its "Blackwell" architecture roadmap, emphasizing dual-die designs using CoWoS-L packaging technology. The new roadmap eliminates several single-die products that would have used CoWoS-S packaging, changing NVIDIA's manufacturing strategy. The 200 Series will exclusively use dual-die designs with CoWoS-L packaging, featuring the GB200 NVL72 and HGX B200 systems. Notably absent is the previously expected B200A single-die variant. The 300 Series will include both dual-die and single-die options, though NVIDIA and cloud providers are prioritizing the GB200 NVL72 dual-die system. Starting Q1 2025, NVIDIA will reduce H series production, which uses CoWoS-S packaging, while ramping up 200 Series production. This transition indicates significantly decreased demand for CoWoS-S capacity through 2025.

While B300 systems using single-die CoWoS-S are planned for 2026 mass production, the current focus remains on dual-die CoWoS-L products. From TSMC's perspective, the transition between Blackwell generations requires minimal process adjustments, as both use similar front-end-of-line processes with only back-end-of-line modifications needed. Supply chain partners heavily dependent on CoWoS-S production face significant impact, reflected in recent stock price corrections. However, NVIDIA maintains this change reflects product strategy evolution rather than market demand weakness. TSMC continues expanding CoWoS-R capacity while slowing CoWoS-S expansion, viewing AI and high-performance computing as sustained growth drivers despite these packaging technology transitions.

NVIDIA's GB200 "Blackwell" Racks Face Overheating Issues

NVIDIA's new GB200 "Blackwell" racks are running into trouble (again). Big cloud companies like Microsoft, Amazon, Google, and Meta Platforms are cutting back their orders because of heat problems, Reuters reports, quoting The Information. The first shipments of racks with Blackwell chips are getting too hot and have connection issues between chips, the report says. These tech hiccups have made some customers who ordered $10 billion or more worth of racks think twice about buying.

Some are putting off their orders until NVIDIA has better versions of the racks. Others are looking at buying older NVIDIA AI chips instead. For example, Microsoft planned to set up GB200 racks with no less than 50,000 Blackwell chips at one of its Phoenix sites. However, The Information reports that OpenAI has asked Microsoft to provide NVIDIA's older "Hopper" chips instead pointing to delays linked to the Blackwell racks. NVIDIA's problems with its Blackwell GPUs housed in high-density racks are not something new; in November 2024, Reuters, also referencing The Information, uncovered overheating issues in servers that housed 72 processors. NVIDIA has made several changes to its server rack designs to tackle these problems, however, it seems that the problem was not entirely solved.

Gigabyte Demonstrates Omni-AI Capabilities at CES 2025

GIGABYTE Technology, internationally renowned for its R&D capabilities and a leading innovator in server and data center solutions, continues to lead technological innovation during this critical period of AI and computing advancement. With its comprehensive AI product portfolio, GIGABYTE will showcase its complete range of AI computing solutions at CES 2025, from data center infrastructure to IoT applications and personal computing, demonstrating how its extensive product line enables digital transformation across all sectors in this AI-driven era.

Powering AI from the Cloud
With AI Large Language Models (LLMs) now routinely featuring parameters in the hundreds of billions to trillions, robust training environments (data centers) have become a critical requirement in the AI race. GIGABYTE offers three distinctive solutions for AI infrastructure.

Gigabyte Expands Its Accelerated Computing Portfolio with New Servers Using the NVIDIA HGX B200 Platform

Giga Computing, a subsidiary of GIGABYTE and an industry leader in generative AI servers and advanced cooling technologies, announced new GIGABYTE G893 series servers using the NVIDIA HGX B200 platform. The launch of these flagship 8U air-cooled servers, the G893-SD1-AAX5 and G893-ZD1-AAX5, signifies a new architecture and platform change for GIGABYTE in the demanding world of high-performance computing and AI, setting new standards for speed, scalability, and versatility.

These servers join GIGABYTE's accelerated computing portfolio alongside the NVIDIA GB200 NVL72 platform, which is a rack-scale design that connects 36 NVIDIA Grace CPUs and 72 NVIDIA Blackwell GPUs. At CES 2025 (January 7-10), the GIGABYTE booth will display the NVIDIA GB200 NVL72, and attendees can engage in discussions about the benefits of GIGABYTE platforms with the NVIDIA Blackwell architecture.

NVIDIA GB300 "Blackwell Ultra" Will Feature 288 GB HBM3E Memory, 1400 W TDP

NVIDIA "Blackwell" series is barely out with B100, B200, and GB200 chips shipping to OEMs and hyperscalers, but the company is already setting in its upgraded "Blackwell Ultra" plans with its upcoming GB300 AI server. According to UDN, the next generation NVIDIA system will be powered by the B300 GPU chip, operating at 1400 W and delivering a remarkable 1.5x improvement in FP4 performance per card compared to its B200 predecessor. One of the most notable upgrades is the memory configuration, with each GPU now sporting 288 GB of HBM3e memory, a substantial increase from the previous 192 GB of GB200. The new design implements a 12-layer stack architecture, advancing from the GB200's 8-layer configuration. The system's cooling infrastructure has been completely reimagined, incorporating advanced water cooling plates and enhanced quick disconnects in the liquid cooling system.

Networking capabilities have also seen a substantial upgrade, with the implementation of ConnectX 8 network cards replacing the previous ConnectX 7 generation, while optical modules have been upgraded from 800G to 1.6T, ensuring faster data transmission. Regarding power management and reliability, the GB300 NVL72 cabinet will standardize capacitor tray implementation, with an optional Battery Backup Unit (BBU) system. Each BBU module costs approximately $300 to manufacture, with a complete GB300 system's BBU configuration totaling around $1,500. The system's supercapacitor requirements are equally substantial, with each NVL72 rack requiring over 300 units, priced between $20-25 per unit during production due to its high-power nature. The GB300, carrying Grace CPU and Blackwell Ultra GPU, also introduces the implementation of LPCAMM on its computing boards, indicating that the LPCAMM memory standard is about to take over servers, not just laptops and desktops. We have to wait for the official launch before seeing LPCAMM memory configurations.

NVIDIA's Next-Gen "Rubin" AI GPU Development 6 Months Ahead of Schedule: Report

The "Rubin" architecture succeeds NVIDIA's current "Blackwell," which powers the company's AI GPUs, as well as the upcoming GeForce RTX 50-series gaming GPUs. NVIDIA will likely not build gaming GPUs with "Rubin," just like it didn't with "Hopper," and for the most part, "Volta." NVIDIA's AI GPU product roadmap put out at SC'24 puts "Blackwell" firmly in charge of the company's AI GPU product stack throughout 2025, with "Rubin" only succeeding it in the following year, for a two-year run in the market, being capped off with a "Rubin Ultra" larger GPU slated for 2027. A new report by United Daily News (UDN), a Taiwan-based publication, says that the development of "Rubin" is running 6 months ahead of schedule.

Being 6 months ahead of schedule doesn't necessarily mean that the product will launch sooner. It would give NVIDIA headroom to get "Rubin" better evaluated in the industry, and make last-minute changes to the product if needed; or even advance the launch if it wants to. The first AI GPU powered by "Rubin" will feature 8-high HBM4 memory stacks. The company will also introduce the "Vera" CPU, the long-awaited successor to "Grace." It will also introduce the X1600 InfiniBand/Ethernet network processor. According to the SC'24 roadmap by NVIDIA, these three would've seen a 2026 launch. Then in 2027, the company would follow up with an even larger AI GPU based on the same "Rubin" architecture, codenamed "Rubin Ultra." This features 12-high HBM4 stacks. NVIDIA's current GB200 "Blackwell" is a tile-based GPU, with two dies that have full cache-coherence. "Rubin" is rumored to feature four tiles.

NVIDIA and Microsoft Showcase Blackwell Preview, Omniverse Industrial AI and RTX AI PCs at Microsoft Ignite

NVIDIA and Microsoft today unveiled product integrations designed to advance full-stack NVIDIA AI development on Microsoft platforms and applications. At Microsoft Ignite, Microsoft announced the launch of the first cloud private preview of the Azure ND GB200 V6 VM series, based on the NVIDIA Blackwell platform. The Azure ND GB200 v6 will be a new AI-optimized virtual machine (VM) series and combines the NVIDIA GB200 NVL72 rack design with NVIDIA Quantum InfiniBand networking.

In addition, Microsoft revealed that Azure Container Apps now supports NVIDIA GPUs, enabling simplified and scalable AI deployment. Plus, the NVIDIA AI platform on Azure includes new reference workflows for industrial AI and an NVIDIA Omniverse Blueprint for creating immersive, AI-powered visuals. At Ignite, NVIDIA also announced multimodal small language models (SLMs) for RTX AI PCs and workstations, enhancing digital human interactions and virtual assistants with greater realism.

NVIDIA Prepares GB200 NVL4: Four "Blackwell" GPUs and Two "Grace" CPUs in a 5,400 W Server

At SC24, NVIDIA announced its latest compute-dense AI accelerators in the form of GB200 NVL4, a single-server solution that expands the company's "Blackwell" series portfolio. The new platform features an impressive combination of four "Blackwell" GPUs and two "Grace" CPUs on a single board. The GB200 NVL4 boasts remarkable specifications for a single-server system, including 768 GB of HBM3E memory across its four Blackwell GPUs, delivering a combined memory bandwidth of 32 TB/s. The system's two Grace CPUs have 960 GB of LPDDR5X memory, making it a powerhouse for demanding AI workloads. A key feature of the NVL4 design is its NVLink interconnect technology, which enables communication between all processors on the board. This integration is important for maintaining optimal performance across the system's multiple processing units, especially during large training runs or inferencing a multi-trillion parameter model.

Performance comparisons with previous generations show significant improvements, with NVIDIA claiming the GB200 GPUs deliver 2.2x faster overall performance and 1.8x quicker training capabilities compared to their GH200 NVL4 predecessor. The system's power consumption reaches 5,400 watts, which effectively doubles the 2,700-watt requirement of the GB200 NVL2 model, its smaller sibling that features two GPUs instead of four. NVIDIA is working closely with OEM partners to bring various Blackwell solutions to market, including the DGX B200, GB200 Grace Blackwell Superchip, GB200 Grace Blackwell NVL2, GB200 Grace Blackwell NVL4, and GB200 Grace Blackwell NVL72. Fitting 5,400 W of TDP in a single server will require liquid cooling for optimal performance, and the GB200 NVL4 is expected to go inside server racks for hyperscaler customers, which usually have a custom liquid cooling systems inside their data centers.

Supermicro Delivers Direct-Liquid-Optimized NVIDIA Blackwell Solutions

Supermicro, Inc., a Total IT Solution Provider for AI, Cloud, Storage, and 5G/Edge, is announcing the highest-performing SuperCluster, an end-to-end AI data center solution featuring the NVIDIA Blackwell platform for the era of trillion-parameter-scale generative AI. The new SuperCluster will significantly increase the number of NVIDIA HGX B200 8-GPU systems in a liquid-cooled rack, resulting in a large increase in GPU compute density compared to Supermicro's current industry-leading liquid-cooled NVIDIA HGX H100 and H200-based SuperClusters. In addition, Supermicro is enhancing the portfolio of its NVIDIA Hopper systems to address the rapid adoption of accelerated computing for HPC applications and mainstream enterprise AI.

"Supermicro has the expertise, delivery speed, and capacity to deploy the largest liquid-cooled AI data center projects in the world, containing 100,000 GPUs, which Supermicro and NVIDIA contributed to and recently deployed," said Charles Liang, president and CEO of Supermicro. "These Supermicro SuperClusters reduce power needs due to DLC efficiencies. We now have solutions that use the NVIDIA Blackwell platform. Using our Building Block approach allows us to quickly design servers with NVIDIA HGX B200 8-GPU, which can be either liquid-cooled or air-cooled. Our SuperClusters provide unprecedented density, performance, and efficiency, and pave the way toward even more dense AI computing solutions in the future. The Supermicro clusters use direct liquid cooling, resulting in higher performance, lower power consumption for the entire data center, and reduced operational expenses."

ASRock Rack Brings End-to-End AI and HPC Server Portfolio to SC24

ASRock Rack Inc., a leading innovative server company, today announces its presence at SC24, held at the Georgia World Congress Center in Atlanta from November 18-21. At booth #3609, ASRock Rack will showcase a comprehensive high-performance portfolio of server boards, systems, and rack solutions with NVIDIA accelerated computing platforms, helping address the needs of enterprises, organizations, and data centers.

Artificial intelligence (AI) and high-performance computing (HPC) continue to reshape technology. ASRock Rack is presenting a complete suite of solutions spanning edge, on-premise, and cloud environments, engineered to meet the demand of AI and HPC. The 2U short-depth MECAI, incorporating the NVIDIA GH200 Grace Hopper Superchip, is developed to supercharge accelerated computing and generative AI in space-constrained environments. The 4U10G-TURIN2 and 4UXGM-GNR2, supporting ten and eight NVIDIA H200 NVL PCIe GPUs respectively, are aiming to help enterprises and researchers tackle every AI and HPC challenge with enhanced performance and greater energy efficiency. NVIDIA H200 NVL is ideal for lower-power, air-cooled enterprise rack designs that require flexible configurations, delivering acceleration for AI and HPC workloads regardless of size.

GIGABYTE Showcases a Leading AI and Enterprise Portfolio at Supercomputing 2024

Giga Computing, a subsidiary of GIGABYTE and an industry leader in generative AI servers and advanced cooling technologies, shows off at SC24 how the GIGABYTE enterprise portfolio provides solutions for all applications, from cloud computing to AI to enterprise IT, including energy-efficient liquid-cooling technologies. This portfolio is made more complete by long-term collaborations with leading technology companies and emerging industry leaders, which will be showcased at GIGABYTE booth #3123 at SC24 (Nov. 19-21) in Atlanta. The booth is sectioned to put the spotlight on strategic technology collaborations, as well as direct liquid cooling partners.

The GIGABYTE booth will showcase an array of NVIDIA platforms built to keep up with the diversity of workloads and degrees of demands in applications of AI & HPC hardware. For a rack-scale AI solution using the NVIDIA GB200 NVL72 design, GIGABYTE displays how seventy-two GPUs can be in one rack with eighteen GIGABYTE servers each housing two NVIDIA Grace CPUs and four NVIDIA Blackwell GPUs. Another platform at the GIGABYTE booth is the NVIDIA HGX H200 platform. GIGABYTE exhibits both its liquid-cooling G4L3-SD1 server and an air-cooled version, G593-SD1.

ASUS Presents Next-Gen Infrastructure Solutions With Advanced Cooling Portfolio at SC24

ASUS today announced its next-generation infrastructure solutions at SC24, unveiling an extensive server lineup and advanced cooling solutions, all designed to propel the future of AI. The product showcase will reveal how ASUS is working with NVIDIA and Ubitus/Ubilink to prove the immense computational power of supercomputers, using AI-powered avatar and robot demonstrations that leverage the newly-inaugurated data center. It is Taiwan's largest supercomputing facility, constructed by ASUS, and is also notable for offering flexible green-energy options to customers that desire them. As a total solution provider with a proven track record in pioneering AI supercomputing, ASUS continuously drives maximized value for customers.

To fuel digital transformation in enterprise through high-performance computing (HPC) and AI-driven architecture, ASUS provides a full line-up of server systems—ready for every scenario. ASUS AI POD, a complete rack solution equipped with NVIDIA GB200 NVL72 platform, integrates GPUs, CPUs and switches in seamless, high-speed direct communication, enhancing the training of trillion-parameter LLMs and enabling real-time inference. It features the NVIDIA GB200 Grace Blackwell Superchip and fifth-generation NVIDIA NVLink technology, while offering both liquid-to-air and liquid-to-liquid cooling options to maximize AI computing performance.

CoolIT Announces the World's Highest Density Liquid-to-Liquid Coolant Distribution Unit

CoolIT Systems (CoolIT), the world leader in liquid cooling systems for AI and high-performance computing, introduces the CHx1000, the world's highest-density liquid-to-liquid coolant distribution unit (CDU). Designed for mission-critical applications, the CHx1000 is purpose-built to cool the NVIDIA Blackwell platform and other demanding AI workloads where liquid cooling is now necessary.

"CoolIT created the CHx1000 to provide the high capacity and pressure delivery required to direct liquid cool NVIDIA Blackwell and future generations of high-performance AI accelerators," said Patrick McGinn, CoolIT's COO. "Besides exceptional performance, serviceability and reliability are central to the CHx1000's design. The single rack-sized unit is fully front and back serviceable with hot-swappable critical components. Precision coolant controls and multiple levels of redundancy provide for steady, uninterrupted operation."

NVIDIA "Blackwell" NVL72 Servers Reportedly Require Redesign Amid Overheating Problems

According to The Information, NVIDIA's latest "Blackwell" processors are reportedly encountering significant thermal management issues in high-density server configurations, potentially affecting deployment timelines for major tech companies. The challenges emerge specifically in NVL72 GB200 racks housing 72 GB200 processors, which can consume up to 120 kilowatts of power per rack, weighting a "mere" 3,000 pounds (or about 1.5 tons). These thermal concerns have prompted NVIDIA to revisit and modify its server rack designs multiple times to prevent performance degradation and potential hardware damage. Hyperscalers like Google, Meta, and Microsoft, who rely heavily on NVIDIA GPUs for training their advanced language models, have allegedly expressed concerns about possible delays in their data center deployment schedules.

The thermal management issues follow earlier setbacks related to a design flaw in the Blackwell production process. The problem stemmed from the complex CoWoS-L packaging technology, which connects dual chiplets using RDL interposer and LSI bridges. Thermal expansion mismatches between various components led to warping issues, requiring modifications to the GPU's metal layers and bump structures. A company spokesperson characterized these modifications as part of the standard development process, noting that a new photomask resolved this issue. The Information states that mass production of the revised Blackwell GPUs began in late October, with shipments expected to commence in late January. However, these timelines are unconfirmed by NVIDIA, and some server makers like Dell confirmed that these GB200 NVL72 liquid-cooled systems are shipping now, not in January, with CoreWave GPU cloud provider as a customer. The original report could be using older information, as Dell is one of NVIDIA's most significant partners and among the first in the supply chain to gain access to new GPU batches.

NVIDIA B200 "Blackwell" Records 2.2x Performance Improvement Over its "Hopper" Predecessor

We know that NVIDIA's latest "Blackwell" GPUs are fast, but how much faster are they over the previous generation "Hopper"? Thanks to the latest MLPerf Training v4.1 results, NVIDIA's HGX B200 Blackwell platform has demonstrated massive performance gains, measuring up to 2.2x improvement per GPU compared to its HGX H200 Hopper. The latest results, verified by MLCommons, reveal impressive achievements in large language model (LLM) training. The Blackwell architecture, featuring HBM3e high-bandwidth memory and fifth-generation NVLink interconnect technology, achieved double the performance per GPU for GPT-3 pre-training and a 2.2x boost for Llama 2 70B fine-tuning compared to the previous Hopper generation. Each benchmark system incorporated eight Blackwell GPUs operating at a 1,000 W TDP, connected via NVLink Switch for scale-up.

The network infrastructure utilized NVIDIA ConnectX-7 SuperNICs and Quantum-2 InfiniBand switches, enabling high-speed node-to-node communication for distributed training workloads. While previous Hopper-based systems required 256 GPUs to optimize performance for the GPT-3 175B benchmark, Blackwell accomplished the same task with just 64 GPUs, leveraging its larger HBM3e memory capacity and bandwidth. One thing to look out for is the upcoming GB200 NVL72 system, which promises even more significant gains past the 2.2x. It features expanded NVLink domains, higher memory bandwidth, and tight integration with NVIDIA Grace CPUs, complemented by ConnectX-8 SuperNIC and Quantum-X800 switch technologies. With faster switching and better data movement with Grace-Blackwell integration, we could see even more software optimization from NVIDIA to push the performance envelope.

NVIDIA Ships Over One Billion RISC-V Cores This Year Inside Its Accelerators, Up to 40 Cores Per Chip

During the 2024 RISC-V Summit in Santa Clara, California, NVIDIA was one of the presenting members. RISC-V, being a free and open-source instruction set architecture, is an interesting choice for many companies looking to develop custom solutions. NVIDIA designs accelerators for AI and graphics processing, all of which are equipped with up to tens of thousands of cores. To manage these cores, NVIDIA has developed a custom RISC-V processor called "NV-RISCV," which is a replacement for its predecessor "Falcon." Unlike Falcon, NV-RISCV is based on an open-source ISA and is customized much more deeply, with features like more customized caches and special instructions. Initially, the company reported better performance over its Falcon GPU System Processor (GSP), and NV-RISCV is now running in millions of NVIDIA chips.

Thanks to a post on X by Nick Brown, we learn that NVIDIA is shipping roughly one billion RISC-V cores in the year 2024. Each NVIDIA chip includes between 10 and 40 RISC-V cores, depending on the chip size and complexity. Some more complex designs, like GB200, require massive data coordination, meaning that more cores are needed to handle these requests and distribute them. This includes chip-to-chip interfaces, context switching, memory controller, camera handling, video codecs, display output, resource management, power management, and more. NVIDIA has developed a total of over 20 custom extensions for RISC-V cores, which all serve their specific use cases.
Return to Keyword Browsing
Mar 25th, 2025 10:52 EDT change timezone

New Forum Posts

Popular Reviews

Controversial News Posts