Tuesday, July 30th 2024
NVIDIA Blackwell's High Power Consumption Drives Cooling Demands; Liquid Cooling Penetration Expected to Reach 10% by Late 2024
With the growing demand for high-speed computing, more effective cooling solutions for AI servers are gaining significant attention. TrendForce's latest report on AI servers reveals that NVIDIA is set to launch its next-generation Blackwell platform by the end of 2024. Major CSPs are expected to start building AI server data centers based on this new platform, potentially driving the penetration rate of liquid cooling solutions to 10%.
Air and liquid cooling systems to meet higher cooling demands
TrendForce reports that the NVIDIA Blackwell platform will officially launch in 2025, replacing the current Hopper platform and becoming the dominant solution for NVIDIA's high-end GPUs, accounting for nearly 83% of all high-end products. High-performance AI server models like the B200 and GB200 are designed for maximum efficiency, with individual GPUs consuming over 1,000 W. HGX models will house 8 GPUs each, while NVL models will support 36 or 72 GPUs per rack, significantly boosting the growth of the liquid cooling supply chain for AI servers.
TrendForce highlights the increasing TDP of server chips, with the B200 chip's TDP reaching 1,000 W, making traditional air cooling solutions inadequate. The TDP of the GB200 NVL36 and NVL72 complete rack systems is projected to reach 70 kW and nearly 140 kW, respectively, necessitating advanced liquid cooling solutions for effective heat management.
TrendForce observes that the GB200 NVL36 architecture will initially utilize a combination of air and liquid cooling solutions, while the NVL72, due to higher cooling demands, will primarily employ liquid cooling.
TrendForce identifies five major components in the current liquid cooling supply chain for GB200 rack systems: cold plates, coolant distribution units (CDUs), manifolds, quick disconnects (QDs), and rear door heat exchangers (RDHx).
The CDU is the critical system responsible for regulating coolant flow to maintain rack temperatures within the designated TDP range, preventing component damage. Vertiv is currently the main CDU supplier for NVIDIA AI solutions, with Chicony, Auras, Delta, and CoolIT undergoing continuous testing.
GB200 shipments expected to reach 60,000 units in 2025, making Blackwell the mainstream platform and accounting for over 80% of NVIDIA's high-end GPUs
In 2025, NVIDIA will target CSPs and enterprise customers with diverse AI server configurations, including the HGX, GB200 Rack, and MGX, with expected shipment ratios of 5:4:1. The HGX platform will seamlessly transition from the existing Hopper platform, enabling CSPs and large enterprise customers to adopt it quickly. The GB200 rack AI server solution will primarily target the hyperscale CSP market. TrendForce predicts NVIDIA will introduce the NVL36 configuration at the end of 2024 to quickly enter the market, with the more complex NVL72 expected to launch in 2025.
TrendForce forecasts that in 2025, GB200 NVL36 shipments will reach 60,000 racks, with Blackwell GPU usage between 2.1 and 2.2 million units.
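The two figures in this forecast can be cross-checked against each other. The following is a minimal sketch, assuming every one of the 60,000 racks is an NVL36 with exactly 36 GPUs; the function and variable names are illustrative and do not come from TrendForce.

```python
# Cross-check: do 60,000 NVL36 racks line up with 2.1-2.2 million GPUs?
# Assumption: every rack is an NVL36 carrying exactly 36 Blackwell GPUs.
RACKS_2025 = 60_000          # forecast GB200 NVL36 rack shipments
GPUS_PER_NVL36_RACK = 36     # GPUs per NVL36 rack, as stated in the article


def gpus_for_racks(racks: int, gpus_per_rack: int) -> int:
    """Total GPU count for a given number of identical racks."""
    return racks * gpus_per_rack


total_gpus = gpus_for_racks(RACKS_2025, GPUS_PER_NVL36_RACK)
print(f"{total_gpus:,} GPUs")  # 2,160,000 GPUs
```

Under that assumption the result falls inside the 2.1 to 2.2 million range quoted above.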
However, there are several variables in the adoption of the GB200 Rack by end customers. TrendForce points out that the NVL72's power consumption of around 140 kW per rack requires sophisticated liquid cooling solutions, which makes deployment challenging. Additionally, liquid-cooled rack designs are more suitable for new CSP data centers but involve complex planning processes. CSPs might also avoid being tied to a single supplier's specifications and opt for HGX or MGX models with x86 CPU architectures, or expand their self-developed ASIC AI server infrastructure for lower costs or specific AI applications.
Source:
TrendForce
41 Comments on NVIDIA Blackwell's High Power Consumption Drives Cooling Demands; Liquid Cooling Penetration Expected to Reach 10% by Late 2024
It's time for PCs to also participate in this phenomenon!
Our world will literally die the moment humans turn on millions of 1000W GPUs in order just to see better virtual water and light reflections in games and CGI films.
Of course people will buy 5090s like crazy. Why wouldn't they? As I've said, even people that care about efficiency and power draw will buy 5090s because they are the fastest at any power level.
A 1000w GPU is going to use 2 ATX 3.0 connectors. Split that power among the 12 12v lines, and that's just 6.94 amps per line. That's right around the power level of the old 8 pin connectors, per line.
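The per-line figure in the comment above can be reproduced with a few lines of arithmetic. This is a rough sketch, assuming the full 1000 W is drawn through the two connectors (nothing from the slot), that each connector has six 12 V lines, and that current is shared evenly across them; the function name is illustrative.

```python
def amps_per_line(total_watts: float, volts: float, lines: int) -> float:
    """Current per line when the load is split evenly across all lines."""
    total_amps = total_watts / volts   # I = P / V
    return total_amps / lines


# Assumption: 2 connectors x 6 lines each = 12 lines at 12 V.
per_line = amps_per_line(total_watts=1000, volts=12.0, lines=12)
print(f"{per_line:.2f} A per line")  # 6.94 A per line
```

Real cards do not always balance current perfectly across pins, so this is a best-case average rather than a guaranteed per-pin value.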
400 amps at 1.4v is only 560 watts. We've had GPUs push more than that already. That 2700w card is going to be server only, likely using non-standard connectors to push that kind of power. It's not unusual either, high power cards have existed for a LONG time. Wouldn't surprise me if they went with 24v for those.
Also, my late uncle was the main builder of the ecologist political movement in France.
fr.wikipedia.org/wiki/Bernard_Charbonneau
I also swapped the 3080 I had for a 4070 Super just for desktop/gaming power reduction.
My carbon footprint is extremely small for an American and I sleep fine at night. So I can easily say we need to reject such high power computer parts as much as we can. Oh and I was a political activist in my youth fighting for the environment.
Your believers still want to buy even if your products draw 10 kilowatts.
I only know AlwaysMuchDaft.
There's no purity requirements to lodge a pro-environment argument or any argument for that matter. This is just a typical logical fallacy employed to discount a person's opinion without actually providing any substance against their argument.