Tuesday, July 30th 2024

NVIDIA Blackwell's High Power Consumption Drives Cooling Demands; Liquid Cooling Penetration Expected to Reach 10% by Late 2024

With the growing demand for high-speed computing, more effective cooling solutions for AI servers are gaining significant attention. TrendForce's latest report on AI servers reveals that NVIDIA is set to launch its next-generation Blackwell platform by the end of 2024. Major CSPs are expected to start building AI server data centers based on this new platform, potentially driving the penetration rate of liquid cooling solutions to 10%.

Air and liquid cooling systems to meet higher cooling demands
TrendForce reports that the NVIDIA Blackwell platform will officially launch in 2025, replacing the current Hopper platform and becoming the dominant solution for NVIDIA's high-end GPUs, accounting for nearly 83% of all high-end products. High-performance AI server models like the B200 and GB200 are designed for maximum efficiency, with individual GPUs consuming over 1,000 W. HGX models will house 8 GPUs each, while NVL models will support 36 or 72 GPUs per rack, significantly boosting the growth of the liquid cooling supply chain for AI servers.
TrendForce highlights the increasing TDP of server chips, with the B200 chip's TDP reaching 1,000 W, making traditional air cooling solutions inadequate. The TDP of the GB200 NVL36 and NVL72 complete rack systems is projected to reach 70 kW and nearly 140 kW, respectively, necessitating advanced liquid cooling solutions for effective heat management.
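For a rough sanity check on those rack-level figures, consider the arithmetic below. The ~2,700 W per GB200 superchip (one Grace CPU plus two Blackwell GPUs) is a widely reported figure treated here as an assumption, and the overhead factor covering switch trays, fans, and power conversion is purely illustrative, not TrendForce's model.

```python
# Back-of-the-envelope rack TDP, assuming ~2,700 W per GB200 superchip
# (1 Grace CPU + 2 Blackwell GPUs) and a rough overhead factor for
# switch trays, fans, and power conversion. Illustrative only.
SUPERCHIP_W = 2_700

def rack_tdp_kw(superchips: int, overhead_frac: float = 0.45) -> float:
    """Total rack TDP in kW: superchip power plus assumed overhead."""
    return superchips * SUPERCHIP_W * (1 + overhead_frac) / 1_000

print(f"NVL36: ~{rack_tdp_kw(18):.0f} kW")  # ~70 kW, matching the projection
print(f"NVL72: ~{rack_tdp_kw(36):.0f} kW")  # ~141 kW, i.e. 'nearly 140 kW'
```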

TrendForce observes that the GB200 NVL36 architecture will initially utilize a combination of air and liquid cooling solutions, while the NVL72, due to higher cooling demands, will primarily employ liquid cooling.

TrendForce identifies five major components in the current liquid cooling supply chain for GB200 rack systems: cold plates, coolant distribution units (CDUs), manifolds, quick disconnects (QDs), and rear door heat exchangers (RDHx).

The CDU is the critical system responsible for regulating coolant flow to maintain rack temperatures within the designated TDP range, preventing component damage. Vertiv is currently the main CDU supplier for NVIDIA AI solutions, with Chicony, Auras, Delta, and CoolIT undergoing continuous testing.
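To see why CDU sizing matters, here is a minimal sketch of the steady-state heat-balance arithmetic, assuming water as the coolant and an illustrative 10 K temperature rise across the rack; neither figure comes from the report.

```python
# Steady-state heat removal: Q = m_dot * c_p * dT. Assumes water coolant
# and a 10 K temperature rise across the rack; both are assumptions.
C_P_WATER = 4186.0  # J/(kg*K), specific heat of water

def coolant_flow_kg_per_s(heat_w: float, delta_t_k: float) -> float:
    """Mass flow rate needed to carry away heat_w watts at a delta_t_k rise."""
    return heat_w / (C_P_WATER * delta_t_k)

# NVL72-class rack at ~140 kW
flow = coolant_flow_kg_per_s(140_000, 10.0)
print(f"~{flow:.1f} kg/s, roughly {flow * 60:.0f} L/min of water")  # ~3.3 kg/s, ~200 L/min
```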

GB200 shipments expected to reach 60,000 units in 2025, making Blackwell the mainstream platform and accounting for over 80% of NVIDIA's high-end GPUs
In 2025, NVIDIA will target CSPs and enterprise customers with diverse AI server configurations, including the HGX, GB200 Rack, and MGX, with expected shipment ratios of 5:4:1. The HGX platform will seamlessly transition from the existing Hopper platform, enabling CSPs and large enterprise customers to adopt it quickly. The GB200 rack AI server solution will primarily target the hyperscale CSP market. TrendForce predicts NVIDIA will introduce the NVL36 configuration at the end of 2024 to quickly enter the market, with the more complex NVL72 expected to launch in 2025.

TrendForce forecasts that in 2025, GB200 NVL36 shipments will reach 60,000 racks, with Blackwell GPU usage between 2.1 and 2.2 million units.
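That GPU figure is consistent with simple multiplication of the rack forecast:

```python
# 60,000 NVL36 racks x 36 Blackwell GPUs per rack (both figures from the
# forecast above) lands squarely in the quoted 2.1-2.2 million range.
racks = 60_000
gpus_per_rack = 36
print(f"{racks * gpus_per_rack:,} GPUs")  # 2,160,000 GPUs
```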

However, there are several variables in end customers' adoption of the GB200 rack. TrendForce points out that the NVL72's power consumption of around 140 kW per rack requires sophisticated liquid cooling solutions, making deployment challenging. Additionally, liquid-cooled rack designs are better suited to new CSP data centers but involve complex planning processes. CSPs might also avoid being tied to a single supplier's specifications and opt for HGX or MGX models with x86 CPU architectures, or expand their self-developed ASIC AI server infrastructure for lower costs or specific AI applications.
Source: TrendForce

41 Comments on NVIDIA Blackwell's High Power Consumption Drives Cooling Demands; Liquid Cooling Penetration Expected to Reach 10% by Late 2024

#27
[crs]
I really would love the card manufacturers to release a card with no heatsink, lower the price accordingly, and I will buy my own water block.
#28
stimpy88
Maybe it's time nGreedia focussed on the architecture IPC instead of just adding more of the same building blocks to a die. Their greed has made them lazy.
#29
Jism
Klemc: The global climate is getting hotter and hotter...

It's time for PCs to also participate in this phenomenon!
2,700 W for just the compute units looks staggering, but if you have a target or a demand for X compute performance, a single unit will be more efficient than using 10+ units to achieve the same.

20 years ago, if you bought servers to host multiple applications, you likely needed 10 of them, with 10x the power usage. Today you can get away with just one, since the tech has become so efficient.

So 2,700 W looks like a lot - it's likely more efficient than hosting a machine with 4 separate GPUs.
#30
wolf
Better Than Native
fevgatos: They see nvidia in the title and go berserk.
/thread. As reliable as the sun rising. I'm just surprised nobody has written "the more you buy, the more you save," or anything about a leather jacket... yet.

I live a very energy- and waste-conscious life overall; it runs through the core of how I operate. Yet power draw ≠ efficiency or performance per watt; it's all relative. I do enjoy my efficient parts, and I undervolt/overclock everything I own to get the most out of it per watt.
#31
Minus Infinity
fevgatos: Just limit it? At some point we need to realize that the 5090, just like the 4090, is the most efficient chip, thus the best at "protecting the environment".

Of course people will buy 5090s like crazy. Why wouldn't they? As I've said, even people who care about efficiency and power draw will buy 5090s because they are the fastest at any power level.
LOL, the 4080 Super is the most efficient, so even smarter people worried about efficiency will save a pretty penny. It consumes way less power and gets more fps/Watt.
#32
metalslaw
R0H1T: We should probably start growing more brains, as they say win-win :laugh:
Probably cheaper to just put people in cryo pods, and hook their brains together in some kind of matrix..... :D
#33
JustBenching
Minus Infinity: LOL, the 4080 Super is the most efficient, so even smarter people worried about efficiency will save a pretty penny. It consumes way less power and gets more fps/Watt.
No it doesn't. You do realize you can limit the power on all of these cards, right?
#34
yfn_ratchet
Minus Infinity: LOL, the 4080 Super is the most efficient, so even smarter people worried about efficiency will save a pretty penny. It consumes way less power and gets more fps/Watt.
fevgatos: No it doesn't. You do realize you can limit the power on all of these cards, right?
I'll do you both one better and grab some numbers from a prior report: www.techpowerup.com/301649/nvidia-geforce-rtx-4090-with-nearly-half-its-power-limit-and-undervolting-loses-just-8-performance

As it is, higher bins and higher-tier chips will reliably perform more efficiently and carry a higher performance ceiling; the 4090 isn't even the cream of the crop. Try the Quadro RTX A6000 Ada. Those are some top-shelf chips, and they come at a PREMIUM.

Generally though, the point is that in the consumer market, all players will push their power draw far beyond the zenith of the perf/watt curve to get a few extra (very marketable) frames. Undervolting is the one truth, fite me.
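For what it's worth, the headline numbers in that linked report already quantify the trade-off; a tiny sketch using only those two figures, with stock performance and power normalized to 1.0:

```python
# Per the linked report's headline: a 4090 at roughly half its power
# limit loses only ~8% performance. Stock perf and power = 1.0 each.
limited_perf, limited_power = 0.92, 0.50

print(f"~{limited_perf / limited_power:.2f}x stock perf/W")  # ~1.84x
```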
#35
starfals
TheinsanegamerN: If you don't want it, don't buy it? There's clearly plenty of us willing to pay for powerful GPUs. That's ALWAYS been the case. And people have whinged and moaned about high-power, expensive GPUs since the dawn of PCIe.

It's time you DID something about it! Have you abandoned your PCs and returned to an agrarian society? Oh, no, you have not. Better get on it, you've got a planet to save!
Indeed, fk the planet. We are alive today and it's all fine today. A few hot days are no problem for any cheap AC. I can't wait for 2,000-3,000 W GPUs, honestly. I want the 240 fps in RT. By the time the planet is dead, I'll be long gone too. At least I'll have died with god rays and reflections all over my face while I was alive!

P.S. The 3,000 W GPU might be good for the winter too; think about it. It will warm your room. Winters here are freezing!
#36
Dr. Dro
Daven: It's really going to be up to us as consumers to reject these high-power, expensive, obscenely high-margin parts.
Consumer GPUs won't reach full-rack, platform-level power consumption figures; in other words, the 5090 should still be 450 W or less.
Klemc: Why the name Blackwell, any source?
All NVIDIA GPUs are named after important theoretical physicists, inventors, or mathematicians who made groundbreaking or landmark contributions to science. This one is named after David Blackwell, one of the most prominent mathematicians of his time and a recipient of countless honors in his field.

en.wikipedia.org/wiki/David_Blackwell

Ada is named after Ada, Countess of Lovelace, the mathematician credited with creating the first computer program. Ampere is named after André-Marie Ampère, the physicist credited with the discovery of electrodynamics and the creation of an electrical telegraph. Turing is named after Alan Turing, the computer scientist famous for breaking the Enigma code and a father of modern computing. Volta is named after the inventor Alessandro Volta, creator of the battery. Pascal is named after Blaise Pascal, famous for his work in probability theory and many other areas. Maxwell is named after James Clerk Maxwell (theory of electromagnetic radiation). Fermi is named after Enrico Fermi (creator of the first nuclear reactor and a member of the Manhattan Project). Tesla is named after Nikola Tesla, creator of the AC power system. And so on.
#37
JustBenching
yfn_ratchet: I'll do you both one better and grab some numbers from a prior report: www.techpowerup.com/301649/nvidia-geforce-rtx-4090-with-nearly-half-its-power-limit-and-undervolting-loses-just-8-performance

As it is, higher bins and higher-tier chips will reliably perform more efficiently and carry a higher performance ceiling; the 4090 isn't even the cream of the crop. Try the Quadro RTX A6000 Ada. Those are some top-shelf chips, and they come at a PREMIUM.

Generally though, the point is that in the consumer market, all players will push their power draw far beyond the zenith of the perf/watt curve to get a few extra (very marketable) frames. Undervolting is the one truth, fite me.
Isn't that kinda what I said? I mean, let's say nvidia releases the 5090 at 1,000 watts. Why would I or anyone else actually care? I'd still limit it to 320 watts like I've done with my 4090. I don't see the problem.
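For anyone who'd rather script that cap than set it in Afterburner, a minimal sketch using NVIDIA's NVML Python bindings (the nvidia-ml-py package; nvidia-smi -pl 320 does the same thing from the shell, and both require admin rights):

```python
# Minimal sketch: cap GPU 0 at 320 W via NVML. Requires the nvidia-ml-py
# package and admin privileges; NVML power values are in milliwatts.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

# Check the card's allowed range before applying the new limit
min_mw, max_mw = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)
target_mw = 320_000  # 320 W, as in the comment above
if min_mw <= target_mw <= max_mw:
    pynvml.nvmlDeviceSetPowerManagementLimit(handle, target_mw)

pynvml.nvmlShutdown()
```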
#38
yfn_ratchet
fevgatos: Isn't that kinda what I said? I mean, let's say nvidia releases the 5090 at 1,000 watts. Why would I or anyone else actually care? I'd still limit it to 320 watts like I've done with my 4090. I don't see the problem.
I was mostly just nabbing some graphs that demonstrate that driving down power draw on a 4090/AD102, via either a simple PL adjustment or a half-decent undervolt, is very much capable of outperforming a 4080S/AD103 at the same power. It backs up your point; I suppose I wasn't all that clear on that front.
#39
londiste
Consumers buy perf/$ and get products geared towards that - maximum performance that can be wrung out of a chip.
Datacenters buy perf/W and get products geared towards that - maximum efficiency point that can be found.

The development goals ebb and flow here, but usually it's a compromise between maximum performance and efficiency anyway.
#40
JustBenching
yfn_ratchet: I was mostly just nabbing some graphs that demonstrate that driving down power draw on a 4090/AD102, via either a simple PL adjustment or a half-decent undervolt, is very much capable of outperforming a 4080S/AD103 at the same power. It backs up your point; I suppose I wasn't all that clear on that front.
Which basically means the 4090 is indeed the most efficient desktop GPU currently in existence.

In other words, to the people who complain about high-end GPUs ruining the planet: if you don't have a 4090, you are the ones ruining the planet.
#41
Noyand
Daven: It's really going to be up to us as consumers to reject these high-power, expensive, obscenely high-margin parts.

Our world will literally die the moment humans turn on millions of 1,000 W GPUs just to see better virtual water and light reflections in games and CGI films.
I don't think the CPU-based render farms still used to render movies today are better in that regard... A lot of us here are judging those figures from the POV of a consumer running a single GPU, but the market they are talking about has always dealt with power figures that a house wouldn't be able to handle. GPU rendering is actually much faster per watt, but the market leaders for heavy-duty rendering (RenderMan and Arnold) still don't have feature parity between their CPU and GPU renderers. Last I heard, CPU rendering is more robust even if it's drastically slower per watt.
What We Can Learn from Pixar's 24,000-Core Supercomputer | No Film School
Wētā 'outgrew the power grid' during rendering of Avatar 2, producer says | Stuff
(Avatar 2 used RenderMan, which is still CPU-based for production output)