
NVIDIA "Blackwell" GB200 Server Dedicates Two-Thirds of Space to Cooling at Microsoft Azure

AleksandarK

News Editor
Staff member
Joined
Aug 19, 2017
Messages
2,467 (0.95/day)
Late Tuesday, Microsoft Azure shared an interesting picture on the social media platform X, showcasing the pinnacle of GPU-accelerated servers: NVIDIA "Blackwell" GB200-powered AI systems. Microsoft is one of NVIDIA's largest customers, and the company often receives products first to integrate into its cloud and company infrastructure. NVIDIA even takes feedback from companies like Microsoft when designing future products, especially ones like the now-canceled NVL36x2 system. The picture below shows a massive cluster in which compute takes up roughly one-third of the entire system, with a gigantic two-thirds dedicated to closed-loop liquid cooling.

The entire system is connected using InfiniBand networking, a standard for GPU-accelerated systems due to its lower latency in packet transfer. While details of the system are scarce, we can see that the integrated closed-loop liquid cooling allows the GPU racks to use a 1U form factor for increased density. Given that these systems will go into the wider Microsoft Azure data centers, they need to be easy to maintain and cool. Microsoft's data centers can handle only so much power and heat output, so these types of systems often have to fit within internal specifications that Microsoft designs. There are more compute-dense systems, of course, like NVIDIA's NVL72, but hyperscalers usually opt for custom solutions that fit their data center specifications. Finally, Microsoft noted that we can expect more details about its GB200-powered AI systems at the upcoming Microsoft Ignite conference in November.



 

StimpsonJCat

New Member
Joined
Sep 30, 2024
Messages
10 (1.00/day)
This is what happens when you take the easy option and do not make architectural changes and smarter designs, and just overclock and overvolt for a "free upgrade". NV haven't made any major architectural updates to their GPU for many years now - they just bolt on more of the same, max it up to the reticle limit, then OC it to meet the performance goal. Very cheap and fast to do, but we end up with this monstrosity.

NV will need to actually come up with a new architecture to move the needle on the next chip, as TSMC is at their limits now, and nothing new that can manufacture a GPU at this size for NV is close for at least another 2 years.

NV really need to separate their AI and GPU business and make optimized versions of each.
 
Joined
Oct 28, 2023
Messages
92 (0.27/day)
Processor 7600x -- 8600k
Motherboard MSI B650 -- ASRock z370
Cooling TR PA120 -- CM212
Memory 2x16GB -- 2x8GB
Video Card(s) PNY 4080 -- Zotac 2080ti
Storage 4TB SN850x / 4TB 870evo / 2TB SN770 -- 512GB 970 / 2TB WDBlue / some HDDs
Display(s) LG C3 (42) + Acer XB271HU -- Sony X950H (49)
Case Torrent Compact Steel Panel -- Define R2
Power Supply BeQuiet Darkpower13 750 -- Seasonic X750
Keyboard Epomaker TH80SE/EK21 -- Logi G710+
Software Win10 (both)
So how long before the cooling needs of our AI datacenters can provide steam turbine power for our industry needs to provide more AI power to power our AI overlords?
 
Joined
Jul 24, 2024
Messages
118 (1.51/day)
Excuse me, what other purpose do these chips serve except for generating heat? Well, if they power Microsoft's Copilot-like stuff, LLMs and generative AI, then the heat is the better purpose. As they say in GoT: "Winter is coming".
 
Joined
Nov 15, 2005
Messages
1,011 (0.15/day)
Processor 2500K @ 4.5GHz 1.28V
Motherboard ASUS P8P67 Deluxe
Cooling Corsair A70
Memory 8GB (2x4GB) Corsair Vengeance 1600 9-9-9-24 1T
Video Card(s) eVGA GTX 470
Storage Crucial m4 128GB + Seagate RAID 1 (1TB x 2)
Display(s) Dell 22" 1680x1050 nothing special
Case Antec 300
Audio Device(s) Onboard
Power Supply PC Power & Cooling 750W
Software Windows 7 64bit Pro
Anyone else notice the towel at the bottom of the radiator?
 
Joined
Sep 29, 2020
Messages
92 (0.06/day)
This is what happens when you take the easy option and do not make architectural changes and smarter designs, and just overclock and overvolt for a "free upgrade".
What smart "architectural changes" would you make? Be specific, with calculated details on their effects on manufacturing costs, yield rates, and power:performance ratios.
 
Joined
Dec 28, 2012
Messages
3,770 (0.88/day)
System Name Skunkworks
Processor 5800x3d
Motherboard x570 unify
Cooling Noctua NH-U12A
Memory 32GB 3600 mhz
Video Card(s) asrock 6800xt challenger D
Storage Sabarent rocket 4.0 2TB, MX 500 2TB
Display(s) Asus 1440p144 27"
Case Old arse cooler master 932
Power Supply Corsair 1200w platinum
Mouse *squeak*
Keyboard Some old office thing
Software openSUSE tumbleweed/Mint 21.2
This is what happens when you take the easy option and do not make architectural changes and smarter designs, and just overclock and overvolt for a "free upgrade". NV haven't made any major architectural updates to their GPU for many years now - they just bolt on more of the same, max it up to the reticle limit, then OC it to meet the performance goal. Very cheap and fast to do, but we end up with this monstrosity.

NV will need to actually come up with a new architecture to move the needle on the next chip, as TSMC is at their limits now, and nothing new that can manufacture a GPU at this size for NV is close for at least another 2 years.

NV really need to separate their AI and GPU business and make optimized versions of each.
No arch changes? Really? Are you saying that Ampere, Ada, and Pascal are the same now?

:laugh::roll::laugh::banghead::laugh::roll::laugh:

So how long before the cooling needs of our AI datacenters can provide steam turbine power for our industry needs to provide more AI power to power our AI overlords?
Sadly never, because these chips don't have anywhere near the thermal output or max temperature needed to make high-pressure steam.
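To put rough numbers on the temperature point, here is a minimal sketch of the Carnot limit, the theoretical ceiling on how much heat any engine can convert to work between a hot and a cold temperature. The 60 °C coolant return, 540 °C turbine steam, and 25 °C ambient figures are illustrative assumptions, not values from the article or from Microsoft.

```python
def carnot_efficiency(t_hot_c: float, t_cold_c: float) -> float:
    """Theoretical upper bound on heat-engine efficiency between two
    temperatures given in degrees Celsius."""
    t_hot_k = t_hot_c + 273.15   # convert to kelvin
    t_cold_k = t_cold_c + 273.15
    if t_hot_k <= t_cold_k:
        raise ValueError("hot side must be warmer than cold side")
    return 1.0 - t_cold_k / t_hot_k


# Assumed temperatures, for illustration only.
AMBIENT_C = 25.0
COOLANT_RETURN_C = 60.0   # hypothetical warm-water loop temperature
TURBINE_STEAM_C = 540.0   # hypothetical power-plant steam temperature

coolant_limit = carnot_efficiency(COOLANT_RETURN_C, AMBIENT_C)
steam_limit = carnot_efficiency(TURBINE_STEAM_C, AMBIENT_C)

print(f"Coolant loop Carnot limit: {coolant_limit:.1%}")
print(f"Turbine steam Carnot limit: {steam_limit:.1%}")
```

Under these assumptions the warm coolant loop tops out at roughly a tenth of the heat being recoverable as work even in the ideal case, while turbine-grade steam allows well over half. Real machines fall short of both ceilings.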
 
Joined
Jan 3, 2021
Messages
3,334 (2.42/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
Anyone else notice the towel at the bottom of the radiator?
I'm afraid this is not even a radiator, just a water-water heat exchanger. The thick pipes at the top connect to the really big radiator outside the building.
 