
NVIDIA "Blackwell" GB200 Server Dedicates Two-Thirds of Space to Cooling at Microsoft Azure

AleksandarK

News Editor
Staff member
Joined
Aug 19, 2017
Messages
2,582 (0.97/day)
Late Tuesday, Microsoft Azure shared an interesting picture on the social media platform X, showcasing the pinnacle of GPU-accelerated servers: NVIDIA "Blackwell" GB200-powered AI systems. Microsoft is one of NVIDIA's largest customers, and the company often receives products first to integrate into its cloud and corporate infrastructure. NVIDIA even takes feedback from companies like Microsoft into account when designing future products, including the now-canceled NVL36x2 system. The picture shows a massive cluster in which compute occupies roughly one-third of the system, while a gigantic two-thirds is dedicated to closed-loop liquid cooling.

The entire system is connected using InfiniBand networking, a standard for GPU-accelerated systems due to its lower packet-transfer latency. While details of the system are scarce, we can see that the integrated closed-loop liquid cooling allows the GPU servers to fit in a 1U form factor for increased density. Since these systems will be deployed across Microsoft Azure's wider data center fleet, they need to be easy to maintain and cool. Microsoft's data centers have limits on the power and heat output they can handle, so these systems are typically built to internal specifications that Microsoft designs. There are more compute-dense systems, of course, like NVIDIA's NVL72, but hyperscalers usually opt for custom solutions that fit their data center specifications. Finally, Microsoft noted that more details about its GB200-powered AI systems will be shared at the upcoming Microsoft Ignite conference in November.



 
Joined
Sep 30, 2024
Messages
87 (1.58/day)
This is what happens when you take the easy option and do not make architectural changes and smarter designs, and just overclock and over volt for a "free upgrade". NV haven't made any major architectural updates to their GPU for many years now - they just bolt on more of the same, max it up to the reticle limit, then OC it to meet the performance goal. Very cheap and fast to do, but we end up with this monstrosity.

NV will need to actually come up with a new architecture to move the needle on the next chip, as TSMC is at their limits now, and nothing new that can manufacture a GPU this size for NV is coming for at least another 2 years.

NV really need to separate their AI and GPU business and make optimized versions of each.
 
Joined
Oct 28, 2023
Messages
102 (0.26/day)
Processor 7600x -- 8600k
Motherboard MSI B650 -- ASRock z370
Cooling TR PA120 -- CM212
Memory 2x16GB -- 2x8GB
Video Card(s) PNY 4080 -- Zotac 2080ti
Storage 4TB SN850x / 4TB 870evo / 2TB SN770 -- 512GB 970 / 2TB WDBlue / some HDDs
Display(s) LG C3 (42) + Acer XB271HU -- Sony X950H (49)
Case Torrent Compact Steel Panel -- Define R2
Power Supply BeQuiet Darkpower13 750 -- Seasonic X750
Keyboard Epomaker TH80SE/EK21 -- Logi G710+
Software Win10 (both)
So how long before the cooling needs of our AI datacenters can provide steam turbine power for our industry needs to provide more AI power to power our AI overlords?
 
Joined
Jul 24, 2024
Messages
220 (1.79/day)
System Name AM4_TimeKiller
Processor AMD Ryzen 5 5600X @ all-core 4.7 GHz
Motherboard ASUS ROG Strix B550-E Gaming
Cooling Arctic Freezer II 420 rev.7 (push-pull)
Memory G.Skill TridentZ RGB, 2x16 GB DDR4, B-Die, 3800 MHz @ CL14-15-14-29-43 1T, 53.2 ns
Video Card(s) ASRock Radeon RX 7800 XT Phantom Gaming
Storage Samsung 990 PRO 1 TB, Kingston KC3000 1 TB, Kingston KC3000 2 TB
Case Corsair 7000D Airflow
Audio Device(s) Creative Sound Blaster X-Fi Titanium
Power Supply Seasonic Prime TX-850
Mouse Logitech wireless mouse
Keyboard Logitech wireless keyboard
Excuse me, what other purpose do these chips serve except generating heat? Well, if they power Microsoft's Copilot-like stuff, LLMs and generative AI, then heat is the better purpose of the two. As they say in GoT: "Winter is coming".
 
Joined
Nov 15, 2005
Messages
1,011 (0.15/day)
Processor 2500K @ 4.5GHz 1.28V
Motherboard ASUS P8P67 Deluxe
Cooling Corsair A70
Memory 8GB (2x4GB) Corsair Vengeance 1600 9-9-9-24 1T
Video Card(s) eVGA GTX 470
Storage Crucial m4 128GB + Seagate RAID 1 (1TB x 2)
Display(s) Dell 22" 1680x1050 nothing special
Case Antec 300
Audio Device(s) Onboard
Power Supply PC Power & Cooling 750W
Software Windows 7 64bit Pro
Anyone else notice the towel at the bottom of the radiator?
 
Joined
Sep 29, 2020
Messages
144 (0.09/day)
This is what happens when you take the easy option and do not make architectural changes and smarter designs, and just overclock and over volt for a "free upgrade".
What smart "architectural changes" would you make? Be specific, with calculated details on their effects on manufacturing costs, yield rates, and power:performance ratios.
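As a rough illustration of why reticle-sized dies are expensive (the yield and cost angle raised above), here is a minimal sketch using the common gross dies-per-wafer approximation and a simple Poisson yield model, Y = exp(-D0 * A). The 0.1 defects/cm² density and the die areas are assumptions for illustration, not NVIDIA or TSMC figures.

```python
import math

# Hypothetical inputs for illustration only (not figures from the article or thread):
WAFER_DIAMETER_MM = 300.0      # standard 300 mm wafer
DEFECT_DENSITY_PER_CM2 = 0.1   # assumed defect density D0


def dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = WAFER_DIAMETER_MM) -> int:
    """Gross-die approximation: wafer area / die area, minus an edge-loss term."""
    radius = wafer_diameter_mm / 2
    gross = (math.pi * radius**2 / die_area_mm2
             - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))
    return max(0, math.floor(gross))


def poisson_yield(die_area_mm2: float, d0_per_cm2: float = DEFECT_DENSITY_PER_CM2) -> float:
    """Poisson yield model Y = exp(-D0 * A), with A converted from mm^2 to cm^2."""
    return math.exp(-d0_per_cm2 * die_area_mm2 / 100.0)


def good_dies_per_wafer(die_area_mm2: float) -> float:
    """Expected defect-free dies per wafer under the assumptions above."""
    return dies_per_wafer(die_area_mm2) * poisson_yield(die_area_mm2)


if __name__ == "__main__":
    for area in (200.0, 400.0, 800.0):
        print(f"{area:5.0f} mm^2: {dies_per_wafer(area):4d} gross, "
              f"yield {poisson_yield(area):.1%}, {good_dies_per_wafer(area):6.1f} good")
```

Under these assumptions, doubling the die area more than halves the good dies per wafer, because fewer dies fit and each one is more likely to catch a defect. Real foundry yield models and defect densities differ, so treat the numbers as directional only.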
 
Joined
Dec 28, 2012
Messages
3,877 (0.89/day)
System Name Skunkworks 3.0
Processor 5800x3d
Motherboard x570 unify
Cooling Noctua NH-U12A
Memory 32GB 3600 mhz
Video Card(s) asrock 6800xt challenger D
Storage Sabrent Rocket 4.0 2TB, MX 500 2TB
Display(s) Asus 1440p144 27"
Case Old arse cooler master 932
Power Supply Corsair 1200w platinum
Mouse *squeak*
Keyboard Some old office thing
Software Manjaro
This is what happens when you take the easy option and do not make architectural changes and smarter designs, and just overclock and over volt for a "free upgrade". NV haven't made any major architectural updates to their GPU for many years now - they just bolt on more of the same, max it up to the reticle limit, then OC it to meet the performance goal. Very cheap and fast to do, but we end up with this monstrosity.

NV will need to actually come up with a new architecture to move the needle on the next chip, as TSMC is at their limits now, and nothing new that can manufacture a GPU this size for NV is coming for at least another 2 years.

NV really need to separate their AI and GPU business and make optimized versions of each.
No arch changes? Really? Are you saying that Ampere, Ada, and Pascal are the same now?

:laugh::roll::laugh::banghead::laugh::roll::laugh:

So how long before the cooling needs of our AI datacenters can provide steam turbine power for our industry needs to provide more AI power to power our AI overlords?
Sadly never, because these chips don't have anywhere near the thermal output or max temperature needed to make high-pressure steam.
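The temperature point can be put in numbers with the Carnot limit, 1 - T_cold / T_hot (in kelvin), which bounds how much work any heat engine can extract. This is a minimal sketch; the 45 °C coolant return, 25 °C heat-rejection temperature, and 550 °C steam temperature are assumed round values, not measurements from any Azure or power-plant system.

```python
def carnot_efficiency(t_hot_c: float, t_cold_c: float) -> float:
    """Upper bound on heat-engine efficiency: 1 - T_cold / T_hot, using kelvin."""
    t_hot_k = t_hot_c + 273.15
    t_cold_k = t_cold_c + 273.15
    if t_hot_k <= t_cold_k:
        return 0.0  # no temperature difference, no work can be extracted
    return 1.0 - t_cold_k / t_hot_k


# Assumed temperatures for illustration only:
COOLANT_OUTLET_C = 45.0   # hypothetical warm-water return from a liquid-cooled rack
AMBIENT_C = 25.0          # hypothetical heat-rejection temperature
STEAM_C = 550.0           # rough live-steam temperature in a conventional plant

if __name__ == "__main__":
    print(f"Coolant loop bound: {carnot_efficiency(COOLANT_OUTLET_C, AMBIENT_C):.1%}")
    print(f"Steam plant bound:  {carnot_efficiency(STEAM_C, AMBIENT_C):.1%}")
```

With these assumptions the coolant loop is capped at roughly 6% even for an ideal engine, versus roughly 64% for the steam case; real machines fall well short of either bound.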
 
Joined
Jan 3, 2021
Messages
3,491 (2.46/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
Anyone else notice the towel at the bottom of the radiator?
I'm afraid this is not even a radiator, just a water-water heat exchanger. The thick pipes at the top connect to the really big radiator outside the building.
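Whether the unit is a radiator or a water-to-water heat exchanger, each loop obeys the same energy balance, Q = m_dot * c_p * dT. This minimal sketch estimates the water flow needed to carry a given heat load at a chosen temperature rise; the 100 kW load and 10 K rise are hypothetical values, not specifications of the Azure GB200 system.

```python
WATER_CP_J_PER_KG_K = 4186.0   # approximate specific heat of liquid water
WATER_DENSITY_KG_PER_L = 1.0   # approximate density near room temperature


def required_flow_l_per_min(heat_load_w: float, delta_t_k: float) -> float:
    """Water flow needed to absorb heat_load_w with a temperature rise of delta_t_k.

    Rearranges Q = m_dot * c_p * dT for m_dot, then converts kg/s to L/min.
    """
    if delta_t_k <= 0:
        raise ValueError("temperature rise must be positive")
    mass_flow_kg_per_s = heat_load_w / (WATER_CP_J_PER_KG_K * delta_t_k)
    return mass_flow_kg_per_s / WATER_DENSITY_KG_PER_L * 60.0


if __name__ == "__main__":
    # Hypothetical rack: 100 kW of heat, coolant warms by 10 K across the servers.
    print(f"{required_flow_l_per_min(100_000, 10):.1f} L/min")
```

Allowing a larger temperature rise cuts the required flow proportionally, which is one reason warm-water loops are attractive; the facility-side loop behind the exchanger has to move the same heat with its own flow and temperature rise.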
 
Joined
Nov 13, 2007
Messages
10,753 (1.73/day)
Location
Austin Texas
System Name stress-less
Processor 9800X3D @ 5.42GHZ
Motherboard MSI PRO B650M-A Wifi
Cooling Thermalright Phantom Spirit EVO
Memory 64GB DDR5 6400 CL30 / 2133 fclk
Video Card(s) RTX 4090 FE
Storage 2TB WD SN850, 4TB WD SN850X
Display(s) Alienware 32" 4k 240hz OLED
Case Jonsbo Z20
Audio Device(s) Yes
Power Supply Corsair SF750
Mouse DeathadderV2 X Hyperspeed
Keyboard 65% HE Keyboard
Software Windows 11
Benchmark Scores They're pretty good, nothing crazy.
I'm afraid this is not even a radiator, just a water-water heat exchanger. The thick pipes at the top connect to the really big radiator outside the building.
AFAIK the Cornell datacenter uses one of the Finger Lakes as a reservoir for the second part of that -- I'm sure there are others that do this.
 
Joined
Jul 14, 2020
Messages
36 (0.02/day)
System Name SoppingPC
Processor I5-14600kf @ 5.8ghz all P_cores & 4.5ghz all E_cores undervolted
Motherboard MSI z790 DDR5 Wifi Gaming etc
Cooling Arctic Liquid Freezer II 360mm radiator A-RGB CPU Cooler
Memory 64gb (2x32gb) g.skill cl30 ddr5 xmp 6400 cl32. @ 6600 cl30
Video Card(s) Asus RTX 4090 se with 2x HDMI 2.1b + 2xDP 1.4a and 1x DP 2.1@ 2910-3060mhz core +1000mem
Storage Samsung 980 Pro 2tb NVMe, Corsair MP600 2tb NVMe, and 6x other drives
Display(s) LG C3 42" via HDMI and eARC
Case NZXT H7 elite premium white. With replaced rear 140mm case exhaust
Audio Device(s) Samsung HW-Q930D atmos eARC 2.1 soundbar, sub, rear Atmos speakers. 17 total speakers
Power Supply Corsair HX1000
Mouse Logitech 502x Gaming wireless. Powerplay mat
Keyboard Logitech G915 wireless
VR HMD Oculus Rift @ 500% graphic upscale
Software Windows 11
Benchmark Scores Very damn good
Get a diploma in AI refrigeration mechanical engineering maintenance for new multi-point-failure water-cooled server farms. AI. It's hip!
 