
Meta Shows Open-Architecture NVIDIA "Blackwell" GB200 System for Data Center

AleksandarK

News Editor
Staff member
Joined
Aug 19, 2017
Messages
2,742 (1.01/day)
During the Open Compute Project (OCP) Summit 2024, Meta, one of the prime members of OCP, showed its NVIDIA "Blackwell" GB200 systems for its massive data centers. We previously covered Microsoft's Azure server rack with GB200 GPUs, which dedicates one-third of the rack space to computing and two-thirds to cooling. A few days later, Google showed off its smaller GB200 system, and today, Meta is showing off its GB200 system, the smallest of the bunch. To train a dense transformer large language model with 405B parameters and a context window of up to 128k tokens, like Llama 3.1 405B, Meta had to redesign its data center infrastructure to run a distributed training job on two 24,000-GPU clusters. That is 48,000 GPUs used for training a single AI model.

Called "Catalina," it is built on the NVIDIA Blackwell platform, emphasizing modularity and adaptability while incorporating the latest NVIDIA GB200 Grace Blackwell Superchip. To address the escalating power requirements of GPUs, Catalina introduces the Orv3, a high-power rack (HPR) capable of delivering up to 140kW. The comprehensive liquid-cooled setup encompasses a power shelf supporting various components, including a compute tray, switch tray, the Orv3 HPR, Wedge 400 fabric switch with 12.8 Tbps switching capacity, management switch, battery backup, and a rack management controller. Interestingly, Meta also upgraded its "Grand Teton" system, used for internal workloads such as deep learning recommendation models (DLRMs) and content understanding, with AMD Instinct MI300X. Those are used for inference on internal models, and MI300X appears to provide the best performance per dollar for inference. According to Meta, the computational demand stemming from AI will continue to increase exponentially, so more NVIDIA and AMD GPUs are needed, and we can't wait to see what the company builds.



 
Joined
Jun 29, 2023
Messages
107 (0.19/day)
we can't wait to see what the company builds.

We have yet to see what productivity improvement all these investments lead to.

Right now, this seems to be an exercise in buying as much hardware as possible.

And as for AI for consumers:
- It doesn't seem to have done much for Apple's iPhones.
- Nobody forced electricity on people for it to succeed.
- Nobody forced the car on people for it to succeed.

Just saying.
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,173 (2.78/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
Right now, this seems to be an exercise in buying as much hardware as possible.
...and roasting as much power as possible on it.
a high-power rack capable of delivering up to 140kW.
You could run a neighborhood off of 140kW. The average US household uses 30kWh a day. 140kW sustained for 24 hours is 3,360kWh. That's 112 average households that fit within that power budget, assuming they are all drawing at a constant rate over time. Either way, you get the idea.
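For anyone who wants to check that arithmetic, here is a minimal Python sketch. The 140kW and 30kWh/day figures are the ones quoted above; the function name `households_for_rack` is purely illustrative, and it assumes the rack draws its full rated power around the clock.

```python
def households_for_rack(rack_kw: float, household_kwh_per_day: float,
                        hours: float = 24.0) -> float:
    """Number of average households whose energy use matches the rack's
    over the same period, assuming a constant draw on both sides."""
    rack_kwh = rack_kw * hours                              # energy used by the rack
    household_kwh = household_kwh_per_day * hours / 24.0    # energy used by one household
    return rack_kwh / household_kwh


if __name__ == "__main__":
    print(140 * 24)                       # rack energy per day in kWh
    print(households_for_rack(140, 30))   # equivalent number of households
```

Since both sides scale with time, the ratio does not depend on the period chosen; it is just 140kW divided by the average household draw of 1.25kW.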
 
Joined
May 10, 2023
Messages
528 (0.84/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
We have yet to see what productivity improvement all these investments lead to.

Right now, this seems to be an exercise in buying as much hardware as possible.
Meta does tons of research, and their team does make use of those GPUs quite a lot. All the Llama models out there, along with the tons of other models they made freely available, are proof of that.
 
Joined
Jun 29, 2023
Messages
107 (0.19/day)
Meta does tons of research and their team does make use of those GPUs quite a lot. All the Llama models out there, along with the tons of models they made freely available are proof of that:

Money spent and money returned.

Spending a lot of money is the easy part. Everybody helps you to do it.

Meta also spent a ton of money on the Metaverse. How is it going?
 
Joined
May 10, 2023
Messages
528 (0.84/day)
Money spent and money returned.

Spending a lot of money is the easy part. Everybody helps you to do it.

Meta also spent a ton of money on the Metaverse. How is it going ?
Their financials seem to be doing great:

I don't think it's as easy as saying "we have X many more GPUs, our revenue increased Y% because of that".
R&D is hard to measure in such a way.
 
Joined
Jun 29, 2023
Messages
107 (0.19/day)
Their financials seem to be doing great:

I don't think it's as easy as saying "we have X many more GPUs, our revenue increased Y% because of that".
R&D is hard to measure in such way.

That's why I asked what productivity improvement was made.

And your lack of an answer to this simple question means it is, for now, none.

What we have is a lot of FOMO investing throwing money around, hoping something sticks and that their money sticks to the next Amazon and not the next pets.com.

ML is very useful in a lot of domains (the IRS used it to detect fraud to great effect recently). But LLMs are very resource intensive without any clear path to profit.

The AI gnomes:
1. LLM
2. ???
3. Profit

For some reason, when people/websites talk about AI, they only talk about LLMs.
 
Joined
May 10, 2023
Messages
528 (0.84/day)
That's why I asked what productivity improvement was made.

And your lack of answer to this simple question means it is, for now, none.
My lack of a proper answer is simply because I do not work at Meta, so I can't say what difference it makes internally.

ML is very useful in a lot of domains (the IRS used it to detect fraud to great effect recently). But LLMs are very resource intensive without any clear path to profit.
Meta works on many models other than LLMs; just see their Hugging Face page. I've personally used many of their vision models.
In their own post, they do mention how this platform will be used for their recommendation models, which I'm pretty sure is what drives most of their earnings in one way or another. But, again, I can't put those into numbers since I don't work there.
 
Joined
Jun 29, 2023
Messages
107 (0.19/day)
My lack of a proper answer is simply because I do not work at Meta, I can't say what difference it makes internally.

If any LLMs made any real difference, it would be public knowledge by now, because there are a lot of models freely or cheaply available.

Meta works in many models other than LLMs, just see their hugging face page. I've personally used many of their vision models.
In their own post, they do mention how this platform will be used for their recommendation models, which I'm pretty sure is what drives most of their earnings in one way or another. But, again, I can't put those into numbers since I don't work there.

Yet I doubt a recommendation model requires 50,000 GPUs.
 
Joined
Apr 18, 2019
Messages
947 (0.45/day)
Location
The New England region of the United States
System Name Gaming Rig
Processor Ryzen 7 3800X
Motherboard Gigabyte X570 Aurus Pro Wifi
Cooling Noctua NH-D15 chromax.black
Memory 32GB(2x16GB) Patriot Viper DDR4-3200C16
Video Card(s) EVGA RTX 3060 Ti
Storage Samsung 970 EVO Plus 1TB (Boot/OS)|Hynix Platinum P41 2TB (Games)
Display(s) Gigabyte G27F
Case Corsair Graphite 600T w/mesh side
Audio Device(s) Logitech Z625 2.1 | cheapo gaming headset when mic is needed
Power Supply Corsair HX850i
Mouse Redragon M808-KS Storm Pro (Great Value)
Keyboard Redragon K512 Shiva replaced a Corsair K70 Lux - Blue on Black
VR HMD Nope
Software Windows 11 Pro x64
Benchmark Scores Nope
The new way to waste massive amounts of energy!
 

Solaris17

Super Dainty Moderator
Staff member
Joined
Aug 16, 2005
Messages
27,191 (3.83/day)
Location
Alabama
System Name RogueOne
Processor Xeon W9-3495x
Motherboard ASUS w790E Sage SE
Cooling SilverStone XE360-4677
Memory 128gb Gskill Zeta R5 DDR5 RDIMMs
Video Card(s) MSI SUPRIM Liquid X 4090
Storage 1x 2TB WD SN850X | 2x 8TB GAMMIX S70
Display(s) 49" Philips Evnia OLED (49M2C8900)
Case Thermaltake Core P3 Pro Snow
Audio Device(s) Moondrop S8's on schitt Gunnr
Power Supply Seasonic Prime TX-1600
Mouse Razer Viper mini signature edition (mercury white)
Keyboard Monsgeek M3 Lavender, Moondrop Luna lights
VR HMD Quest 3
Software Windows 11 Pro Workstation
Benchmark Scores I dont have time for that.
If any LLMs made any real difference, it would be public knowledge by now, because there are a lot of models freely or cheaply available.



Yet I doubt a recommendation model requires 50,000 GPUs.

It doesn't matter. Your basis is fundamentally flawed. You compared Meta AI to the likes of:

And as for AI for the consumers :
- It didn't do much for the iPhones of Apple it seems.
- Nobody forced electricity on people for it to succeed.
- Nobody forced the car on people for it to succeed.

But Apple told you Siri was useful, Google told you AI was useful, Microsoft told you Cortana was useful.

Meta is making AI for themselves and their ecosystem.



They don't owe you anything. They want it to be more, but they aren't "shoving it in your face." I don't have Llama on my phone, I don't have Meta AI on my PC, and it isn't trying to control my thermostat.

Besides, as mentioned, they also give a lot back, as the article suggests. Not only do they provide their pre-trained models, but all the hardware is OCP. You're not paying Dell, Nvidia, HPE, etc. to model your DC.



I mean, I don't use any of their products, but if you're gonna compare, you're off base.
 
Joined
Jun 29, 2023
Messages
107 (0.19/day)
they dont owe you anything. They want it to be more, but they arent "shoving it in your face." I dont have Llama on my phone, I dont have Meta AI on my PC, It isnt trying to control my thermostat.

Besides as mentioned, they also give a lot back as the article suggests. Not only do they provide there pre-trained models, but all the hardware is OCP. Your not paying Dell, Nvidia, HPE etc to model your DC.

Yeah, no.

What the AI gnomes equation is:

Phase 1: LLM
Phase 2a: Detect what speech is unwanted and has to be censored (by whom?)
Phase 2b: Inject new speech that serves the purpose (of whom?)
Phase 3: Profit by not being fined/regulated into oblivion.

And that aligns perfectly with MS trying its hardest to force Recall on its users, making damn sure that it will be next to impossible to remove.

No for-profit company will "give back" anything for free, especially if it could be used by competitors.

Enjoy the shreds of democracy that are left.
 