Friday, October 18th 2024

Meta Shows Open-Architecture NVIDIA "Blackwell" GB200 System for Data Center

During the Open Compute Project (OCP) Summit 2024, Meta, one of the prime members of the OCP, showed off its NVIDIA "Blackwell" GB200 systems for its massive data centers. We previously covered Microsoft's Azure server rack with GB200 GPUs, which dedicates one-third of the rack space to computing and two-thirds to cooling. A few days later, Google showed off its smaller GB200 system, and today Meta is showing its GB200 system, the smallest of the bunch. To train a dense transformer large language model with 405 billion parameters and a context window of up to 128K tokens, such as Llama 3.1 405B, Meta had to redesign its data center infrastructure to run a distributed training job across two 24,000-GPU clusters. That is 48,000 GPUs used for training a single AI model.

Called "Catalina," Meta's system is built on the NVIDIA Blackwell platform, emphasizing modularity and adaptability while incorporating the latest NVIDIA GB200 Grace Blackwell Superchip. To address the escalating power requirements of GPUs, Catalina introduces the Orv3, a high-power rack capable of delivering up to 140kW. The comprehensive liquid-cooled setup encompasses a power shelf supporting various components, including a compute tray, switch tray, the Orv3 HPR, a Wedge 400 fabric switch with 12.8 Tbps of switching capacity, a management switch, battery backup, and a rack management controller. Interestingly, Meta also upgraded its "Grand Teton" system for internal usage, such as deep learning recommendation models (DLRMs) and content understanding, with AMD Instinct MI300X accelerators. Those are used to run inference on internal models, and the MI300X appears to provide the best performance per dollar for inference. According to Meta, the computational demand stemming from AI will continue to increase exponentially, so more NVIDIA and AMD GPUs are needed, and we can't wait to see what the company builds.
Source: Meta
Add your own comment

11 Comments on Meta Shows Open-Architecture NVIDIA "Blackwell" GB200 System for Data Center

#1
MacZ
we can't wait to see what the company builds.
We have yet to see what productivity improvement all these investments lead to.

Right now, this seems to be an exercise in buying as much hardware as possible.

And as for AI for consumers:
- It didn't seem to do much for Apple's iPhones.
- Nobody forced electricity on people for it to succeed.
- Nobody forced the car on people for it to succeed.

Just saying.
Posted on Reply
#2
Aquinus
Resident Wat-man
MacZ: Right now, this seems to be an exercise in buying as much hardware as possible.
...and roasting as much power as possible on it.
AleksandarK: a high-power rack capable of delivering up to 140kW.
You could run a neighborhood off of 140kW. The average US household uses about 30kWh a day. 140kW sustained for 24 hours is 3,360kWh, so 112 average households fit within that power budget, assuming they all draw at a constant rate over time. Either way, you get the idea.
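The back-of-envelope math above can be sketched in a few lines; note the 30kWh/day household figure is the commenter's assumption, not an official statistic:

```python
# How many average US households fit in one 140 kW rack's power budget?
rack_kw = 140                  # Orv3 rack power budget from the article (kW)
household_kwh_per_day = 30     # assumed average US household daily use (kWh)

# 140 kW sustained for 24 hours
rack_kwh_per_day = rack_kw * 24
households = rack_kwh_per_day / household_kwh_per_day

print(rack_kwh_per_day)   # 3360
print(int(households))    # 112
```

This assumes the rack runs at its full 140kW limit around the clock; real sustained draw would be somewhat lower.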
#3
igormp
MacZ: We have yet to see what productivity improvement all these investments lead to.

Right now, this seems to be an exercise in buying as much hardware as possible.
Meta does tons of research, and their team makes heavy use of those GPUs. All the Llama models out there, along with the many models they have made freely available, are proof of that:
huggingface.co/facebook
huggingface.co/meta-llama
#4
MacZ
igormp: Meta does tons of research, and their team makes heavy use of those GPUs. All the Llama models out there, along with the many models they have made freely available, are proof of that:
huggingface.co/facebook
huggingface.co/meta-llama
Money spent and money returned.

Spending a lot of money is the easy part. Everybody helps you to do it.

Meta also spent a ton of money on the Metaverse. How is it going?
#5
igormp
MacZ: Money spent and money returned.

Spending a lot of money is the easy part. Everybody helps you to do it.

Meta also spent a ton of money on the Metaverse. How is it going?
Their financials seem to be doing great:
investor.fb.com/investor-news/press-release-details/2024/Meta-Reports-Second-Quarter-2024-Results/default.aspx

I don't think it's as easy as saying "we have X many more GPUs, our revenue increased Y% because of that".
R&D is hard to measure in that way.
#6
MacZ
igormp: Their financials seem to be doing great:
investor.fb.com/investor-news/press-release-details/2024/Meta-Reports-Second-Quarter-2024-Results/default.aspx

I don't think it's as easy as saying "we have X many more GPUs, our revenue increased Y% because of that".
R&D is hard to measure in that way.
That's why I asked what productivity improvement was made.

And your lack of an answer to this simple question means it is, for now, none.

What we have is a lot of FOMO investing, throwing money around and hoping it sticks to the next Amazon and not the next Pets.com.

ML is very useful in a lot of domains (the IRS used it to detect fraud to great effect recently). But LLMs are very resource-intensive without any clear path to profit.

The AI gnomes:
1. LLM
2. ???
3. Profit

For some reason, when people/websites talk about AI, they only talk about LLMs.
#7
igormp
MacZ: That's why I asked what productivity improvement was made.

And your lack of an answer to this simple question means it is, for now, none.
My lack of a proper answer is simply because I do not work at Meta; I can't say what difference it makes internally.
MacZ: ML is very useful in a lot of domains (the IRS used it to detect fraud to great effect recently). But LLMs are very resource-intensive without any clear path to profit.
Meta works on many models other than LLMs; just see their Hugging Face page. I've personally used many of their vision models.
In their own post, they do mention how this platform will be used for their recommendation models, which I'm pretty sure is what drives most of their earnings in one way or another. But, again, I can't put those into numbers since I don't work there.
#8
MacZ
igormp: My lack of a proper answer is simply because I do not work at Meta; I can't say what difference it makes internally.
If any LLMs made any real difference, it would be public knowledge by now, because there are a lot of models freely or cheaply available.
igormp: Meta works on many models other than LLMs; just see their Hugging Face page. I've personally used many of their vision models.
In their own post, they do mention how this platform will be used for their recommendation models, which I'm pretty sure is what drives most of their earnings in one way or another. But, again, I can't put those into numbers since I don't work there.
Yet I doubt a recommendation model requires 50,000 GPUs.
#9
Zareek
The new way to waste massive amounts of energy!
#10
Solaris17
Super Dainty Moderator
MacZ: If any LLMs made any real difference, it would be public knowledge by now, because there are a lot of models freely or cheaply available.



Yet I doubt a recommendation model requires 50,000 GPUs.
It doesn't matter. Your basis is fundamentally flawed. You compared Meta AI to the likes of:
MacZ: And as for AI for consumers:
- It didn't seem to do much for Apple's iPhones.
- Nobody forced electricity on people for it to succeed.
- Nobody forced the car on people for it to succeed.
But Apple told you Siri was useful, Google told you AI was useful, Microsoft told you Cortana was useful.

Meta is making AI for themselves and their ecosystem.



They don't owe you anything. They want it to be more, but they aren't "shoving it in your face." I don't have Llama on my phone, I don't have Meta AI on my PC, and it isn't trying to control my thermostat.

Besides, as mentioned, they also give a lot back, as the article suggests. Not only do they provide their pre-trained models, but all the hardware is OCP. You're not paying Dell, NVIDIA, HPE, etc. to design your DC.

www.opencompute.org/search?site_search%5Bquery%5D=meta

github.com/facebook

I mean, I don't use any of their products, but if you're going to compare, you're off base.
#11
MacZ
Solaris17: They don't owe you anything. They want it to be more, but they aren't "shoving it in your face." I don't have Llama on my phone, I don't have Meta AI on my PC, and it isn't trying to control my thermostat.

Besides, as mentioned, they also give a lot back, as the article suggests. Not only do they provide their pre-trained models, but all the hardware is OCP. You're not paying Dell, NVIDIA, HPE, etc. to design your DC.
Yeah, no.

The AI gnomes' equation is:

Phase 1: LLM
Phase 2a: Detect which speech is unwanted and has to be censored (by whom?)
Phase 2b: Inject new speech that serves the purpose (whose purpose?)
Phase 3: Profit by not being fined/regulated into oblivion.

And that aligns perfectly with MS trying its hardest to force Recall on its users, making damn sure that it will be next to impossible to remove.

No for-profit company will "give back" anything for free, especially if it could be used by competitors.

Enjoy the shreds of democracy that are left.