• Welcome to TechPowerUp Forums, Guest! Please check out our forum guidelines for info related to our community.

OpenAI Has "Run Out of GPUs" - Sam Altman Mentions Incoming Delivery of "Tens of Thousands"

... To be totally fair though, all of those have MASSIVE use cases...
True. :laugh:
But as a side note, government agencies have other kinds of massive use cases as well: face recognition, all kinds of surveillance, tracking of individuals, etc.
 
I'm aware. I don't think it's worth it for the government to do that, though. Anti-human corporations that have the government as a customer, on the other hand...
 
They know they can use Radeon Instincts instead, right? LLMs don't need the image generation power of NVIDIA hardware, they could just put all the stuff doing inference on Instincts and reserve the NVIDIA hardware for training. Unless they need more density, then like, what? Are you training a 1T+ model or something?

... And that's a GOOD thing!
And this is not anywhere close to being a good thing. It means even fewer chips for the consumer market. They will ramp up production of enterprise GPUs, or try to sell even more (maybe the entire stockpile) of gamer/consumer video cards to the AI users.
OMFG what a nightmare. If this ROP issue is way bigger than being communicated, that's a LOT of defects.
For roughly every two dozen cards missing the 8 ROPs or whatever, that's a whole-ass extra flagship card's worth of silicon missing.
That's actually expensive.
Actually, this is an issue even for the gamer who is thirsty for every single FPS. Now put it in the grand scheme of thousands of GPUs connected in rendering clusters: the lack of even eight ROPs on at least some of the GPUs becomes a significant performance hit. There are also a lot of compute blocks missing for money already paid.
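For scale, a quick back-of-envelope using the 176-ROP count reported for the affected flagship (the exact SKU and the real-world FPS impact are assumptions here, not a measured benchmark):

```python
# Back-of-envelope: how much raw pixel throughput 8 missing ROPs cost.
# Assumes a flagship part with 176 ROPs total; the actual in-game
# impact depends on how ROP-bound the workload is.
TOTAL_ROPS = 176
MISSING = 8

loss = MISSING / TOTAL_ROPS
print(f"ROP deficit per defective card: {loss:.1%}")  # 4.5%

# Scaled to a render cluster: every 22 defective cards "waste"
# the ROP budget of one whole extra card.
cards_per_lost_card = TOTAL_ROPS // MISSING
print(f"One full card's ROPs lost per {cards_per_lost_card} defective cards")
```

Which is where the "every two dozen cards" figure above roughly comes from.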
Yep, I think the last figures I heard were that they spent nine billion in 2024 to make four billion in revenue. Even their $200/month subscriptions are losing them money.

The main issue with the current AI stuff is the utility of it just isn't there for anything other than psychotic chatbots, pictures of big-titted elf girls, and source code analysis.
Nothing new. This is like those dumb, stubborn miners who refuse to sell their gaming-GPU mining farms, earning a couple of cents per day while wasting bucks' worth of electric power, waiting for the next "coming" of the mining boom.
Also, a US energy company is going to restart the reactor of an old, closed nuclear plant to satisfy AI "demands".
 
I'm impressed by how much technology has advanced over the last 30 years!

Simply to note: in 1995 I had my first desktop computer, used 100% for software development, with 32 MB of memory and the Windows 95 PE operating system, with a price tag of 2,500 USD.

Just to compare: in 2002 I paid 1,500 USD for my first car. The computer was more expensive than the car!
 
I see them begging Nvidia for GPUs, but Nvidia has probably given priority to Musk (because, well, he is in the government) and to Microsoft, Google, and Meta. Add DeepSeek into the mixture and OpenAI could drop from AI leader by many places.
Also, with OpenAI thinking of building its own chips, and Altman being on bad terms with Musk, who, let me repeat, is in the government, OpenAI probably gets last place on the priority list of Nvidia's top customers. They either differentiate themselves or keep begging.
And he also broke the scaling-law limits, paving the way for higher GPU cluster counts: 100K and up now possible for EVERYONE.

Jensen is like ya boy, cha-ching!
 
They're talking out their asses. "Even the most powerful supercomputers in the world don't scale to millions of GPUs. For instance, the fastest operational supercomputer right now, Frontier, "only" has 37,888 GPUs."


 
And then Deepseek happens and they look like a bunch of kids, programming in Basic.
 

AMD can talk about anything and even about a supercomputer with 10 million GPUs. The problem is Not the number of GPUs. The problem is Power!

This is how much power is needed for three exascale supercomputers already deployed:
  • El Capitan: Approximately 30 MW
  • Frontier: Around 21 MW
  • Aurora: About 38.7 MW
Also, there is another problem: the reliability of that "fictional" super-system with 10 million GPUs. On average, the mean time between failures (MTBF) on these three exascale supercomputers is just several hours.

For example, would we be happy if our cars had an MTBF of just 3 or 4 hours?
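To put the power objection in numbers, a back-of-envelope estimate for the hypothetical 10-million-GPU system, using the machines above as reference points (per-GPU wattage and the overhead factor are assumptions, not vendor figures):

```python
# Deployed exascale machines, for comparison (figures quoted above).
deployed_mw = {"El Capitan": 30.0, "Frontier": 21.0, "Aurora": 38.7}

WATTS_PER_GPU = 700   # assumed draw of one datacenter accelerator, W
OVERHEAD = 1.5        # assumed factor for CPUs, network, cooling
N_GPUS = 10_000_000

total_mw = N_GPUS * WATTS_PER_GPU * OVERHEAD / 1e6
print(f"Estimated draw: {total_mw:,.0f} MW")  # 10,500 MW
print(f"vs Frontier:    {total_mw / deployed_mw['Frontier']:,.0f}x")
```

Even with generous assumptions, that lands at gigawatts, i.e. hundreds of Frontiers' worth of power.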
 
Power, yes, that's a problem. Temps too; Nvidia knows it, they removed the hot-spot sensor on Blackwell. But MTBF I doubt. Do the servers go offline until the failed GPU (for example) is replaced? I don't know how supercomputers work, but I guess only a node drops out of the system. To use your example with cars: in a small town with 40,000 cars, the fleet-wide MTBF is probably also a few hours, but only one person/family is affected, namely the one driving the car that failed.


PS: Things evolve. If we were having this conversation 10 years ago, we would be talking about how unrealistic a supercomputer with the specs of El Capitan is.
 

You don't seem to understand what MTBF means on machines this large with redundancy. With failover fault tolerance you don't just subtract failed components; you determine which single component would cause a system-level failure, and then every redundant component raises the system MTBF, but not exponentially.


Your example would have the cars/computers suffer a complete failure, becoming nonfunctional, based on 10 million batteries/GPUs where one might individually fail but the rest take over the load with no appreciable loss in performance.
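The "goes up but not exponentially" point can be made concrete with a standard reliability model. A minimal sketch, assuming independent components with exponentially distributed lifetimes and an illustrative 100,000-hour per-GPU MTBF (both are modeling assumptions, not measured figures for any real machine):

```python
# Two limiting cases of a textbook reliability model:
#  - series (no redundancy): the system dies when ANY part dies,
#    so MTBF(n) = m / n -- why huge clusters fail every few hours.
#  - parallel (full redundancy): the system dies when ALL parts die,
#    so MTTF(n) = m * (1 + 1/2 + ... + 1/n), harmonic growth only.
def series_mtbf(m: float, n: int) -> float:
    return m / n

def parallel_mtbf(m: float, n: int) -> float:
    return m * sum(1.0 / i for i in range(1, n + 1))

m = 100_000.0  # assumed single-GPU MTBF in hours
print(f"37,888 GPUs in series: {series_mtbf(m, 37_888):.1f} h")  # a few hours
for n in (1, 2, 10, 100):
    print(f"{n:>3}-way redundancy: {parallel_mtbf(m, n):>9.0f} h")
```

Under this model, 100-way redundancy buys only about 5x the component MTTF, while a non-redundant 37,888-GPU machine fails every couple of hours: both halves of the argument in this thread, in one formula.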
 

>>...You don't seem to understand what MTBF on machines this large with redundancies mean

Get in touch with the software engineers and computer scientists working on the Frontier and Aurora exascale supercomputers! Talk to them and you'll see that the situation is Not so simple and that they Constantly have reliability problems.

My point of view and statements are based on real communication with professionals from both teams. I'm a C/C++ software engineer and really understand the scope of the problem on these machines.

If you're a software developer you could even apply for a small node allocation for a mid-size project, and you'll see how it works in real life. It is Not so simple.
 
And then Deepseek happens and they look like a bunch of kids, programming in Basic.
You clearly missed the point and then deflected, got it.
 
Yea, that's what you are doing here, I agree.
At first I thought you kind of knew what you were saying, but really you're just a clown googling crap w/o a clue.
 
You might just explain the point you think I missed, instead of throwing a derogatory reply followed by a second one with direct and clear (so I don't miss the point) insults.

No reply. I thought so. You are only insults and nothing more. And the thing is, I didn't miss your point (as can be seen in another post of mine). You did.
 
Seriously, you're a strange one. You only seem to one-up shit. I've written things out clearly; if you had actually read them and the info, you wouldn't be acting like a clown now.
 
Nothing strange here. You keep calling me a clown. OK, you are the serious one here. Are you going to keep throwing insults, or are you going to prove that you're the serious one and explain your point? It's your choice: either that, or some more insults and an excuse to avoid replying.


In the end, the only one having a party here is that 10-year-old kid taking advantage of your posts.
 