• Welcome to TechPowerUp Forums, Guest! Please check out our forum guidelines for info related to our community.

OpenAI Has "Run Out of GPUs" - Sam Altman Mentions Incoming Delivery of "Tens of Thousands"

... To be totally fair though, all of those have MASSIVE use cases...
True. :laugh:
But as a side note, government agencies have other kinds of massive use cases as well: face recognition, all kinds of surveillance, tracking of individuals, etc.
 
I'm aware. I don't think it's worth it for the government to do that, though. Anti-human corporations that have the government as a customer, on the other hand...
 
They know they can use Radeon Instincts instead, right? LLMs don't need the image generation power of NVIDIA hardware, they could just put all the stuff doing inference on Instincts and reserve the NVIDIA hardware for training. Unless they need more density, then like, what? Are you training a 1T+ model or something?

... And that's a GOOD thing!
And this is not anywhere close to being a good thing. It means even fewer chips for the consumer market. They will ramp up production of enterprise GPUs, or try to sell even more (maybe the entire stockpile) of gamer/consumer video cards to the AI users.
OMFG what a nightmare. If this ROP issue is way bigger than being communicated, that's a LOT of defects.
For roughly every two dozen cards missing the 8 ROPs or whatever, that's a whole-ass extra flagship card's worth of silicon missing.
That's actually expensive.
Actually, this is an issue even for the gamer who is thirsty for every single FPS. Now put it in the grand scheme of thousands of GPUs connected in rendering clusters: the lack of even eight ROPs on at least some of the GPUs becomes a significant performance hit. There are also a lot of compute blocks missing for money already paid.
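For scale, a quick back-of-envelope using the 176-ROP count reported for the affected flagship (the exact SKU and the real-world FPS impact are assumptions here, not a measured benchmark):

```python
# Back-of-envelope: how much raw pixel throughput 8 missing ROPs cost.
# Assumes a flagship part with 176 ROPs total; the actual in-game
# impact depends on how ROP-bound the workload is.
TOTAL_ROPS = 176
MISSING = 8

loss = MISSING / TOTAL_ROPS
print(f"ROP deficit per defective card: {loss:.1%}")  # 4.5%

# Scaled to a render cluster: every 22 defective cards "waste"
# the ROP budget of one whole extra card.
cards_per_lost_card = TOTAL_ROPS // MISSING
print(f"One full card's ROPs lost per {cards_per_lost_card} defective cards")
```

Which is where the "every two dozen cards" figure above roughly comes from.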
Yep, I think the last figures I heard were that they spent nine billion in 2024 to make four billion in revenue. Even their $200/month subscriptions are losing them money.

The main issue with the current AI stuff is the utility of it just isn't there for anything other than psychotic chatbots, pictures of big-titted elf girls, and source code analysis.
Nothing new. This is like those dumb, stubborn miners who refuse to sell their gaming-GPU mining farms, earning a couple of cents per day while wasting bucks' worth of electric power, waiting for the next "coming" of the mining boom.
Also, a US energy company is going to restart the reactor of an old, closed nuclear plant to satisfy AI "demands".
 
I'm impressed by how much technology has advanced over the last 30 years!

Simply to note: in 1995 I had my first desktop computer, used 100% for software development, with 32 MB of memory and the Windows 95 PE operating system, with a price tag of 2,500 USD.

Just to compare: in 2002 I paid 1,500 USD for my first car. The computer was more expensive than the car!
 
I see them begging Nvidia for GPUs, but Nvidia has probably given priority to Musk (because, well, he is in the government) and to Microsoft, Google, and Meta. Add DeepSeek into the mixture and OpenAI could drop from AI leader by many places.
Also, with OpenAI thinking of building its own chips, and Altman being on bad terms with Musk, who, let me repeat, is in the government, OpenAI probably gets last place on the priority list of Nvidia's top customers. They either differentiate themselves or keep begging.
And he also broke the scaling-law limits, paving the way for higher GPU cluster counts: 100K and up now possible for EVERYONE.

Jensen is like ya boy, cha-ching!
 
They're talking out their asses. "Even the most powerful supercomputers in the world don't scale to millions of GPUs. For instance, the fastest operational supercomputer right now, Frontier, "only" has 37,888 GPUs."


 
And then Deepseek happens and they look like a bunch of kids, programming in Basic.
 

AMD can talk about anything and even about a supercomputer with 10 million GPUs. The problem is Not the number of GPUs. The problem is Power!

This is how much power is needed for three exascale supercomputers already deployed:
  • El Capitan: Approximately 30 MW
  • Frontier: Around 21 MW
  • Aurora: About 38.7 MW
Also, there is another problem: the reliability of that "fictional" super-system with 10 million GPUs. On average, the mean time between failures (MTBF) on these three exascale supercomputers is just several hours.

For example, would we be happy if our cars had an MTBF of just 3 or 4 hours?
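To put the power objection in numbers, a back-of-envelope estimate for the hypothetical 10-million-GPU system, using the machines above as reference points (per-GPU wattage and the overhead factor are assumptions, not vendor figures):

```python
# Deployed exascale machines, for comparison (figures quoted above).
deployed_mw = {"El Capitan": 30.0, "Frontier": 21.0, "Aurora": 38.7}

WATTS_PER_GPU = 700   # assumed draw of one datacenter accelerator, W
OVERHEAD = 1.5        # assumed factor for CPUs, network, cooling
N_GPUS = 10_000_000

total_mw = N_GPUS * WATTS_PER_GPU * OVERHEAD / 1e6
print(f"Estimated draw: {total_mw:,.0f} MW")  # 10,500 MW
print(f"vs Frontier:    {total_mw / deployed_mw['Frontier']:,.0f}x")
```

Even with generous assumptions, that lands at gigawatts, i.e. hundreds of Frontiers' worth of power.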
 
Power, yes, that's a problem. Temps too; Nvidia knows it, they removed the hot-spot sensor on Blackwell. But MTBF I doubt. Do the servers go offline until the failed GPU (for example) is replaced? I don't know how supercomputers work, but I guess only a node drops out of the system. To use your example with cars: in a small town with 40,000 cars, the fleet-wide MTBF is probably also a few hours, but only one person/family is affected, namely the one driving the car that failed.


PS: Things evolve. If we were having this conversation 10 years ago, we would be talking about how unrealistic a supercomputer with the specs of El Capitan is.
 

You don't seem to understand what MTBF means on machines this large with redundancy. With failover fault tolerance you don't just subtract failed components; you determine which single component would cause a system-level failure, and then every redundant component raises the system MTBF, but not exponentially.


Your example would have the cars/computers suffer a complete failure, becoming nonfunctional, based on 10 million batteries/GPUs where one might individually fail but the rest take over the load with no appreciable loss in performance.
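The "goes up but not exponentially" point can be made concrete with a standard reliability model. A minimal sketch, assuming independent components with exponentially distributed lifetimes and an illustrative 100,000-hour per-GPU MTBF (both are modeling assumptions, not measured figures for any real machine):

```python
# Two limiting cases of a textbook reliability model:
#  - series (no redundancy): the system dies when ANY part dies,
#    so MTBF(n) = m / n -- why huge clusters fail every few hours.
#  - parallel (full redundancy): the system dies when ALL parts die,
#    so MTTF(n) = m * (1 + 1/2 + ... + 1/n), harmonic growth only.
def series_mtbf(m: float, n: int) -> float:
    return m / n

def parallel_mtbf(m: float, n: int) -> float:
    return m * sum(1.0 / i for i in range(1, n + 1))

m = 100_000.0  # assumed single-GPU MTBF in hours
print(f"37,888 GPUs in series: {series_mtbf(m, 37_888):.1f} h")  # a few hours
for n in (1, 2, 10, 100):
    print(f"{n:>3}-way redundancy: {parallel_mtbf(m, n):>9.0f} h")
```

Under this model, 100-way redundancy buys only about 5x the component MTTF, while a non-redundant 37,888-GPU machine fails every couple of hours: both halves of the argument in this thread, in one formula.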
 

>>...You don't seem to understand what MTBF on machines this large with redundancies mean

Get in touch with the software engineers and computer scientists working on the Frontier and Aurora exascale supercomputers! Talk to them and you'll see that the situation is Not so simple and that they Constantly have reliability problems.

My point of view and statements are based on real communication with professionals from both teams. I'm a C/C++ software engineer and really understand the scope of the problem on these machines.

If you're a software developer you could even apply for a small node allocation for a mid-size project, and you'll see how it works in real life. It is Not so simple.
 
And then Deepseek happens and they look like a bunch of kids, programming in Basic.
You clearly missed the point and then deflected, got it.
 
Yea, that's what you are doing here, I agree.
At first I thought you kind of knew what you were saying, but really you're just a clown googling crap w/o a clue.
 
You might just explain the point you think I missed, instead of throwing a derogatory reply followed by a second one with direct and clear (so I don't miss the point) insults.

No reply. I thought so. You are only insults and nothing more. And the thing is, I didn't miss your point (as can be seen in another post of mine). You did.
 
Seriously, you're a strange one. You only seem to one-up shit. I've written things out clearly; if you had actually read them and the info, you wouldn't be acting like a clown now.
 
Nothing strange here. You keep calling me a clown. OK, you are the serious one here. Are you going to keep throwing insults, or are you going to prove that you're the serious one and explain your point? It's your choice: either that, or some more insults and an excuse to avoid replying.


In the end, the only one having a party here is that 10-year-old kid taking advantage of your posts.
 