Wednesday, March 6th 2024

Dr. Lisa Su Responds to TinyBox's Radeon RX 7900 XTX GPU Firmware Problems

The TinyBox AI server system attracted plenty of media attention last week: its creator, George Hotz, decided to build it around AMD RDNA 3 GPU hardware rather than the expected, traditional choice of CDNA 3. Tiny Corp. is a startup firm dealing in neural network frameworks; its team currently "write and maintain tinygrad." Hotz & Co. are in the process of assembling rack-mounted 12U TinyBox systems for customers, and each server houses an AMD EPYC 7532 processor and six XFX Speedster MERC310 Radeon RX 7900 XTX graphics cards. The Tiny Corp. social media account has engaged in numerous NVIDIA vs. AMD AI hardware debates and tirades, and Hotz appears to favor the latter, as evidenced by his latest choice of components. ROCm support on Team Red's Instinct AI accelerators is fairly mature at this point, but it remains a much newer prospect on gaming-oriented graphics cards.

Tiny Corporation's unusual use of Radeon RX 7900 XTX GPUs in a data center configuration has already hit a developmental roadblock. Yesterday, the company's social media account voiced driver-related frustrations in a public forum: "If AMD open sources their firmware, I'll fix their LLVM spilling bug and write a fuzzer for HSA. Otherwise, it's not worth putting tons of effort into fixing bugs on a platform you don't own." Hotz's latest complaint was taken on board by AMD's top brass, and Dr. Lisa Su responded with the following message: "Thanks for the collaboration and feedback. We are all in to get you a good solution. Team is on it." Within a few hours, her software engineers had sent a set of fixes in Tiny Corporation's direction. Hotz appreciated the quick turnaround and proceeded to run a model without encountering major stability issues: "AMD sent me an updated set of firmware blobs to try. They are responsive, and there have been big strides in the driver in the last year. It will be good! This training run is almost 5 hours in, hasn't crashed yet." Tiny Corp. has also drummed up speculation about AMD open-sourcing its GPU MES (Micro Engine Scheduler) firmware; Hotz disclosed that he will be talking to Team Red leadership by phone.
Sources: Lisa Su Tweet, Tom's Hardware

24 Comments on Dr. Lisa Su Responds to TinyBox's Radeon RX 7900 XTX GPU Firmware Problems

#1
ThrashZone
Hi,
So did AI fail to test the vbios :eek:
#2
Dirt Chip
One of the best PR stunts I've seen, this whole TinyBox thing is.
#3
bonehead123
HAHAHAHA, I'm soooo ROTFLMAO I can't even think straight :D

But yea, at least Jacket Lady (supposedly) lit a fire & got 'er done, supposedly....
#4
john_
Dirt Chip said: "One of the best PR stunts I've seen, this whole TinyBox thing is."
Tinybox yes, it is. AMD is probably trying to do what Nvidia is doing: sell gaming GPUs for AI.

But publicizing that gaming GPUs come with bugs that crash the simulation is in no way a PR stunt; it's a disaster, even if AMD builds a fix and it gets applied in a matter of hours.
I would stay away from a solution like this after reading that "crashing" word and consider AMD next year. I mean, we talk about how people shouldn't want to become beta testers for Intel's gaming GPUs, so why would someone spend $15K to become a beta tester for Tinybox and AMD?
#5
mechtech
john_ said: "[...] AMD is probably trying to do what Nvidia is doing: sell gaming GPUs for AI. [...]"
Do they?
#6
Cheeseball
Not a Potato
As expected from the guy who bypassed the SIM lock on the iPhone and first hacked the PS3.
#7
A&P211
ThrashZone said: "Hi, so did AI fail to test the vbios :eek:"
Poor Al, maybe Peggy didn't let him test his PC parts.
#8
DeathtoGnomes
Was it really necessary to do the dialog over social media?
#9
Veseleil
Cheeseball said: "As expected from the guy who bypassed the SIM lock on the iPhone and first hacked the PS3."
George and Lisa would make an interesting couple. :laugh:
#10
R0H1T
DeathtoGnomes said: "Was it really necessary to do the dialog over social media?"
Yes, if you really wanted publicity, which I guess the former did.
#11
Veseleil
DeathtoGnomes said: "Was it really necessary to do the dialog over social media?"
What dialog? The guy publicly humiliated AMD for their incompetence, and they responded in the best way possible.
#12
Cheeseball
Not a Potato


This is a good sign because if AMD actually allows more access, it can make the RX 7900 XTX a viable alternative to the RTX 4090 (and RTX 3090) for ML/AI usage.

For what it's worth, you can access the scheduler and block scheduling directly (DMA) on an NVIDIA card through CUDA API calls.
#13
Patriot
Cheeseball said: "[...] For what it's worth, you can access the scheduler and block scheduling directly (DMA) on an NVIDIA card through CUDA API calls."
This is one of my gripes/worries on the Instinct side as well. Too much is done automatically.
#14
TechLurker
It's worth noting that the code or firmware TinyBox is hoping to get AMD to open-source is not related to display output and gaming, so those hoping for AMD to open-source their display drivers because of TinyBox are going to be disappointed. It's proprietary core code.

It's also not great optics that TinyBox hasn't even fully validated their systems before starting marketing and sales, and is now trying to put the blame on AMD for their own lack of proper long-term testing. They're literally trying to get around paying an arm and a leg for enterprise-grade CDNA cards by going with consumer-grade RDNA cards. And they're not even using reference models straight from AMD (which are at least guaranteed to work because they ARE reference), but AIB custom cards, which may have their own quirks due to out-of-the-box OC'ing or tweaked components compared to reference. They announced their new product just last month and started initial sales, but apparently hadn't begun testing anything until now, and are only now realizing there are some teething issues in using consumer products in an enterprise environment.

If anything, TinyBox should be happy that AMD is even bothering to give them the Enterprise-level treatment of priority service. AMD could just have put them in the prosumer queue for using consumer cards in an enterprise environment. It's not like TinyBox could go anywhere either; Nvidia would demand they cease-and-desist and switch to their enterprise models since they're selling to potential enterprise/datacenter clients (even though running AI via CUDA is permitted on their consumer cards, but only at the consumer/prosumer level), while Intel would shrug and tell them to go all Intel for their All Intel AI PC program in which Intel will begin incorporating AI elements into their CPUs (and IIRC, they don't have a dedicated accelerator equivalent to Instinct or the Nvidia equivalent yet).
#15
Dirt Chip
DeathtoGnomes said: "Was it really necessary to do the dialog over social media?"
It's all PR imo. No bad press in this case, just more exposure.
#16
Σario
TechLurker said: "[...] They're literally trying to get around paying an arm and a leg for enterprise-grade CDNA cards by going with consumer-grade RDNA cards. [...]"
Ah yes, it would be terrible indeed if AMD consumer-grade cards could be used for this type of workload and the masses could have a choice in the matter. Only well-funded businesses and corporations who can afford to buy H100s or equivalent should be able to do this. How dare this guy try and use off-the-shelf stuff and cobble something together.
#17
john_
mechtech said: "Do they?"
Aren't people buying 4090s for AI? Is Nvidia stopping them?
#18
wNotyarD
john_ said: "Aren't people buying 4090s for AI? Is Nvidia stopping them?"
I'd say "only by not producing enough", but that's also on TSMC.
#19
Chomiq
DeathtoGnomes said: "Was it really necessary to do the dialog over social media?"
It got Lisa on the case so I guess it was.
#20
TheDeeGee
Yes, stack those GPUs like they're bricks for a wall.
#21
hsew
Cheeseball said: "As expected from the guy who bypassed the SIM lock on the iPhone and first hacked the PS3."
The same guy who runs his own autonomous driving AI firm, and does it so well that Elon Musk begged him to come work for him at one point.

GeoHotz is legit.
#22
TechLurker
Σario said: "Ah yes, it would be terrible indeed if AMD consumer-grade cards could be used for this type of workload and the masses could have a choice in the matter. Only well-funded businesses and corporations who can afford to buy H100s or equivalent should be able to do this. How dare this guy try and use off-the-shelf stuff and cobble something together."
I mean, kind of? That's literally part of the mark-up on enterprise equipment; they also get the priority treatment and a hotline to tech support vs the average consumer, and some level of guaranteed support for intended and validated use cases (or one wouldn't be buying from them in the first place).

Nothing wrong with AMD consumer cards being able to run AI; it's great for the home user or prosumer looking to utilize AI to assist them. However, expecting them to run like enterprise hardware is stupid, and expecting to be treated like enterprise customers is even stupider, considering that's literally what the enterprise segment is for. Even Nvidia doesn't care if their GeForce gaming cards are used for AI in small environments, but they definitely will not provide any support to run them at enterprise scale, directing customers to their enterprise lineup instead.

At any rate, based on his TwiX posts, it's not even ROCm the guy is working with, but something more exotic that requires access to the proprietary core code, which he wants open-sourced. Going by comments elsewhere, it's supposedly the kind of code that could make or break AMD's GPUs, so it's no surprise they wouldn't want to open-source it, and that's why they chose to put his team in contact with a high-level engineer to work out a custom fix. As for ROCm on the 7900 XTX and 7900 XT, AMD is admittedly kind of overdue on updating it, since the last major update was over 6 months back.

As well, if AI ends up following a path similar to cryptocurrency, near-consumer-level AI accelerators will eventually be developed to handle most of the work, and they could beat a consumer card many times over. That is, assuming economies of scale don't just see that kind of tech end up in consumer GPUs anyway for future games, promising smarter enemies, smarter allies, smarter wildlife, the return of PhysX in some manner, or advanced realistic sound generation and artificial surround sound (assuming AI in games doesn't become another novelty like PhysX: underutilized, then killed because it takes up too much dev time).
#23
mechtech
john_ said: "Aren't people buying 4090s for AI? Is Nvidia stopping them?"
Isn't tinybox a company?

I was under the impression companies typically use dedicated hardware and individuals typically use gaming cards?

And I don't follow individuals or AI so I wouldn't know.
#24
john_
mechtech said: "Isn't tinybox a company?"
It seems so. Now, I have no idea of its size.
mechtech said: "I was under the impression companies typically use dedicated hardware and individuals typically use gaming cards?"
Usually, yes, but as has been shown numerous times in the past, pros can go with a gaming card if it can do the job they want at a fraction of the price of a pro card, and gamers can buy pro cards, for whatever crazy reason, to game. Maybe they are pros who also game. In any case, being a company doesn't mean that gaming cards are forbidden if they can be used for certain tasks.
mechtech said: "And I don't follow individuals or AI so I wouldn't know."
Me neither. Forums are full of posts where people just speculate. Even people who say "I am a pro" or "I know what I am saying" could just be throwing out random thoughts and speculation.

In any case, the 4090 is used by many for AI, and that's the reason it is one of the cards restricted from export to China.