Wednesday, March 20th 2024

Tiny Corp. Pauses Development of AMD Radeon GPU-based Tinybox AI Cluster

George Hotz and his Tiny Corporation colleagues were pinning their hopes on AMD delivering some good news earlier this month. The development of a "TinyBox" AI compute cluster project hit some major roadblocks a couple of weeks ago—at the time, Radeon RX 7900 XTX GPU firmware was not gelling with Tiny Corp.'s setup. Hotz expressed "70% confidence" in AMD approving open-sourcing certain bits of firmware. At the time of writing this has not transpired—this week the Tiny Corp. social media account has, once again, switched to an "all guns blazing" mode. Hotz and Co. have publicly disclosed that they were dabbling with Intel Arc graphics cards, as of a few weeks ago. NVIDIA hardware is another possible route, according to freshly posted open thoughts.

Yesterday, it was confirmed that the young startup organization had paused its utilization of XFX Speedster MERC310 RX 7900 XTX graphics cards: "the driver is still very unstable, and when it crashes or hangs we have no way of debugging it. We have no way of dumping the state of a GPU. Apparently it isn't just the MES causing these issues, it's also the Command Processor (CP). After seeing how open Tenstorrent is, it's hard to deal with this. With Tenstorrent, I feel confident that if there's an issue, I can debug and fix it. With AMD, I don't." The $15,000 TinyBox system relies on "cheaper" gaming-oriented GPUs, rather than traditional enterprise solutions—this oddball approach has attracted a number of customers, but the latest announcements likely signal another delay. Yesterday's tweet continued to state: "we are exploring Intel, working on adding Level Zero support to tinygrad. We also added a $400 bounty for XMX support. We are also (sadly) exploring a 6x GeForce RTX 4090 GPU box. At least we know the software is good there. We will revisit AMD once we have an open and reproducible build process for the driver and firmware. We are willing to dive really deep into hardware to make it amazing. But without access, we can't."
Another post provided a behind-the-scenes look at Hotz's diplomatic approach: "I have spoken with AMD on multiple occasions, we have gotten through to top people, and they have been quite nice to us. I believe they want to be more open, and obviously they don't want their driver to have bugs. Unfortunately, this access and responses prolonged this decision, part of me wishes they just said it's a consumer card, you get what you pay for and we could have switched earlier. We probably tried too hard to make it work. We have an amazing team at tinygrad. Someday, we are going to make our own chips, and I figure if we can make our own chips, we better be able to make the 7900XTX software great. But we can't if we don't have access. The firmware is complex, undocumented, closed source, and signed, all struggles we wouldn't have with our own hardware. If and when the firmware is open and installable, if we aren't too far along with a different chip, we are down to put resources into writing fuzzers and rewriting whatever needs to be rewritten. The 7900XTX hardware seems great, but we aren't going to put resources into fixing a black box."
Sources: tinygrad Tweet, Tom's Hardware, Wccftech

36 Comments on Tiny Corp. Pauses Development of AMD Radeon GPU-based Tinybox AI Cluster

#1
rv8000
“Small startup got burned by jumping into the AI buzz market with consumer GPUs not designed for professional work in an effort to make a quick buck”

Oh no… anyways.
#2
Leiesoldat
lazy gamer & woodworker
I'm annoyed that Mr. Hotz is blaming AMD for a use case scenario that is not supported by consumer level hardware when they (Tiny Corporation) clearly need to be using enterprise cards like the Instinct series. What a baby.
#3
Cheeseball
Not a Potato
The good part about Intel's oneAPI is that Level Zero support is technically direct-to-metal access on their GPUs.

AMD does (did?) this with their GCN-based (CDNA now) products through ROCm, but unfortunately not yet for RDNA.
Leiesoldat said: I'm annoyed that Mr. Hotz is blaming AMD for a use case scenario that is not supported by consumer level hardware when they (Tiny Corporation) clearly need to be using enterprise cards like the Instinct series. What a baby.
The problem is that AMD does not offer any consumer/prosumer card that can be used for local development (e.g. GPGPU developers using a RTX 3090/4090). The Radeon VII was their pinnacle of success, but unfortunately got hampered by "gaming" reviews.

Eventually they will get there once ROCm is in a good state of support for the RX 7900 XTX and the PRO W7900.
#4
Tropick
Why is this news? Who the hell is this guy and why should I care that he's bitching about some missing AI feature on a gaming card?
#5
john_
Just another individual running to seize the opportunity of self-promotion presented to them.
Also, publicly blaming AMD could be a way out if he already got money for systems not ready to be sent to customers.
#6
Cheeseball
Not a Potato
john_ said: Just another individual running to seize the opportunity of self-promotion presented to them.
Also, publicly blaming AMD could be a way out if he already got money for systems not ready to be sent to customers.
No, if AMD can't help him with the 7900 XTX's firmware then he will just go with the RTX 4090s.

Also the pre-orders were only $100. The only customers who would be "angry" about this are the ones who don't want to use NVIDIA hardware for whatever reason.
#7
seventy
If you don't know who Geohot is, you are very new.
He is absolutely right here, AMD's GPU software is a mess, only fanboys would disagree.
He is giving AMD a fighting chance to compete with NVIDIA by kickstarting the grassroots enthusiasm for their chips in ML. And AMD is throwing it away.
Another reason to never buy an AMD gpu, their software team is just too incompetent.
#8
Cheeseball
Not a Potato
seventy said: If you don't know who Geohot is, you are very new.
He is absolutely right here, AMD's GPU software is a mess, only fanboys would disagree.
He is giving AMD a fighting chance to compete with NVIDIA by kickstarting the grassroots enthusiasm for their chips in ML. And AMD is throwing it away.
Another reason to never buy an AMD gpu, their software team is just too incompetent.
It's not that AMD is throwing away the opportunity intentionally (otherwise why would Lisa Su respond to him). Speculation-wise, AMD is probably investigating what would happen if they open source the firmware, as doing so may possibly ruin the market of their Instinct Accelerators since it would be cheaper to just get a bunch of 7900XTX/W7900s instead of the MI200s (not necessarily the MI300 series). The PRO W7900 is not really affected because it also has closed-source firmware and is unfortunately expensive.
#9
john_
Cheeseball said: No, if AMD can't help him with the 7900 XTX's firmware then he will just go with the RTX 4090s.

Also the pre-orders were only $100. The only customers who would be "angry" about this are the ones who don't want to use NVIDIA hardware for whatever reason.
He wants AMD to solve his own problems. If AMD doesn't want to play his game, then he can go Intel, because obviously he wants to bring to the market something that others don't provide. If Intel can't help him either, he can do what everyone else is doing. Go Nvidia. No one stops him.
seventy said: If you don't know who Geohot is, you are very new.
He is absolutely right here, AMD's GPU software is a mess, only fanboys would disagree.
He is giving AMD a fighting chance to compete with NVIDIA by kickstarting the grassroots enthusiasm for their chips in ML. And AMD is throwing it away.
Another reason to never buy an AMD gpu, their software team is just too incompetent.
Tech press avoiding reporting on Nvidia's problems doesn't mean that AMD's software team is incompetent and Nvidia's team flawless. A friend of mine bought a laptop with an Nvidia GPU recently. He was saying to me today that his system freezes every time he tries to play a movie through HDMI because Optimus is not working as it should. Sound continues, image goes black, system needs to be hard reset to be usable again.
rtx 4060 laptop screen freeze - Google
#10
Cheeseball
Not a Potato
john_ said: He wants AMD to solve his own problems. If AMD doesn't want to play his game, then he can go Intel, because obviously he wants to bring to the market something that others don't provide. If Intel can't help him either, he can do what everyone else is doing. Go Nvidia. No one stops him.


Tech press avoiding reporting on Nvidia's problems doesn't mean that AMD's software team is incompetent and Nvidia's team flawless. A friend of mine bought a laptop with an Nvidia GPU recently. He was saying to me today that his system freezes every time he tries to play a movie through HDMI because Optimus is not working as it should. Sound continues, image is frozen, system needs to be hard reset to be usable again.
rtx 4060 laptop screen freeze - Google
Not sure what your friend's laptop has to do with this. If anything, the blame would be placed on the laptop manufacturer's implementation of Optimus and their OEM modification of the Intel/NVIDIA graphics drivers. Much like how I blamed ASUS (instead of AMD) for their lack of support of their G15 Advantage (5980HX/6850M XT) with the black screen issue, until they finally released a BIOS update that resolved it.

All AMD needs to do is open source the MES/CP firmware and Geohot can correct any bugs that his fork of the AMD driver is encountering. That's the main problem and why he asked for AMD's help. He cannot identify what issue the GPU is having if he does not have access to those specific GPU components. What's sad is that they allowed this on the Radeon VII (and the Vegas actually) but stopped doing this on the RDNA cards.
#11
john_
Cheeseball said: Not sure what your friend's laptop has to do with this. If anything, the blame would be placed on the laptop manufacturer's implementation of Optimus and their OEM modification of the Intel/NVIDIA graphics drivers. Much like how I blamed ASUS (instead of AMD) for their lack of support of their G15 Advantage (5980HX/6850M XT) with the black screen issue, until they finally released a BIOS update that resolved it.
Double standards? Why not blame Tinycorp then and blame AMD here?
Cheeseball said: All AMD needs to do is open source the MES/CP firmware and Geohot can correct any bugs that his fork of the AMD driver is encountering. That's the main problem and why he asked for AMD's help. He cannot identify what issue the GPU is having if he does not have access to those specific GPU components. What's sad is that they allowed this on the Radeon VII (and the Vegas actually) but stopped doing this on the RDNA cards.
All AMD needs is to fix whatever problems there are in their firmware, IF they want to start selling gaming GPUs for AI. If they don't, because let's not forget they seem not to have plenty of capacity at TSMC to support all their product lines, they can just ignore Tinycorp. Someone posting on the internet that he can fix whatever problems there are in a firmware in hours, no matter his past in hacking, doesn't mean he can actually do it. I mean, he started with AMD gaming GPUs pretty convinced that he could do it and now he is crying publicly because he can't. AMD not wanting to give him access might be the reasonable thing to do. We don't know the specifics: whether not letting an outsider gain access to the firmware is a business decision or something more, like security, or not giving someone the inside information to brick every AMD GPU out there, for example.
#12
Denver
Leiesoldat said: I'm annoyed that Mr. Hotz is blaming AMD for a use case scenario that is not supported by consumer level hardware when they (Tiny Corporation) clearly need to be using enterprise cards like the Instinct series. What a baby.
Yeah, I wonder why they started selling a product without first testing it in the scenario it's being marketed for...
#13
vantila
Intel needs to meet every demand they can get for their GPUs, so this can be the demand boom moment for this or next gen Intel cards, similar to Nvidia's crypto and now big tech AI demand boom. If Tinycorp and Intel work together and put together something that works well and undercuts more expensive solutions, other start-ups like Tinycorp can pop up everywhere in Europe, China, India, even Russia, and next thing you know Intel is outselling AMD left and right.
#14
Vya Domus
Cheeseball said: The good part about Intel's oneAPI is that Level Zero support is technically direct-to-metal access on their GPUs.

AMD does (did?) this with their GCN-based (CDNA now) products through ROCm, but unfortunately not yet for RDNA.
I don't know what you mean, RDNA3 ISA is out there, you can write software as direct-to-metal as you want, Nvidia doesn't make their ISAs public, don't know about Intel.

I have not looked into this, I don't know exactly what issue he has with RDNA3, if there is a bug in the command processor I don't even know if it can be "fixed" and who knows if that's even the problem. AMD probably has a good reason for not allowing access to it, Nvidia didn't allow it until a year or so ago as well.

I suspect the reason AMD doesn't seem to care much about consumer products in this particular segment is because they plan to change the hardware dramatically anyway, it's clear that they intend to jump on the AI train, RDNA3 still doesn't really have dedicated ML hardware blocks, no reason to overhaul the software when the hardware is likely subject to change.
#15
LabRat 891
rv8000 said: “Small startup got burned by jumping into the AI buzz market with consumer GPUs not designed for professional work in an effort to make a quick buck”

Oh no… anyways.
History, would like a word with you (and this thread).

Raja Koduri's Ellesmere-Polaris and (especially) Fury and Vega *started* CDNA/AI-MI compute @ AMD.
To this day, Vega 10 and Vega 20 cards are some of the best 'budget' options for 'tinkering' with LLMs, etc.

He's well-within his (and his team's) rights to have thought that 3rd Generation Navi could be used for such.
Not to mention, he wasn't told that; AMD historically 'likes to see' new uses for their products, and can more-less crowd source off them.
Leiesoldat said: I'm annoyed that Mr. Hotz is blaming AMD for a use case scenario that is not supported by consumer level hardware when they (Tiny Corporation) clearly need to be using enterprise cards like the Instinct series. What a baby.
'Consumer Hardware' started the AI/MI revolution.
Vega is retaining support largely because of how similar it is to currently-supported CDNA.

We're only 2 generations off from Raja's last pre-CDNA work, Navi 1x. (Navi 12 is something strange...)
I can't blame Hotz for poking @ AMD when, previously they'd been quite accommodating towards this kind of use.
Ex: I received VII air coolers (for a MI25 mod) from a EHW'r that *still* runs quad VIIs for his work.
Tropick said: Why is this news? Who the hell is this guy and why should I care that he's bitching about some missing AI feature on a gaming card?
Quite often, I'm reminded that: Gamers =/= Enthusiasts.
You clearly have no enthusiasm for technology, beyond the FPS and the pretties on the screen...
#16
Cheeseball
Not a Potato
Vya Domus said: I don't know what you mean, RDNA3 ISA is out there, you can write software as direct-to-metal as you want, Nvidia doesn't make their ISAs public, don't know about Intel.
Yes, I know. AMD's documentation is really good, but unfortunately has limited info about the micro-engine scheduler and its command parser. I think AMD does not want to open-source those since that's one of the major components that talks to the GPU scheduler in the drivers.

While NVIDIA does not have a public ISA, you can get really close to direct GPU access using NVPTX (yes, I know it's a VM), but it runs code directly on the GPU.

Intel is doing something similar to PTX with Level Zero, but I haven't seen it utilized on campus yet.
#17
Vya Domus
Cheeseball said: While NVIDIA does not have a public ISA, you can get really close to direct GPU access using NVPTX (yes, I know it's a VM), but it runs code directly on the GPU.
It's not clear to me how Nvidia (or Intel) is doing a better job here in allowing closer to the metal access compared to AMD, some years ago I found a blog of a guy who wanted to see how close he can get to cuBLAS performance wise optimizing kernels by hand and he concluded that PTX is insufficient, it wasn't possible to fully optimize occupancy in the SM without access to the actual ISA.
#18
fec32a4de
LabRat 891 said: History, would like a word with you (and this thread).

Raja Koduri's Ellesmere-Polaris and (especially) Fury and Vega *started* CDNA/AI-MI compute @ AMD.
To this day, Vega 10 and Vega 20 cards are some of the best 'budget' options for 'tinkering' with LLMs, etc.

He's well-within his (and his team's) rights to have thought that 3rd Generation Navi could be used for such.
Not to mention, he wasn't told that; AMD historically 'likes to see' new uses for their products, and can more-less crowd source off them.

'Consumer Hardware' started the AI/MI revolution.
Vega is retaining support largely because of how similar it is to currently-supported CDNA.

We're only 2 generations off from Raja's last pre-CDNA work, Navi 1x. (Navi 12 is something strange...)
I can't blame Hotz for poking @ AMD when, previously they'd been quite accommodating towards this kind of use.
Ex: I received VII air coolers (for a MI25 mod) from a EHW'r that *still* runs quad VIIs for his work.


Quite often, I'm reminded that: Gamers =/= Enthusiasts.
You clearly have no enthusiasm for technology, beyond the FPS and the pretties on the screen...
It's a gaming line of GPUs. Designed for gaming. For pretties on the screen. That's why AMD has CDNA and RDNA lines, to separate them for various use cases.

I get that the modern GPU can do a lot more than just graphics, but they were designed to give you graphics, the radiance display engine, high-bandwidth HDMI and DisplayPort for higher refresh rate and FPS.

If you buy a product that's marketed and designed for a purpose (gaming) then complain it can't do the dishes for you, that's a you problem.
#19
evernessince
Vya Domus said: I don't know what you mean, RDNA3 ISA is out there, you can write software as direct-to-metal as you want, Nvidia doesn't make their ISAs public, don't know about Intel.

I have not looked into this, I don't know exactly what issue he has with RDNA3, if there is a bug in the command processor I don't even know if it can be "fixed" and who knows if that's even the problem. AMD probably has a good reason for not allowing access to it, Nvidia didn't allow it until a year or so ago as well.

I suspect the reason AMD doesn't seem to care much about consumer products in this particular segment is because they plan to change the hardware dramatically anyway, it's clear that they intend to jump on the AI train, RDNA3 still doesn't really have dedicated ML hardware blocks, no reason to overhaul the software when the hardware is likely subject to change.
I'm not even sure documentation was the issue in this case. The guy made ambitious announcements to attract attention and potential investment before he had checked whether anything he wanted to do would actually work.

1/2 of scientific or tech "news" nowadays is just wishful thinking that never comes to fruition but is published to push clicks.
#20
TechLurker
I doubt NVIDIA will even help him out, esp. if he's trying to push consumer cards as a cheaper alternative to NVIDIA's own dedicated AI cards (which easily cost multiples of a 4090). He's going beyond the prosumer scale and trying to target entry level enterprise scale, which is where NVIDIA is currently competing with AMD and, to a lesser degree, Intel.

Intel on the other hand might be willing, if only to help jump-start their own AI cards, but I wouldn't put it past them to eventually split off development and put AI on dedicated cards while forcing restrictions on their consumer GPUs. After all, Intel has done product segmentation before, and they want both the AI and gaming sector.

At any rate, here's a friendly reminder that the black box stuff in question is not related to raster or gaming whatsoever, so don't expect improvements to game performance even if AMD and TinyCorp work out a code-sharing agreement.
#21
LabRat 891
fec32a4de said: It's a gaming line of GPUs. Designed for gaming. For pretties on the screen. That's why AMD has CDNA and RDNA lines, to separate them for various use cases.

I get that the modern GPU can do a lot more than just graphics, but they were designed to give you graphics, the radiance display engine, high-bandwidth HDMI and DisplayPort for higher refresh rate and FPS.
Can't totally disagree here.
fec32a4de said: If you buy a product that's marketed and designed for a purpose (gaming) then complain it can't do the dishes for you, that's a you problem.
The issue, is that it's fundamentally a product for doing the dishes, that's being sold as a Salmon Cooker.
Yes, it'll cook salmon, and yes, the 'salmon cooker' variety appliance is cheaper. -doesn't change the fact the hardware itself is both useful and capable of more.

IMO, Tiny was building a Dishwasher out of the Salmon Cooker. They knew full-well that's not the intended use, but were willing to build all the ancillary stuff to make it work reliably.
They're pissed, less because they were actively blocked, and more because AMD cannot 'nail down' an answer or a solution. (denying both Tiny and AMD new marketshare)
#22
Cheeseball
Not a Potato
evernessince said: I'm not even sure documentation was the issue in this case. The guy made ambitious announcements to attract attention and potential investment before he had checked whether anything he wanted to do would actually work.

1/2 of scientific or tech "news" nowadays is just wishful thinking that never comes to fruition but is published to push clicks.
Attention, yes. This is Geohot we're talking about. But potential investment? I don't think so, mainly because the tinybox pre-order itself is only $100.
TechLurker said: I doubt NVIDIA will even help him out, esp. if he's trying to push consumer cards as a cheaper alternative to NVIDIA's own dedicated AI cards (which easily cost multiples of a 4090). He's going beyond the prosumer scale and trying to target entry level enterprise scale, which is where NVIDIA is currently competing with AMD and, to a lesser degree, Intel.

Intel on the other hand might be willing, if only to help jump-start their own AI cards, but I wouldn't put it past them to eventually split off development and put AI on dedicated cards while forcing restrictions on their consumer GPUs. After all, Intel has done product segmentation before, and they want both the AI and gaming sector.

At any rate, here's a friendly reminder that the black box stuff in question is not related to raster or gaming whatsoever, so don't expect improvements to game performance even if AMD and TinyCorp work out a code-sharing agreement.
^ This really.

I don't think they would ever reach an agreement since Geohot actually wants the MES/CP firmware open-sourced (not just shared with him for use). As I mentioned above (again, lol), that may not be in AMD's best interests since it may mean potentially losing out on their HPC division since people would just get the cheaper 7900XTX/W7900 and use those instead of the Instinct Accelerators.

The main goal of the tinybox is to have the option of having an on-prem $15,000 compute cluster. The main goal of tinygrad is to have an alternative (and possibly more optimized?) framework to PyTorch (and its autograd).
#23
R-T-B
Tropick said: Why is this news? Who the hell is this guy and why should I care that he's bitching about some missing AI feature on a gaming card?
Because this isn't just a gaming site, and this had gathered some AI centered interest, as bizarre as it is segmentation wise.
#24
xSneak
If this guy had his way all of the amd gaming gpus would be sold out and only available via scalper pricing. Good riddance.
#25
Patriot
Cheeseball said: The good part about Intel's oneAPI is that Level Zero support is technically direct-to-metal access on their GPUs.

AMD does (did?) this with their GCN-based (CDNA now) products through ROCm, but unfortunately not yet for RDNA.


The problem is that AMD does not offer any consumer/prosumer card that can be used for local development (e.g. GPGPU developers using a RTX 3090/4090). The Radeon VII was their pinnacle of success, but unfortunately got hampered by "gaming" reviews.

Eventually they will get there once ROCm is in a good state of support for the RX 7900 XTX and the PRO W7900.
They support single cards of the W7900, 7900 XTX, 7900 XT, and 7900 GRE. Two cards is in beta support. They aren't having trouble until cards 5 and 6. They are simply trying to use things in ways that are unsupported and crying about it.
They made an announcement before partnering with AMD, before qualifying a solution like any competent firm would have. If they want 8 card nodes, they should be using MI210s.