Monday, July 3rd 2023

AMD "Vega" Architecture Gets No More ROCm Updates After Release 5.6

AMD's "Vega" graphics architecture powering graphics cards such as the Radeon VII, Radeon PRO VII, sees a discontinuation of maintenance with ROCm GPU programming software stack. The release notes of ROCm 5.6 states that the AMD Instinct MI50 accelerator, Radeon VII client graphics card, and Radeon PRO VII pro-vis graphics card, collectively referred to as "gfx906," will reach EOM (end of maintenance) starting Q3-2023, which aligns with the release of ROCm 5.7. Developer "EwoutH" on GitHub, who discovered this, remarks gfx906 is barely 5 years old, with the Radeon PRO VII and Instinct MI50 accelerator currently being sold in the market. The most recent AMD product powered by "Vega" has to be the "Cezanne" desktop processor, which uses an iGPU based on the architecture. This chip was released in Q2-2021.
Source: EwoutH (ROCm GitHub)

42 Comments on AMD "Vega" Architecture Gets No More ROCm Updates After Release 5.6

#1
R0H1T
The "architecture" is GCN.
#2
tabascosauz
@btarunr Renoir/Lucienne and Cezanne/Barcelo/whatever you wanna call it still continue on the mobile side, as new products, under the Ryzen 7020 and 7030 names, released last year in 2022. Lucienne is decidedly trash-heap material, but Barcelo is still being quietly put in new, reasonably mid-to-high-end notebooks. Since it's all listed by AMD as one big family under Ryzen 7000, I don't see Barcelo going away until Rembrandt-R (7035) and Phoenix (7040) do too. Which will be... pretty interesting to see in the context of GCN5.1 being dropped.

Vega had a good run, I honestly don't see a problem with this. It's just that it should have died off long ago, instead of being continuously shoehorned into new products in the past 3 years.
#3
Vayra86
tabascosauzVega had a good run, I honestly don't see a problem with this. It's just that it should have died off long ago, instead of being continuously shoehorned into new products in the past 3 years.
It's funny, I remember predicting this exact problem with AMD's one-off (well, two, counting the Fury X, which had an even shorter and qualitatively much worse support cycle) HBM adventures on gaming GPUs. AMD, with its problematic driver history, was now going to push optimizations for two different memory subsystems in its GPU families? Of course not.

And here we are. That's also why I think they are much more focused and strategically better positioned today; the whole business is chiplet-focused now, moving toward unification rather than trying weird shit left and right. It's also why I don't mind them not pushing the RT button too hard. Less might be more.
#4
ZoneDymo
What does this mean? idk what ROCm even is.
#5
john_
This is a bad decision from AMD. Maybe they have to, because it is GCN, but they should still try to keep support for longer. Why? Because Nvidia does, and this is an area where matching Nvidia doesn't require stronger hardware or better software; it's just a business decision. Let's not forget that for years the main argument in favor of AMD was that "Fine Wine." This is the complete opposite.
#6
dj-electric
AMD simply can't get their CUDA competitor off the ground. Still very much locked out of the ML industry. Not happy to see that
#7
john_
dj-electricAMD simply can't get their CUDA competitor off the ground. Still very much locked out of the ML industry. Not happy to see that
I think they can. They seem to have fixed some things these last months.
UPDATE 1-AMD's AI chips could match Nvidia's offerings, software firm says
So maybe they are fixing their software problems. Dropping the older architectures is probably meant to make their job easier, but they do send the wrong message to professionals. Long-term support should also be a priority.
#8
dj-electric
john_I think they can. They seem to have fixed some things these last months.
UPDATE 1-AMD's AI chips could match Nvidia's offerings, software firm says
So maybe they are fixing their software problems. Dropping the older architectures is probably meant to make their job easier, but they do send the wrong message to professionals. Long-term support should also be a priority.
Do you know how astronomically far ahead anything with Tensor cores is of even RDNA3 in ML applications?
I think it's a bit of a system that feeds itself: if the ROCm ecosystem were more popular, AMD would have been incentivized to make their GPUs train faster. If you want to train a model, you are better off with an RTX 3070 Ti than an RX 7900 XTX at this point.

The only thing I believe AMD's RDNA2-3 cards are decent at is inference.
#9
john_
dj-electricDo you know how astronomically far ahead anything with Tensor cores is of even RDNA3 in ML applications?
I think it's a bit of a system that feeds itself: if the ROCm ecosystem were more popular, AMD would have been incentivized to make their GPUs train faster. If you want to train a model, you are better off with an RTX 3070 Ti than an RX 7900 XTX at this point.

The only thing I believe AMD's RDNA2-3 cards are decent at is inference.
No, please explain it to me. And really, talking about AI and ML in servers and then pointing to gaming cards like the 3070 Ti and the RX 7900 XT/X looks a bit odd. AMD is not using RDNA3 in Instinct cards.

In any case, AMD GPUs do find their way into supercomputers meant to be used for AI and ML as well. That probably means something. Also, Nvidia is having so much difficulty fulfilling orders that I believe I read about a 6-month waiting list. If AMD's options can offer 80% of the performance at 80% of the price, I would expect many to turn to AMD solutions instead of waiting 6 months. And there is a paragraph in the above article that does seem to imply that something has changed about AMD's software:
"For most (machine learning) chip companies out there, the software is the Achilles heel of it," Tang said, adding that AMD had not paid MosaicML to conduct its research. "Where AMD has done really well is on the software side."
#10
bug
Vayra86It's funny, I remember predicting this exact problem with AMD's one-off (well, two, counting the Fury X, which had an even shorter and qualitatively much worse support cycle) HBM adventures on gaming GPUs. AMD, with its problematic driver history, was now going to push optimizations for two different memory subsystems in its GPU families? Of course not.

And here we are. That's also why I think they are much more focused and strategically better positioned today; the whole business is chiplet-focused now, moving toward unification rather than trying weird shit left and right. It's also why I don't mind them not pushing the RT button too hard. Less might be more.
It's a little debatable how "focused" they are, considering they still put Vega in current products. I wish they'd drop this habit once and for all, or at least stick to a couple of generations.
ZoneDymoWhat does this mean? idk what ROCm even is.
It's their Linux compute stack, i.e. what keeps them from being taken seriously for AI/ML :(
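To make that a bit more concrete: from a user's point of view, ROCm is what lets frameworks such as PyTorch drive a Radeon or Instinct card the way CUDA lets them drive a GeForce card. A minimal sketch, assuming a ROCm build of PyTorch is installed (its ROCm backend reuses the familiar torch.cuda names), could look like this:

import torch

# On a ROCm build of PyTorch, torch.version.hip is set (it is None on a
# CUDA-only build) and the GPU is still addressed via the "cuda" device name.
print("HIP runtime:", torch.version.hip)
print("GPU available:", torch.cuda.is_available())

if torch.cuda.is_available():
    x = torch.randn(1024, 1024, device="cuda")  # allocated on the Radeon GPU
    y = x @ x                                   # matmul dispatched through ROCm/HIP
    print("Result lives on:", y.device)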
john_No, please explain it to me. And really, talking about AI and ML in servers and then pointing to gaming cards like the 3070 Ti and the RX 7900 XT/X looks a bit odd.
It may seem odd, but it's really not. People dive into AI/ML using the hardware they have; they don't buy professional adapters for a hobby. This, in turn, determines what skills are readily available in the market when you're looking to hire AI/ML engineers.
#11
sLowEnd
tabascosauz@btarunr Renoir/Lucienne and Cezanne/Barcelo/whatever you wanna call it still continue on the mobile side, as new products, under the Ryzen 7020 and 7030 names, released last year in 2022. Lucienne is decidedly trash-heap material, but Barcelo is still being quietly put in new, reasonably mid-to-high-end notebooks. Since it's all listed by AMD as one big family under Ryzen 7000, I don't see Barcelo going away until Rembrandt-R (7035) and Phoenix (7040) do too. Which will be... pretty interesting to see in the context of GCN5.1 being dropped.

Vega had a good run, I honestly don't see a problem with this. It's just that it should have died off long ago, instead of being continuously shoehorned into new products in the past 3 years.
The CPU core is still Zen 2, but the iGPU on those 7020 series chips is RDNA2 instead of Vega. (Radeon 610M)

You're right about the 7030 chips though
#13
tabascosauz
sLowEndThe CPU core is still Zen 2, but the iGPU on those 7020 series chips is RDNA2 instead of Vega. (Radeon 610M)

You're right about the 7030 chips though
Ah no, that was my mistake. 7020 is Mendocino and is the new Zen2/RDNA2 Athlon release. I was referring to the Renoir refresh.

It does still illustrate the utter disaster that is the 7000 mobile naming scheme. AMD seriously wants people to view Mendocino, Barcelo, Rembrandt, Phoenix and Dragon Range as equals in terms of technology :roll:
TheoneandonlyMrKPro users using ROCm on Cezanne!?! That's a big crowd?!
As far as I can tell, ROCm support on APUs (even if they are "Vega") is kinda pepega, and a clear answer/proper documentation is scarce. Still, why not? I can think of plenty of people very interested in running stuff like Stable Diffusion; it doesn't mean they have the funds to smash on a high-end GPU.
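For what it's worth, the workaround most often reported by the community for coaxing ROCm PyTorch into running on officially unsupported Vega APUs is to override the gfx target the runtime sees before it loads. A rough sketch follows; note that HSA_OVERRIDE_GFX_VERSION is an unofficial knob, and "9.0.0" (the gfx900 code path) is simply the value commonly cited for Vega iGPUs, not something AMD documents or supports:

import os

# Must be set before torch (and thus the ROCm/HIP runtime) is imported.
# Unsupported configuration: tells the runtime to treat the iGPU as gfx900.
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "9.0.0")

import torch  # imported after the override so the runtime picks it up

print("GPU visible to ROCm PyTorch:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))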
#14
john_
bugIt may seem odd, but it's really not. People dive into AI/ML using the hardware they have; they don't buy professional adapters for a hobby. This, in turn, determines what skills are readily available in the market when you're looking to hire AI/ML engineers.
I don't believe Nvidia's financial success is based on the fact that millions of people with Nvidia gaming cards decided to make AI and ML their hobby. I understand your point, but gaming cards are probably irrelevant here.
I also understand your point about hobbyists getting used to CUDA and then probably studying CUDA to get jobs. But again, in the markets Nvidia, AMD and everyone else are targeting, it's not about "what was your hobby, what did you learn in university?" If that were the case, then EVERYTHING other than CUDA would be DOA.
#15
bug
TheoneandonlyMrKPro users using ROCm on Cezanne!?! That's a big crowd?!
That's AMD's real problem: compute underperforms and is hard to set up. As a result, everything happens in the green camp; AMD's crowd is not big no matter how you look at it.
john_I also understand your point about hobbyists getting used to CUDA and then probably studying CUDA to get jobs. But again, in the markets Nvidia, AMD and everyone else are targeting, it's not about "what was your hobby, what did you learn in university?" If that were the case, then EVERYTHING other than CUDA would be DOA.
It's not entirely about that. But when you need to move fast, existing knowledge in the market is a factor.
#16
john_
bugIt's not entirely about that. But when you need to move fast, existing knowledge in the market is a factor.
Google and others might grab individuals who are good at CUDA, not to program in CUDA, but because they understand what AI and ML programming is, what it looks like, and how to get the results needed. Most of them will have to learn something else to get and keep their new/old jobs.
Again, if it were CUDA and only CUDA, EVERYTHING else would be DOA. Not just anything AMD, but anything Intel, anything Google, anything Amazon, anything Tenstorrent, anything Apple, anything Microsoft, anything different from CUDA. Am I right? Am I wrong?

Now I do agree that companies and universities with limited resources for probably limited projects (where by limited I mean projects that are still huge in my eyes, or in the eyes of some other individual throwing random thoughts into a forum) will just go out and buy a few RTX 4090s. Granted. But again, I doubt Nvidia's latest amazing success is based on GeForce cards.
#17
TheoneandonlyMrK
bugThat's AMD's real problem: compute underperforms and is hard to set up. As a result, everything happens in the green camp; AMD's crowd is not big no matter how you look at it.


It's not entirely about that. But when you need to move fast, existing knowledge in the market is a factor.
So loads of people do AI on MX450-equipped laptops.

My point was that no pro uses Cezanne for AI work, and even if that APU had an Nvidia GPU it would still be irrelevant; it's a consumer part.

ROCm is irrelevant to consumer parts, so there's no need to mention them at all or discuss consumer parts in this thread.

What's next, should we talk about AMD driver issues on consumer parts here too? Someone will, no doubt.
#18
bug
john_Google and others might grab individuals who are good at CUDA, not to program in CUDA, but because they understand what AI and ML programming is, what it looks like, and how to get the results needed. Most of them will have to learn something else to get and keep their new/old jobs.
Again, if it were CUDA and only CUDA, EVERYTHING else would be DOA. Not just anything AMD, but anything Intel, anything Google, anything Amazon, anything Tenstorrent, anything Apple, anything Microsoft, anything different from CUDA. Am I right? Am I wrong?

Now I do agree that companies and universities with limited resources for probably limited projects (where by limited I mean projects that are still huge in my eyes, or in the eyes of some other individual throwing random thoughts into a forum) will just go out and buy a few RTX 4090s. Granted. But again, I doubt Nvidia's latest amazing success is based on GeForce cards.
Everything other than CUDA is virtually DOA. Most libraries that underpin AI/ML projects are CUDA-based.

Of course you can have your own implementation from scratch (possibly even more performant than CUDA, if you're in a specific niche), but at this point, the entry barrier is quite high.

Mind you, I'm not saying CUDA is better (I haven't used it). But I know at least two guys who tried to dabble in AI/ML using AMD/OpenCL and they both said "screw it" in the end and went CUDA. One of them was doing it for a hobby, the other one for his PhD. TL;DR CUDA is everywhere and it sells hardware while AMD keeps finding ways to shoot themselves in the foot.
#19
dj-electric
john_Now I do agree that companies and universities with limited resources for probably limited projects (where by limited I mean projects that are still huge in my eyes, or in the eyes of some other individual throwing random thoughts into a forum) will just go out and buy a few RTX 4090s. Granted. But again, I doubt Nvidia's latest amazing success is based on GeForce cards.
You'd be amazed to know how many small, medium, and often large businesses deploy racks and racks of GeForce-based ML servers today to perform training. Cards like the RTX 3090, and later the RTX 4080 and 4090, really do represent server-grade compute strength that was previously available to only a few.

This economy is crazy. Startups that want to build and train models in-house will often buy 15-25 high-end GPUs and put them in racks or rigs to get their initial versions ready.
#20
Vya Domus
dj-electricYou'd be amazed to know how many small, medium, and often large businesses deploy racks and racks of GeForce-based ML servers today to perform training. Cards like the RTX 3090, and later the RTX 4080 and 4090, really do represent server-grade compute strength that was previously available to only a few.

This economy is crazy. Startups that want to build and train models in-house will often buy 15-25 high-end GPUs and put them in racks or rigs to get their initial versions ready.
They really don't. Businesses that need that kind of stuff turn to cloud solutions; buying 20 RTX 4090s upfront is a catastrophically cost-ineffective solution, and no one is doing that.
dj-electricDo you know how astronomically far ahead anything with Tensor cores is of even RDNA3 in ML applications?
I don't, and neither do you. AMD's upcoming MI300 looks like it's going to be at the very least comparable to the H100, and the fact that Nvidia had to respond with new variants of the H100 to match its memory capacity goes to show they feel some heat coming from AMD. Not to mention that AMD keeps winning more and more huge projects to build supercomputers, competing with Nvidia/Intel offerings; if Nvidia were so astronomically ahead, no one would be paying millions of dollars for AMD's stuff. Be real.
#21
bug
Vya DomusI don't, and neither do you. AMD's upcoming MI300 looks like it's going to be at the very least comparable to the H100, and the fact that Nvidia had to respond with new variants of the H100 to match its memory capacity goes to show they feel some heat coming from AMD.
That's the sad part: it doesn't matter how capable the hardware is if the software sucks.
Vya DomusNot to mention that AMD keeps winning more and more huge projects to build supercomputers, competing with Nvidia/Intel offerings; if Nvidia were so astronomically ahead, no one would be paying millions of dollars for AMD's stuff. Be real.
If Ferrari were so great, everybody would be buying Ferraris and nobody would buy anything else, right? ;)

It's an exaggeration, of course. But it shows why you can still sell even when there's a considerable gap between you and the competition.
#22
dj-electric
Vya DomusThey really don't. Businesses that need that kind of stuff turn to cloud solutions; buying 20 RTX 4090s upfront is a catastrophically cost-ineffective solution, and no one is doing that.
I'm not speaking out of nowhere or in hypotheticals; I'm in this business myself :). Not everyone is rushing to get 20 4090s, but small offices are already starting to equip their employees with machines that allow them to train smaller models locally. There's really nowhere else besides GeForce cards for them to turn to. That means the product is built with CUDA, and probably for the next few years the business will grow using it and expanding on those resources.
We have already seen that with most of the popular tools available to developers, someone is better off getting two RTX 4090s in a machine than four RX 7900 XTXs or whatever the Radeon Instinct equivalent is. The situation is extremely skewed towards NVIDIA in the ML ecosystem today. At this point, I'm pretty sure that even two extra zeroes wouldn't make AMD's ML sales reach NVIDIA's. It really is quite an astronomical difference.

I don't even want to get started on the GTCs, the courses, and the academies that exist on NVIDIA's side to enrich the CUDA-based ML world today. This is a losing game for anyone else in this industry so far, Intel and their less-than-great solutions included. The DGX servers NVIDIA offers are just the cherry on top.
#23
john_
bugEverything other than CUDA is virtually DOA. Most libraries that underpin AI/ML projects are CUDA-based.

Of course you can have your own implementation from scratch (possibly even more performant than CUDA, if you're in a specific niche), but at this point, the entry barrier is quite high.

Mind you, I'm not saying CUDA is better (I haven't used it). But I know at least two guys who tried to dabble in AI/ML using AMD/OpenCL and they both said "screw it" in the end and went CUDA. One of them was doing it for a hobby, the other one for his PhD. TL;DR CUDA is everywhere and it sells hardware while AMD keeps finding ways to shoot themselves in the foot.
That still doesn't explain why huge companies like Google or Intel build their own hardware for AI and ML. Do we, simple forum users, understand reality better than them?
dj-electricYou'd be amazed to know how many small, medium, and often large businesses deploy racks and racks of GeForce-based ML servers today to perform training. Cards like the RTX 3090, and later the RTX 4080 and 4090, really do represent server-grade compute strength that was previously available to only a few.

This economy is crazy. Startups that want to build and train models in-house will often buy 15-25 high-end GPUs and put them in racks or rigs to get their initial versions ready.
Individuals and startups are not the target audience for Nvidia, Intel, AMD, Google, Tenstorrent etc.
dj-electricRX 7900 XTXs or whatever the Radeon Instinct equivalent is.
Someone in the business should have that answer.

In any case, MosaicML seems to be a company that has worked with Nvidia for a long time and is only now coming out with a press release saying "Hey, you know something? AMD's options can be a real alternative NOW," because, as the article puts it,
thanks largely to a new version of AMD software released late last year and a new version of open-source software backed by Meta Platforms called PyTorch that was released in March
Maybe, being in the business, you need to update your info.
#24
dj-electric
john_Individuals and startups are not the target audience for Nvidia
If startups weren't NVIDIA's target, they wouldn't invest such tremendous effort in running GTCs and funding programs worth hundreds of millions to make sure that as many small and medium businesses as possible use their hardware and software tools. This is objectively false. Oftentimes, NVIDIA likes those businesses enough to buy into them, or buy them entirely.
If such things weren't important to them, they wouldn't offer so many software tools to such clients or grow such a large community around accessible and affordable hardware. They would force them to buy server-grade hardware only and unlock those features there. They wouldn't sell you Xavier NX / Orin products that you can buy for a couple of hundred dollars and develop for, right down to the hardware-integration level on boards.

These products exist especially for startups and small businesses. Here's our little Xavier we built a board to accommodate. Very cute.

People really do stay under large rocks. Time to lift them up; you missed how NVIDIA has exponentially grown its ML outreach since 2017.
#25
bug
john_That still doesn't explain why huge companies like Google or Intel build their own hardware for AI and ML. Do we, simple forum users, understand reality better than them?
There are reasons. Nvidia's hardware is general-purpose; many things Google does (e.g. ads and targeting) are not. Or they may just want to own the whole solution.

Anyway, you're looking at this the wrong way. The fact that CUDA doesn't command 100% market share is no guarantee that ROCm is just as serviceable. Case in point: this year OpenAI announced they will buy en masse from Nvidia. Have they, or any of their competitors, announced something similar for AMD hardware? Another case in point: open up a cloud admin interface and try to create an OpenCL/ROCm instance. Everybody offers CUDA, but on the other hand, I can't name a provider that also offers ROCm (I'm sure there are some, I just don't recall who).