Monday, September 16th 2024
Interview with AMD's Senior Vice President and Chief Software Officer Andrej Zdravkovic: UDNA, ROCm for Radeon, AI Everywhere, and Much More!
A few days ago, we reported on AMD's newest expansion plans for Serbia. The company opened two new engineering design centers, with offices in Belgrade and Nis. We were invited to join the opening ceremony and got an exclusive interview with one of AMD's top executives, Andrej Zdravkovic, who is the Senior Vice President and Chief Software Officer. Previously, we reported on AMD's transition to becoming a software company. The company has recently tripled its software engineering workforce and is moving some of its best people to support these teams. AMD's plan spans a three- to five-year timeframe and aims to improve its software ecosystem while accelerating hardware development, so that it can launch new products more frequently and react to changes in software demand. AMD found that opening new design centers in Serbia would be very advantageous for these expansion efforts.
We sat down with Andrej Zdravkovic to discuss the purpose of AMD's establishment in Serbia and the future of some products. Zdravkovic is an engineer from Serbia, where he completed his Bachelor's and Master's degrees in electrical engineering at Belgrade University. In 1998, Zdravkovic joined ATI and quickly rose through the ranks, eventually becoming a senior director. During his decade-long tenure, Zdravkovic witnessed a significant industry shift as AMD acquired ATI in 2006. After a brief stint at another company, Zdravkovic returned to AMD in 2015, bringing with him a wealth of experience and a unique perspective on the evolution of the graphics and computing industry.
Here is the full interview:
Aleksandar: So, regarding the new opening of the center in Serbia, what is it going to be about? Is it going to be about software or hardware, AI or anything else?
Andrej: Primarily it is software, for the team we have right now. A large part of the team is working on the virtualization of our graphics processors for data centers. We have a team working on compilers. We have a team working on content protection and security, which is going to be expanding into further security aspects. And we have a team working on AI technologies for the data center, developing our ROCm subsystem for the data center. A new team, which we just established, is working on ROCm for Radeon. We're extending our ROCm subsystem to Radeon graphics products, so everybody can get to use AI on AMD APUs and Radeon GPUs. Going further, we are not limiting the Serbia team to these technologies. This is going to be a full-fledged design center; we are going to have RTL design, hardware verification and many other hardware and software technologies as an option. It really will depend on the available talent and on the ability to link to [local] universities. Virtually creating talent.
TechPowerUp: What made you come to Serbia, is it the local talent or anything else?
Andrej: A few different things. Definitely the availability of qualified engineering talent. We started with a provider of outside services. We recognized the capability, and we have grown the initial, relatively small core into the sizeable team of engineers that are working for us now. I personally have to insist the local talent is phenomenal because I graduated from the university here. It is very important that the engineers we hire are very interested in learning and stepping up to new challenges. We started to work with Serbian universities, to partner and grow the next [generation] of talent.
TechPowerUp: So, you were discussing ROCm. How easy is development right now, and how easy will it be in the future for developers to write ROCm software and to move from other AI and machine learning accelerators to AMD ROCm accelerators?
Andrej: Great question. Today the challenge for ROCm developers is that they need to work on big data center [Instinct MI] machine intelligence type products. Access to that kind of high-end product is limited, usually to developers in big companies like Microsoft. Also, the cost of that access is pretty high. We are bringing ROCm as a subsystem to the Radeon graphics products, desktop graphics products, or any Radeon APU powering desktops and notebooks. Developers will be provided with everyday access to ROCm. The ROCm subsystem and the language that sits above it, which is "HIP", are very interesting for many developers from the perspective of being completely open. Compared to our competition, we have a system that is open top-to-bottom, completely open source. Any development, any contribution, and debugging is much easier for developers. We also provide tools that allow a developer to take an AI application designed for CUDA and use the application called "HIPIFY" to transfer it from CUDA to run directly on HIP and ROCm.
TechPowerUp: How reliable is HIPIFY for enterprise applications?
Andrej: We find that HIPIFY is very reliable and very straightforward. We also find it's usually quite performant. Further optimizations are always welcome, of course, but we find that out of the chute, it works OK. There are some aspects that introduce complexity because the hardware subsystems are not the same. If the application is using hardware-specific constructs and some of the lower-level function calls that are hardware specific, that is something that HIPIFY cannot translate. We don't find it that often, other than in applications that are extremely, extremely optimized. But then, if somebody has the will to optimize the application to an extreme level, we are going to help them optimize for HIP/ROCm.
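To make the HIPIFY discussion concrete, here is a minimal Python sketch of the kind of source-to-source renaming such a tool performs, and of why hardware-specific code falls outside it. This is a toy under stated assumptions, not AMD's HIPIFY: the table holds only a handful of CUDA-to-HIP name pairs, and `toy_hipify`, `CUDA_TO_HIP`, and `NEEDS_MANUAL_PORT` are hypothetical names introduced purely for illustration.

```python
import re

# A tiny, hand-picked subset of CUDA-to-HIP renames. The real HIPIFY tools
# cover far more of the API; this table is only an illustration.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Hypothetical markers for hardware-specific code that a plain rename
# cannot handle, such as inline PTX assembly.
NEEDS_MANUAL_PORT = ("asm volatile", "asm(")


def toy_hipify(source):
    """Return (translated_source, lines_needing_manual_work)."""
    # Try longer names first so cudaMemcpyHostToDevice is not cut short
    # by the shorter cudaMemcpy entry.
    names = sorted(CUDA_TO_HIP, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    translated = pattern.sub(lambda m: CUDA_TO_HIP[m.group(0)], source)
    # Collect lines that contain constructs the rename table cannot cover.
    flagged = [
        line.strip()
        for line in translated.splitlines()
        if any(marker in line for marker in NEEDS_MANUAL_PORT)
    ]
    return translated, flagged


cuda_source = """#include <cuda_runtime.h>
cudaMalloc(&d_x, n * sizeof(float));
cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
asm volatile("bar.sync 0;");
cudaFree(d_x);
"""

hip_source, manual = toy_hipify(cuda_source)
print(hip_source)
print("Needs manual porting:", manual)
```

In this sketch the runtime calls are renamed mechanically, while the inline assembly line is only flagged, which mirrors the point made above: ordinary API usage translates directly, and heavily hand-optimized, hardware-specific code needs manual work.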
TechPowerUp: So AMD's strategy is to provide ROCm support across the entire stack, edge-to-core-to-cloud. All of these cases?
Andrej: Correct.
TechPowerUp: Regarding the new UDNA: We heard that UDNA is combining RDNA and CDNA into a single GPU architecture. Is that going to be developed here, or parts of it developed here, or something else?
Andrej: Yeah, that's new. Portions of that new work will be developed in Serbia. We are working to define the next aspects of what's going to be developed here. The technology is moving very, very fast, so access to good engineers that can learn quickly is extremely, extremely important. This is what we have in Serbia. Along the lines of your question on the combination of new technologies in notebook computers powered by AMD, we are a major player in what we call an "AI PC". That is actually a merger of everything. This is the device that has a CPU, that has a GPU, and that has a new unit, the NPU. We are opening the world of low-power AI using the NPU in combination with the new Windows operating system, supporting the new features that Microsoft announced for the NPU. In addition to running the most advanced AI on the NPU, you can also execute AI applications on the AMD GPU and on the AMD CPU.
TechPowerUp: That's very exciting, because the true power of an architecture lies in low-power solutions, not high-power, high-performance ones. When you give a design plenty of power and a massive TDP, it is much easier to run things than on something constrained like a smaller NPU.
Andrej: That is correct. The interesting way to look at it is: we always need to find the balance. There are applications that require high-power solutions, that are natively designed to work with, let's say, larger data formats, FP16, FP32... and large data sizes. So, the applications of artificial intelligence that require these formats and large memory would be run on various types of GPUs, either RDNA or Machine Intelligence [Instinct MI] GPUs. If you go into the large language models, something like ChatGPT, or those kinds of apps, a lot of these actually run perfectly well on data formats like INT8 or INT4. So, we run that on the NPU at low power, executing very, very quickly, equally quickly, or even faster, as you would on a GPU, using much less power. And that's where the AI PC starts playing. The NPU combined with the APU offers something for every aspect of human need in a PC, an AI PC. And the beauty of AMD is that we have all the solutions to offer for all these aspects of the need.
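The trade-off between FP32 and INT8 mentioned above can be illustrated with a minimal sketch of symmetric INT8 quantization in plain Python. This is an assumption-laden toy, not how any AMD NPU or toolchain actually quantizes models: the helper names `quantize_int8` and `dequantize` and the sample weights are made up for illustration. The idea is that each value is stored as a one-byte integer code plus one shared scale, instead of four bytes per FP32 value, at the cost of a small rounding error.

```python
def quantize_int8(values):
    """Map floats to INT8 codes using one shared scale (symmetric scheme)."""
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero input, where any scale would do.
    scale = max_abs / 127 if max_abs else 1.0
    # Round to the nearest code and clamp to the symmetric range [-127, 127].
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from INT8 codes and the shared scale."""
    return [c * scale for c in codes]


# Made-up sample weights, purely for illustration.
weights = [0.50, -1.27, 0.03, 0.89]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("INT8 codes:", codes)
print("scale:", scale)
print("largest round-trip error:", max_error)
```

The round-trip error of this scheme is bounded by half the scale, which is one reason models that tolerate this small loss of precision can run on lower-power integer hardware.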
TechPowerUp: Take an application and distribute it across all teams. Get it developed fast?
Andrej: Exactly. That is where the Serbia team comes in. One more ace in our portfolio.
TechPowerUp: What is the future product you are most excited about? Is it something from the software side that is upcoming or something from the hardware side?
Andrej: Of course, you know that I cannot disclose future products until we are ready to disclose them. Coming from the software world, the innovations in software and in AI are phenomenal. I think we are going to see the combination of both. The way we are looking at technology at AMD is that we are offering solutions for more and more verticals. Look at everything that we have been doing recently: the acquisition of Silo, which is bringing huge AI knowledge and competency, or the just recently announced plan to acquire ZT Systems. We want to position ourselves as a system provider, not to compete with system providers, but to grow that knowledge of how to build systems and solutions. The next thing from AMD in general will be more of a combination of everything to provide solutions to our customers. Looking at that, software becomes a huge part of it. My title, Chief Software Officer, kind of shows the importance and level of recognition that AMD is putting into software. We are far from the classical semiconductor company that we were maybe 20 years ago. We are creating solutions to the world's most important challenges.
40 Comments on Interview with AMD's Senior Vice President and Chief Software Officer Andrej Zdravkovic: UDNA, ROCm for Radeon, AI Everywhere, and Much More!
Compare that to the rise of Epyc. How was AMD able to sell any chips when everyone already had all the CPUs they could ever need, and no one was ever fired for buying Intel?
What I would have liked to know more about is the future of Radeon. Namely, when is RDNA 4 coming, what's next, and what can we expect with the arrival of UDNA (the answer to that question was vague as heck).
The biggest reason ATI was able to do compute was that Nvidia was improving performance through rounding and fewer bits, and thus had incorrect math for it.
Of course, AMD isn't stupid and will have contracts and many lawyers overseeing it, but I wouldn't go down this route just because it is cheap (which is the main reason why companies open up stuff there).
Ngreedia's backorder log is huge, and the same goes for their prices.
But if you have similar hardware at a lower price and available, AND tools to get past the CUDA trap, then you will have an option for your AI needs.
So a situation like this won't be repeated: :)
www.tomshardware.com/tech-industry/elon-musk-and-oracle-founder-begged-nvidia-ceo-jensen-huang-for-ai-gpus-at-dinner
This is going to be a Herculean task.
I mean, software must be taking advantage of their hardware.
That said, as stated in the interview, one of their biggest obstacles is that their software only runs as expected on very expensive hardware (MI300), so few have the access needed to migrate away from CUDA.
But your point stands, they need to provide the tools for the customers. It does seem that they are taking that seriously, and hopefully they don't fail as before.
Also look how EKWB Serbia went.....well it was good for EK Serbia's manager xD
This is not like competing with Intel who fumbles the ball from time to time. Nvidia is a totally different animal.
The same can’t be said of AMD’s AI/compute products.
Their problem isn't hardware, it's the software stack.
If it was both hardware and software then yes, I would agree. They just need to put a huge amount of money and resources into that software stack, which they have committed to doing.