Monday, February 12th 2024
AMD Develops ROCm-based Solution to Run Unmodified NVIDIA CUDA Binaries on AMD Graphics
AMD has quietly funded an effort over the past two years to enable binary compatibility for NVIDIA CUDA applications on its ROCm stack. This allows CUDA software to run on AMD Radeon GPUs without adapting the source code. The project responsible is ZLUDA, which was initially developed to provide CUDA support on Intel graphics. The developer behind ZLUDA, Andrzej Janik, was contracted by AMD in 2022 to adapt his project for use on Radeon GPUs with HIP/ROCm. He spent two years bringing functional CUDA support to AMD's platform, allowing many real-world CUDA workloads to run without modification. AMD decided not to productize this effort for unknown reasons, but did open-source it once funding ended, per their agreement. Over at Phoronix, the ZLUDA implementation was put through a wide variety of benchmarks.
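To make concrete what "unmodified" means here, the sketch below is an ordinary CUDA program of the kind the article is talking about: it uses only the standard CUDA runtime API and is normally compiled with nvcc for NVIDIA GPUs. It is purely illustrative (not taken from ZLUDA or Phoronix); the point is that a binary built from code like this is what ZLUDA aims to run on a Radeon card without recompilation.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Plain CUDA kernel: element-wise vector addition.
__global__ void vector_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host buffers.
    float *ha = new float[n], *hb = new float[n], *hc = new float[n];
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Ordinary CUDA runtime API calls; nothing here is AMD- or ZLUDA-specific.
    float *da, *db, *dc;
    cudaMalloc((void **)&da, bytes);
    cudaMalloc((void **)&db, bytes);
    cudaMalloc((void **)&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    vector_add<<<(n + 255) / 256, 256>>>(da, db, dc, n);
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %f\n", hc[0]); // expect 3.0

    cudaFree(da); cudaFree(db); cudaFree(dc);
    delete[] ha; delete[] hb; delete[] hc;
    return 0;
}
```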
Benchmarks found that proprietary CUDA renderers and software worked on Radeon GPUs out of the box with the drop-in ZLUDA library replacements. CUDA-optimized Blender 4.0 rendering now runs faster on AMD Radeon GPUs than through the native ROCm/HIP port, cutting render times by around 10-20% depending on the scene. The implementation is surprisingly robust considering it was a single-developer project, though there are limitations: OptiX and PTX assembly code are not yet fully supported. Overall, testing showed very promising results. In Geekbench, CUDA-optimized binaries running through ZLUDA produce up to 75% better results than the generic OpenCL runtime. With the ZLUDA libraries handling API translation, unmodified CUDA binaries can now run directly on top of ROCm and Radeon GPUs. Strangely, the ZLUDA port targets AMD ROCm 5.7 rather than the newest 6.x releases. Only time will tell whether AMD continues investing in this approach to simplify porting of CUDA software; in any case, the now open-sourced project lets anyone contribute and help improve compatibility. For the complete review, check out the Phoronix tests.
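As a rough illustration of where a "drop-in library replacement" hooks in: ZLUDA's general approach is understood to be shipping substitute CUDA driver libraries (nvcuda.dll on Windows, libcuda.so on Linux) that implement CUDA's entry points on top of HIP/ROCm. A program that only calls the documented driver API, like the small probe below, would then report whatever device the translation layer exposes. The probe is ordinary CUDA driver-API code written for this discussion, not code from ZLUDA.

```cpp
#include <cstdio>
#include <cuda.h> // CUDA driver API (link with -lcuda)

// Minimal driver-API probe: these are exactly the kind of entry points a
// drop-in libcuda/nvcuda replacement has to implement on top of HIP/ROCm.
int main() {
    if (cuInit(0) != CUDA_SUCCESS) {
        printf("cuInit failed\n");
        return 1;
    }

    CUdevice dev;
    cuDeviceGet(&dev, 0);

    char name[256];
    cuDeviceGetName(name, sizeof(name), dev);

    size_t total_mem = 0;
    cuDeviceTotalMem(&total_mem, dev);

    printf("CUDA device 0: %s (%zu MiB)\n", name, total_mem >> 20);
    return 0;
}
```

Run against a replacement driver library of this kind, the device name printed would presumably be the Radeon GPU rather than a GeForce card.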
Sources:
Phoronix, ZLUDA
54 Comments on AMD Develops ROCm-based Solution to Run Unmodified NVIDIA CUDA Binaries on AMD Graphics
One interesting tidbit for us here: so who knows, DLSS on AMD GPUs soon? :roll:
Nvidia's lawyers could easily argue the software is based on stolen source code, even if it is a clean-room implementation like ZLUDA.
But a subset of PTX (e.g. add, multiply, and other common operations) probably could be built for AMD. Then again, no one writing PTX would bother with the common + or * operations of CUDA/C++; the only reason anyone would dip down to PTX assembly is to take advantage of hardware-specific features and/or hardware-specific performance characteristics (a small inline-PTX sketch follows below).
Neither would ever get done, IMO, so the project as it stands has probably reached its end. CUDA/C++ cross-compiled binaries running on AMD/Intel sounds "good enough" to me... while the later stages of this project sound like a lot of work without much gain...
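For readers unfamiliar with what "dipping down to PTX" looks like, here is a small, hypothetical illustration using CUDA's inline-PTX syntax. The first function is exactly the kind of trivial add the commenter says nobody would write by hand; the second reads an NVIDIA-specific special register (%laneid), which is closer to the hardware-specific reasons people actually write PTX, and the kind of construct a translation layer has to map onto non-NVIDIA hardware.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Trivial inline PTX: a 32-bit integer add. Pointless by hand, since the
// compiler emits the same instruction for a plain "a + b".
__device__ int add_via_ptx(int a, int b) {
    int result;
    asm volatile("add.s32 %0, %1, %2;" : "=r"(result) : "r"(a), "r"(b));
    return result;
}

// More typical use of inline PTX: reading an NVIDIA-specific special
// register (%laneid, the thread's lane index within its warp).
__device__ unsigned int lane_id() {
    unsigned int lane;
    asm("mov.u32 %0, %%laneid;" : "=r"(lane));
    return lane;
}

__global__ void demo(int *out) {
    out[threadIdx.x] = add_via_ptx((int)lane_id(), 1);
}

int main() {
    int *out;
    cudaMalloc((void **)&out, 32 * sizeof(int));
    demo<<<1, 32>>>(out);
    cudaDeviceSynchronize();
    cudaFree(out);
    return 0;
}
```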
Once you're using their software, you are halfway to being a customer.
Camel nose -> tent.
I wonder if it would be possible to make some old PhysX games run on AMD; it would be nice to have it for the Batman Arkham games.
The simple fact that even this translation layer outperforms AMD's native OpenCL stack speaks volumes about how ridiculously bad their compute system has been. It will need to be maintained to keep up with newer versions of CUDA, and the funding for that will likely have to come from AMD, but it's a massive step in the right direction.
To be fair, if AMD wants to see CUDA go away, the last thing they should do is officially make CUDA run on their GPUs. Supporting it would mean propping up the very thing that makes Nvidia billions every quarter.
@topic, I find it strange as hell that this gimmick runs faster than AMD's own HIP. It's the same as saying the company doesn't know its own hardware well enough to create optimized code; in other words, they're underutilizing its capabilities. Incompetence? Lack of investment? What is it, lol.
Imagine a racecar with a slightly worse transmission (translation layer) and the same engine (GPU) but weighing much less (application optimization). Still faster than one with a better transmission.
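To put the "transmission" part of that analogy in concrete terms: HIP's runtime API was designed to mirror CUDA's almost one-to-one, so a layer that forwards CUDA calls to their HIP counterparts has relatively little work to do per call. The sketch below is illustrative only (not derived from ZLUDA); each CUDA call is annotated with the HIP equivalent it would ultimately land on.

```cpp
#include <cuda_runtime.h>

// Each CUDA runtime call below has a near-identical HIP counterpart
// (shown in the trailing comments), which is what keeps a CUDA-to-HIP
// forwarding layer thin.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1024;
    float *buf;

    cudaMalloc((void **)&buf, n * sizeof(float)); // hipMalloc(&buf, n * sizeof(float));
    cudaMemset(buf, 0, n * sizeof(float));        // hipMemset(buf, 0, n * sizeof(float));

    scale<<<(n + 255) / 256, 256>>>(buf, 2.0f, n);
    // hipLaunchKernelGGL(scale, dim3((n + 255) / 256), dim3(256), 0, 0, buf, 2.0f, n);

    cudaDeviceSynchronize();                      // hipDeviceSynchronize();
    cudaFree(buf);                                // hipFree(buf);
    return 0;
}
```

With the per-call mapping this thin, the performance gap Phoronix measured is more plausibly down to how well each application path (the CUDA path versus the native HIP port) is optimized than to translation overhead.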
But it doesn't matter because as far as I can tell this isn't actually backed by AMD.
They likely don't care.
I might be overstating the complexity involved, but I see this as almost John Carmack tier, and both Intel and AMD left them in the dust.
Looks like Intel and AMD both merely wanted a proof of concept.
Hopefully, this will end up like MCM chips, where (eventually) companies pool resources together for new platform- and company-agnostic standards.
Especially with Intel (GP)GPUs and AMD MI accelerators in datacenters, even 'the industry' might support efforts for a more vendor-agnostic CUDA descendant.
As for CUDA itself, these are not necessary to run apps with it. It is not hardware-limited as advertised. Probably most NV stuff isn't, but it is advertised as such and blocked so that you don't use it with other products. You all know why. I can bet the same can be done with all those DLLs (DLSS, FG, etc.).
If so, that means NV is charging extra for something that can work on any GPU. I really find it hard to believe that a 3090 Ti can't run FG, or that any given 3000-series card can't, because of some hardware limitation.