AMD's Pain Point is ROCm Software, NVIDIA's CUDA Software is Still Superior for AI Development: Report
The battle for AI acceleration in the data center is, as most readers are aware, fiercely competitive, and NVIDIA holds a strong position thanks to its top-tier software stack. In recent years, AMD has tried to capture part of what hyperscalers and OEMs are willing to spend on AI and HPC with its Instinct MI300X accelerator lineup. Despite having capable hardware, however, the company is still far from closing the software gap with NVIDIA. SemiAnalysis, a research and consultancy firm, reports in its latest study that it ran a five-month experiment using the Instinct MI300X for training and benchmark runs. The findings were surprising: even though the MI300X holds an on-paper hardware advantage, AMD's software stack, including ROCm, severely held back its real-world performance.
"When comparing NVIDIA's GPUs to AMD's MI300X, we found that the potential on paper advantage of the MI300X was not realized due to a lack within AMD public release software stack and the lack of testing from AMD," noted SemiAnalysis. Elaborating further in the report, the firm added that "AMD's software experience is riddled with bugs rendering out of the box training with AMD is impossible. We were hopeful that AMD could emerge as a strong competitor to NVIDIA in training workloads, but, as of today, this is unfortunately not the case. The CUDA moat has yet to be crossed by AMD due to AMD's weaker-than-expected software Quality Assurance (QA) culture and its challenging out-of-the-box experience."