Negative. The article focused only on training models using predefined containers, not on running pretrained models for inference.
It is super easy to run models on an MI100/MI250X/MI300X or a 7900 GRE/XT/XTX.
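To show what I mean by easy, here is a rough sketch with stock PyTorch + Transformers. It assumes a ROCm build of PyTorch (e.g. from the pytorch.org ROCm wheel index), and facebook/opt-1.3b is just a small stand-in for whatever model you actually want:

    # rough sketch: running an off-the-shelf Hugging Face model on a ROCm build of PyTorch
    # ROCm PyTorch exposes AMD GPUs through the normal torch.cuda API, so stock HF code runs unchanged
    # "facebook/opt-1.3b" is only a small placeholder checkpoint
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "facebook/opt-1.3b"
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

    prompt = tok("Inference on an MI300X looks like:", return_tensors="pt").to("cuda")
    out = model.generate(**prompt, max_new_tokens=32)
    print(tok.decode(out[0], skip_special_tokens=True))

No HIP-specific code anywhere in that, which is the point.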
I read as far as the paywall goes.
I have also used ROCm since the Vega 64/MI25 era.
I have also used CUDA since the K80/GTX 690 era.
I currently run a hive of MI100s and an SXM V100 box.
When it comes to inference, the MI300X gets day-0 support. Training support is very much lacking, and that is where Nvidia's deep bench of software engineers shows.
I expect part 2 of the article to be a bit different.
I am fully aware of the shortcomings of AMD's ecosystem, but I am also aware of its strengths.
And the ability to just grab a container and go exists: Hugging Face is full of ROCm-native containers, and HIPIFY can convert most things that are CUDA-native, albeit at a performance penalty.
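The container workflow is roughly this (rocm/pytorch is AMD's official image; the device/group flags are the usual ROCm docker setup, exact tags and options may vary by release, so check the ROCm docs):

    # rough sketch: grab AMD's official ROCm PyTorch container and go
    docker pull rocm/pytorch:latest
    docker run -it --rm \
      --device=/dev/kfd --device=/dev/dri \
      --group-add video --security-opt seccomp=unconfined \
      rocm/pytorch:latest

    # CUDA-native source can usually be translated with the HIPIFY tools, e.g.
    hipify-perl my_kernel.cu > my_kernel.hip.cpp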
But when it comes to inference, AMD is not a second-class citizen. It has full support for Triton and FlashAttention...
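With the ROCm flash-attention build installed, that support is one kwarg away in Transformers, and the Triton-backed torch.compile path works as well (again just a sketch, and the model name is a placeholder):

    # rough sketch: same HF code path, but opting into FlashAttention and the Triton compile path
    # assumes the ROCm flash-attn package is installed; "facebook/opt-1.3b" is a placeholder
    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "facebook/opt-1.3b",
        torch_dtype=torch.float16,
        attn_implementation="flash_attention_2",
    ).to("cuda")

    # torch.compile lowers through Inductor to Triton kernels, and that path runs on ROCm too
    model = torch.compile(model)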
And Llama 3.1 405B in FP16 launched exclusively on the MI300X, most likely due to the memory requirements.
Once it was quantized down to FP8 it could fit on 8x H100 80GB, but at launch Meta and AMD jointly announced that all of Meta's live 405B instances were running on MI300X.
Whether that is still true, or was just a limited exclusivity window while the quantization was being worked out... I don't know.
But claiming things like "an MI300X can't run out-of-the-box models" is just ignorant, and not even what the article claims.
It claims poor training performance and strange bugs, and as a user of the ecosystem... yup, AMD has strange bugs.
There are known lockups on multi-GPU instances, and the fix is extra GRUB kernel parameters: perfectly stable with iommu=pt, random hangs without it.
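Concretely, on an Ubuntu-style setup the workaround looks something like this (file name and update command vary by distro):

    # rough sketch of the workaround, assuming an Ubuntu-style GRUB setup
    # in /etc/default/grub, append iommu=pt to the kernel command line:
    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash iommu=pt"
    # then regenerate the config and reboot:
    sudo update-grub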
But all of this information is in the tuning guides. The install process is easy, and Hugging Face is full of models to run.