I am pretty sure there will be some sort of reduced-expert distill of Llama 4 for LLM enthusiasts with gaming GPUs.
I have one relativistic-speed spaceship exercise I give LLMs to solve. Most cloud-based >100B models get it right, including the 405B-parameter Llama 3.1. I expected Llama 4 Maverick to solve this easily, but it only got half the exercise right (it thought time flows slower for the observer, not for the crew). Llama 4 Scout was totally off and gave very different answers each time. Even my local 27B Gemma 3 solved the problem more accurately.
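
For context, the physics the models keep fumbling is just the Lorentz factor. A minimal sketch of the correct direction of time dilation (the 0.8c speed and 10-year duration are made-up placeholders, not the actual numbers from my exercise):

```python
import math

C = 299_792_458.0  # speed of light, m/s

def lorentz_gamma(v: float) -> float:
    """Lorentz factor for speed v (m/s)."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)

# Illustrative numbers only, not the actual exercise: ship cruising at 0.8c,
# trip takes 10 years as measured by the stationary (Earth) observer.
v = 0.8 * C
observer_years = 10.0
gamma = lorentz_gamma(v)

# Time dilation: the moving crew's clock runs slow, so the crew ages
# less than the observer -- not the other way around.
crew_years = observer_years / gamma

print(f"gamma = {gamma:.3f}")                                      # 1.667
print(f"observer: {observer_years} y, crew: {crew_years:.1f} y")   # crew: 6.0 y
```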

Sometimes models get (partially) dumber; the same happened with Gemini 1.5 Pro and 2.0 Pro.
1.5 gave a flawless answer, while 2.0 was off by 182% on the first calculation of that exercise and by 598% on the second (Edit: Gemini 2.5 Pro Experimental got the answers right again).
The H100 is just 80 GB, pretty easy to hit with a few 3090s (24 GB each).
Well, the H100 has 2.04 TB/s of bandwidth across that 80 GB of VRAM, and AMD's MI300X has an even more bonkers 5.3 TB/s over 192 GB. Consumer GPUs can be linked over PCIe 5.0 x16 at most, and that's 64 GB/s. Inference speed would be far from ideal, as a model split across three different GPUs has to shuffle data over links severely bottlenecked by PCIe speed.
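
Just to put those numbers side by side, a quick back-of-envelope sketch (the 1 GB transfer size is an arbitrary example, not a measured workload):

```python
# Rough comparison of how long it takes to move the same amount of data
# through local VRAM vs. over the PCIe link between consumer GPUs.
# Bandwidths are the figures above; the 1 GB transfer size is arbitrary.

H100_HBM   = 2.04e12  # bytes/s across the H100's 80 GB
MI300X_HBM = 5.3e12   # bytes/s across the MI300X's 192 GB
PCIE5_X16  = 64e9     # bytes/s, PCIe 5.0 x16, one direction

transfer_bytes = 1e9  # 1 GB

for name, bw in [("H100 HBM", H100_HBM),
                 ("MI300X HBM", MI300X_HBM),
                 ("PCIe 5.0 x16", PCIE5_X16)]:
    ms_per_gb = transfer_bytes / bw * 1e3
    print(f"{name:12s}: {ms_per_gb:6.2f} ms/GB  ({bw / PCIE5_X16:5.1f}x PCIe 5.0 x16)")
```

Roughly a 30-80x gap between on-card bandwidth and the inter-GPU link, which is why keeping the whole model in one GPU's VRAM matters so much for token throughput.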