Fabric solutions always create more problems than they solve once it becomes this complex, the ring bus approach may be simpler and offer more throughput and lower latentcy if they can get it wide or fast enough.
AMD brought most of this on themselves, technical issues with ZEN, bulldozer, and other designs and latency to cache and memory has never truly been solved for years and "add more cores" has always been the solution. They need to build a memory controller for a 8 core that can be expanded to these insane core and thread counts, where a little latency added to a server workload with custom aware of penalties software handling the threads can mask it.