This is what the Nvidia B200 is doing, using two chiplets that behave as, and are seen by the software as, a single die. Or did you mean remotely plausible for AMD to pull off? Granted, the B200 is NOT a consumer-level GPU...
Consumer - read: gaming - has different requirements than datacenter compute monsters. Mainly latency, in whatever kind of cooperation the chips are going to be doing. Memory access is wide and expensive, especially when it has to go to another GPU's VRAM. Same as it has always been.
The closest we got was SLI/Crossfire, and that was a bunch of driver magic from both sides. SLI/Crossfire died because new rendering methods made the whole thing expensive to maintain, plus DX12 and Vulkan arrived with their own ways to handle multi-GPU - the implicit and explicit modes mentioned in the article - which basically no game developers tried to properly implement.
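For anyone curious what the "explicit" path actually asks of developers, here is a minimal sketch of the Vulkan side (device groups, core since Vulkan 1.1). It assumes a valid VkInstance already exists and skips all error handling:

```cpp
// Sketch of "explicit multi-GPU" in Vulkan: the application enumerates
// device groups itself and must decide how to split work across the
// physical GPUs in a group. Instance creation and error handling are
// omitted; `instance` is assumed to be a valid VkInstance.
#include <vulkan/vulkan.h>
#include <vector>
#include <cstdio>

void list_device_groups(VkInstance instance) {
    uint32_t groupCount = 0;
    vkEnumeratePhysicalDeviceGroups(instance, &groupCount, nullptr);

    std::vector<VkPhysicalDeviceGroupProperties> groups(groupCount);
    for (auto& g : groups) {
        g.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_GROUP_PROPERTIES;
        g.pNext = nullptr;
    }
    vkEnumeratePhysicalDeviceGroups(instance, &groupCount, groups.data());

    for (uint32_t i = 0; i < groupCount; ++i) {
        // Each group can contain several physical GPUs; the app has to
        // distribute rendering work between them (alternate frames,
        // split frames, etc.) - the driver no longer does it for you
        // the way the old SLI/Crossfire drivers did.
        printf("Group %u: %u physical device(s)\n",
               i, groups[i].physicalDeviceCount);
    }
}
```

That per-frame work distribution used to be the driver's problem; under explicit multi-GPU it lands entirely on the game engine, which is a big part of why almost nobody shipped it.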
It is different in that instead of making the entire GPU die on the 5nm node, they took the cache and memory controllers and fabbed them as chiplets on the older 6nm node, because those parts do not benefit much from a node shrink. All of the chiplets were then arranged into a full GPU. This was an ingenious way to put the cutting-edge 5nm node on the parts of the GPU that gain the most performance from the shrink, while saving cost by not using it on the parts that do not. Fantastic engineering in my opinion.
The ingenious bit was figuring out which parts of a GPU can be separated. The problem has always been, and still is, that splitting up the compute array is not doable - at least it has not been so far. It has been 15+ years since AMD first publicly said they were trying to go for that, and Nvidia has been rumored to be looking into the same thing for 10+ years as well. Both occasionally publish papers about how to split a GPU, but the short version of the conclusions has been that you really can't.
Again, the context here is gaming GPU. Something that is very latency-sensitive.
Watch this, especially from 10:50
Well, that actually did directly bring up the bandwidth problem as well - a couple of orders of magnitude higher than what they did on CPUs.
This is cool. AMD Crossfire was much easier to run than SLI - AMD already wrote the software to make it run without the need for a dedicated connector. This could be huge for a dual-GPU-based system. If it is baked into the driver it would be great.
The dedicated connector had a purpose - it was dedicated and under the direct control of the GPUs. It was not about bandwidth so much as latency and guaranteed availability. Remember, PCIe does not guarantee that one GPU can send data to the other over it quickly enough. Plus, in some situations the GPU's PCIe interface could be busy with something else - reading textures or whatnot from RAM, for example. Whether it was worth building a separate link for that was always a consideration, and it did seem to pay off for a long while. I guess in the end PCIe simply got fast enough.
As has been said, the silicon is dual chiplet but the interconnect would present it as a single chip to the OS. It's done through the magic of extremely high-speed interconnections... and thus requires precision-made substrates to bond to and a decent internal configuration that handles the relatively minor communication losses, so that you get 0.9x the power of each GPU, visible to the OS as a 1.8x improvement over a single chip... though if they are truly sharing a single memory space it'd be closer to the physical limit, 0.98x or the like.
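To put rough numbers on that - the 0.9 and 0.98 are my illustrative guesses, not measurements:

```cpp
// Back-of-envelope scaling arithmetic: if each chiplet keeps
// `efficiency` of its standalone performance after paying the
// communication cost, two of them look like 2 * efficiency of a
// single chip to the OS. Both efficiency figures are hypothetical.
#include <cstdio>

int main() {
    const int    chips      = 2;
    const double efficiency = 0.90;   // assumed per-chip efficiency after overhead
    printf("Effective speedup: %.2fx\n", chips * efficiency);   // 1.80x
    const double shared_mem = 0.98;   // assumed efficiency with a unified memory space
    printf("With unified memory: %.2fx\n", chips * shared_mem); // 1.96x
    return 0;
}
```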
Nice in theory, but interconnects are not magic. An interconnect fast enough to actually replace an internal connection in a GPU - say the connections to the shader engines the video was talking about - does come with a cost. And that cost is probably power, given the bandwidth requirements - a lot of power.
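A rough back-of-envelope shows the scale of the problem. Both numbers below are ballpark figures I am assuming for the sake of the arithmetic, not specs of any real product:

```cpp
// Link power scales roughly as bandwidth * energy-per-bit.
// Bandwidth and pJ/bit below are assumed illustrative values.
#include <cstdio>

int main() {
    const double bandwidth_TBps    = 2.0;  // assumed die-to-die bandwidth, TB/s
    const double energy_pJ_per_bit = 1.0;  // assumed link energy, pJ/bit
    const double bits_per_s = bandwidth_TBps * 1e12 * 8.0;
    const double watts = bits_per_s * energy_pJ_per_bit * 1e-12;
    printf("%.1f TB/s at %.1f pJ/bit is roughly %.0f W just for the link\n",
           bandwidth_TBps, energy_pJ_per_bit, watts);  // ~16 W
    return 0;
}
```

And that is before you get anywhere near the bandwidth of an on-die connection to a shader engine.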
So we are clear, my optimism lies with AMD pumping out a dual-chip version that works in the middle segment and blows monolithic silicon out of the water. If AMD is rational, they give most of that cost saving back to the customer and sell to the huge consumer base that wants more power without having to take out a mortgage to get it.
Unless there is some new breakthrough that makes everything simple, this makes no sense to do in the mid segment. The added complexity and overhead only start justifying themselves when die sizes get very large - in practice, when they are nearing either the reticle limit or yield limits. And this is not a cheap solution for a GPU.
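To see why it only starts paying off at large die sizes, here is the classic exponential yield model with an assumed defect density - illustrative numbers, not foundry figures:

```cpp
// Poisson yield model: Y = exp(-defect_density * area). The gap between
// one big die and two half-size chiplets (which can be binned
// individually as known-good dies) grows quickly with total area.
#include <cstdio>
#include <cmath>

double yield(double area_mm2, double d0_per_cm2) {
    return std::exp(-d0_per_cm2 * area_mm2 / 100.0);  // convert mm^2 to cm^2
}

int main() {
    const double d0 = 0.1;  // assumed defect density, defects per cm^2
    const double areas_mm2[] = {200.0, 400.0, 600.0};
    for (double area : areas_mm2) {
        double mono    = yield(area, d0);        // one monolithic die
        double chiplet = yield(area / 2.0, d0);  // each of two half-size chiplets
        printf("%3.0f mm^2 total: monolithic yield %.0f%%, per-chiplet yield %.0f%%\n",
               area, mono * 100.0, chiplet * 100.0);
    }
    return 0;
}
```

At 200 mm^2 the gap is a few percent and the packaging overhead eats it; near the reticle limit the gap is large enough to be worth the trouble.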
The savings might not be quite as significant as you imagine. The RX 7000 series is a chiplet design, and while it is cheaper than Nvidia, the price gap is basically the same as it has always been. Those cards did not end up dominating the generation by being cheap.