Exactly. GPU manufacturers have already demonstrated an ability to do this with two GPUs, each with its own pool of memory, sitting a considerable distance apart. Latency is a very real problem (micro-stutter, anyone?) but those are issues that can be mitigated, even in SLI/Crossfire setups. It's a very different animal, though, when instead of two GPUs with two pools of memory talking over a relatively long PCIe link, you have two GPU chiplets and an I/O die, like AMD does with their CPUs now, with common GPU memory shared between the two dies. Latency will be far better than talking over PCIe, shared memory means no copying to another pool of memory, and direct access to the I/O die means the I/O die can handle output to the displays.
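To put rough numbers on the no-copy point, here's a back-of-envelope sketch. All figures are illustrative ballparks (roughly PCIe 4.0 x16 bandwidth, a 4K RGBA framebuffer), not vendor specs:

```python
FRAME_BYTES = 3840 * 2160 * 4        # one 4K RGBA framebuffer, ~33 MB
PCIE_BPS = 32e9                      # ballpark PCIe 4.0 x16 throughput, bytes/s
PCIE_LATENCY_S = 1e-6                # ~1 us transfer latency, order of magnitude

def copy_time_ms(nbytes, bandwidth_bps, latency_s):
    """Time to move a buffer between two separate memory pools."""
    return (latency_s + nbytes / bandwidth_bps) * 1e3

print(f"PCIe copy per frame:    {copy_time_ms(FRAME_BYTES, PCIE_BPS, PCIE_LATENCY_S):.2f} ms")
print(f"Frame budget at 60 fps: {1000 / 60:.2f} ms")
# With a shared pool behind an I/O die there's no copy at all:
# both chiplets read and write the same memory.
```

Even at roughly a millisecond, a cross-GPU copy eats a noticeable slice of a 16.7 ms frame budget; a shared pool behind an I/O die skips that cost entirely.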
All in all, I think going this route is a no-brainer from a scalability standpoint. The real issue is cost, because MCM is a more complicated process if you're not already doing it at scale. AMD has a leg up here because they're already doing it, and they're far further along than just putting multiple dies on a package. The I/O die paired with simpler chiplets was a game changer, to be honest, and I think that kind of design will drive future high-performance GPU designs.
As for nVidia, I think they're about where Intel is on the MCM path. They have multiple dies per package, but they're still doing direct die-to-die communication, which doesn't scale as well as AMD's solution does. It works okay for two dies; four dies is complicated and costly (never mind the latency penalty of talking to a die you don't have a direct connection to), and anything more is inefficient. With the I/O die, though, we've seen how many chiplets AMD can cram onto a single package. Smaller dies are also easier to produce consistently, with better yields (rough sketch below).
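Here's a quick sketch of both scaling arguments. The link counts assume a full mesh for the direct-connection case, and the yield side uses a simple Poisson defect model with a made-up defect density, so treat the outputs as illustrative only:

```python
from math import comb, exp

def mesh_links(n_dies):
    """Direct die-to-die links for a full mesh: every die wired to every other."""
    return comb(n_dies, 2)    # grows quadratically with die count

def hub_links(n_dies):
    """Links when every die talks through a central I/O die: one per die."""
    return n_dies             # grows linearly

for n in (2, 4, 8):
    print(f"{n} dies: full mesh needs {mesh_links(n)} links, I/O-die hub needs {hub_links(n)}")

# Crude Poisson yield model: the fraction of good dies falls off with area.
def die_yield(area_mm2, defects_per_mm2=0.001):  # ~0.1 defects/cm^2, illustrative
    return exp(-defects_per_mm2 * area_mm2)

print(f"600 mm^2 monolithic die: ~{die_yield(600):.0%} yield")
print(f" 75 mm^2 chiplet:        ~{die_yield(75):.0%} yield")
```

The link count is why direct connection stops making sense past a handful of dies, while the hub topology just adds one link per extra chiplet; and the yield curve is why a pile of small chiplets beats one huge die at the fab.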
This is a slight tangent, but what I would like to see is AMD produce a package that mixes CPU and GPU chiplets. We could finally see some really interesting APUs out of AMD if they did.