Inside a CCX, every core has to be linked to every other core. Having 4 cores per CCX means there have to be 6 links established for this all-to-all communication. If they went with more than 4 cores per CCX, that number of links would grow very quickly. 5 cores would require 10 links, 6 cores need 15 links, 7 cores need 21 links and 8 cores need 28 links: that's a huge number of inter-core links, which would take up a lot of space and make the design inefficient. My guess, based on AMD's modular strategy, is that the CCX will remain at 4 cores, and that each new Zen 2 die still has two 4-core CCXs. The inter-CCX latency will remain a thing, although hopefully IF2 will help.
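For anyone who wants to check those link counts: with n cores all directly connected to each other, you need one link per pair of cores, which is n(n-1)/2. Here is a quick sketch in Python; the function name `full_mesh_links` is just something I made up for illustration, and it assumes a plain point-to-point full mesh, which may not be exactly how AMD wires things up internally.

```python
def full_mesh_links(cores: int) -> int:
    """Count the point-to-point links needed so every core talks directly to every other core.

    Assumes a simple full mesh: one link per unordered pair of cores, i.e. n*(n-1)/2.
    """
    if cores < 0:
        raise ValueError("core count cannot be negative")
    return cores * (cores - 1) // 2


if __name__ == "__main__":
    # Print the link count for hypothetical CCX sizes from 4 to 8 cores.
    for n in range(4, 9):
        print(f"{n} cores -> {full_mesh_links(n)} links")
```

Going from 4 to 8 cores doubles the core count but more than quadruples the links (6 to 28), which is the quadratic growth I'm talking about.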
Now that they have taken all I/O out of the Zen dice (yes, I too say "dice", seems sensible), they can change the former strategy of having 2 designs: one die with 1 CCX and an iGPU for the Raven Ridge line, and another with 2 CCXs for everything else. Now they could have several chiplets connected via IF2 on the same substrate to serve different markets. So for example 1 Zen 2 die + I/O + iGPU for Picasso, 2 Zen 2 dice + I/O for regular Ryzen 3xxx (no iGPU), 4 Zen 2 dice + I/O for Threadripper 3, and as seen yesterday 8 dice + I/O for Epyc 2. Only the I/O would have to change from one design to another, and maybe not even that: they might be able to just use two versions of the I/O die, one for TR & Epyc, one for Ryzen 3xxx (both Picasso and regular Ryzen).
In any case, I think an 8-core CCX is out of the question, purely because of the complexity of the design. We'll see soon enough!