The review at AnandTech seems to confirm what you're saying: there's either ~17 ns of latency between cores on the same CCD or ~82 ns between CCDs, which is basically DRAM latency. There's nothing in between. What I thought before was that L3 is always available to the whole CPU, regardless of topology, just with varying amounts of latency.
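For what it's worth, figures like those usually come from core-to-core "ping-pong" microbenchmarks: pin one thread to each core, bounce a value back and forth through shared memory, and halve the round-trip time. Here's a rough Python sketch of the idea (the names, core numbers, and round count are my own; real benchmarks spin on an atomic variable in C/asm, whereas this uses `threading.Event` handoffs, so Python scheduler overhead will dominate and you'll see microseconds rather than nanoseconds - it only illustrates the mechanism, not the real numbers):

```python
# Sketch of a core-to-core latency ping-pong test. Two threads are
# (best-effort) pinned to different cores and hand a token back and
# forth; total time / number of handoffs approximates one-way latency.
# NOTE: Python overhead swamps the actual cache-hop cost, so absolute
# results here are far above the ~17 ns / ~82 ns a native test reports.
import os
import threading
import time

def pinger(ev_mine, ev_other, cpu, rounds):
    # Best-effort CPU pinning (Linux-only; silently skipped elsewhere
    # or if the requested core doesn't exist).
    try:
        os.sched_setaffinity(0, {cpu})
    except (AttributeError, OSError):
        pass
    for _ in range(rounds):
        ev_mine.wait()    # wait for the token to arrive
        ev_mine.clear()
        ev_other.set()    # hand the token back

def measure(cpu_a=0, cpu_b=1, rounds=2000):
    ev_a, ev_b = threading.Event(), threading.Event()
    t1 = threading.Thread(target=pinger, args=(ev_a, ev_b, cpu_a, rounds))
    t2 = threading.Thread(target=pinger, args=(ev_b, ev_a, cpu_b, rounds))
    t1.start(); t2.start()
    start = time.perf_counter()
    ev_a.set()            # kick off the first handoff
    t1.join(); t2.join()
    elapsed = time.perf_counter() - start
    return elapsed / (2 * rounds)   # seconds per one-way handoff

if __name__ == "__main__":
    print(f"approx. one-way handoff: {measure() * 1e9:.0f} ns "
          "(Python-inflated; native tests measure the raw cache hop)")
```

Running it across different core pairs (same CCX, same CCD, cross-CCD) is how the AT-style latency matrices get built, just with much tighter native code.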
There's nothing that says L3 has to be shared across all compute resources on the CPU/APU/SoC - that's entirely up to how the chipmaker wants to lay out their cores. Up until Zen I don't think we'd ever seen split L3 designs in the consumer/enthusiast space, simply because Intel uses a ring bus to connect their consumer cores and a mesh for their high-core-count HEDT/server stuff, so every core connects to every other core there. While each core houses a slice of L3 that it obviously has much faster access to, the slices are all interconnected through a single network, so latency to non-local L3 slices should be roughly even for any given core.

With the introduction of the CCX concept in Zen, AMD started consistently splitting their L3 cache, with each portion bound to its CCX alone, likely to keep cache latencies consistent within a CCX. That's actually a major contributor to Zen 3's performance: the doubling of CCX size, and thus of the L3 available to each core (with, IIRC, next to no latency penalty, which is remarkable). Previously you also had an intra-CCD latency split due to CCX-to-CCX latency, which you can see in AT's Renoir testing.
Actually, the 3950X numbers from the review you linked are really interesting: there's no latency difference between CCXes on the same chiplet and CCXes on different chiplets. That would suggest either that the Infinity Fabric link between dies has next to no latency compared to on-die IF, or that even on-die CCX-to-CCX communication is routed through the IOD or something similar. The latter sounds very inefficient and the former sounds utopian, so I wonder what the explanation is there...