We have reviews showing the 7950X3D, at the exact same voltage and settings, achieving higher efficiency when set to prefer cache than when set to prefer frequency or left at stock:
Source: TechPowerUp's Ryzen 9 7950X3D performance review (www.techpowerup.com), which tests the 16-core flagship with dozens of applications and 14 games at up to 4K.
Fetching data from cache rather than from main system memory takes less energy and less time. Voltage is one factor in efficiency, but not the only one.
Your claim was that it decreases performance:
Nothing on that claim, eh?
As you put in bold, for games you want to be cache-resident. The problem with the 7950X3D is that, for certain games, the OS places the game on the cores without the extra cache. Having the cache on both CCDs would solve that issue, resulting in increased performance on the 7950X3D in select scenarios.
This is definitely not the whole picture, as we know X3D has benefits outside of gaming (which was pointed out in the video).
Mind you, we have also learned since that video was released that certain games do sometimes end up on the frequency-favored cores, and those are the instances where performance would be improved. You can simulate the performance uplift the 7950X3D would see by using Process Lasso. Of course, it wouldn't exceed the 7800X3D's performance except in games that use a lot of threads, but it would bring the 7950X3D on par with the 7800X3D in games, if not slightly ahead.
The Ryzen 7000 series has 8 cores per CPU chiplet (CCD). Crossing between CCDs isn't relevant for the vast majority of games and applications.
You are also assuming that said cores need data from a different CCD. Having X3D on both CCDs is likely to increase the number of local cache hits. There is a reason AMD originally intended to use X3D for its enterprise CPUs.
You are also assuming that the latency increase from the lower clocks is not more than offset by the latency decrease from having a large cache stacked on the chip.
Your "basic logic" draws conclusions that aren't stated in any of the sources you provided, as usual.