The problem with these GPUs is that at times there are more SMs than the front-end can feasibly feed. If the 4090 scaled decently with its SM count relative to the 4080, it should have been almost 60% faster; instead it's more like 35% faster on a good day. And it's not just the 4090 that's "underperforming" either: the 4080 "should" be almost 230% faster than a 4060, but in practice it's closer to 150% faster.
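To put rough numbers on that gap, here is a quick back-of-the-envelope sketch in Python. It uses only the approximate percentages quoted above, not measured benchmark data, and the helper name `scaling_efficiency` is made up for illustration.

```python
def scaling_efficiency(ideal_gain_pct, observed_gain_pct):
    """Compare an observed speedup against the ideal one implied by SM count.

    Both arguments are "X% faster" figures. Returns a tuple:
      - share of the ideal gain that actually shows up
      - observed throughput as a fraction of ideal throughput
    """
    gain_share = observed_gain_pct / ideal_gain_pct
    throughput_ratio = (100 + observed_gain_pct) / (100 + ideal_gain_pct)
    return gain_share, throughput_ratio


# Rough figures from the comment above (assumptions, not benchmark results):
# (ideal % faster if performance tracked SM count, observed % faster)
comparisons = {
    "4090 vs 4080": (60, 35),
    "4080 vs 4060": (230, 150),
}

for name, (ideal, observed) in comparisons.items():
    share, ratio = scaling_efficiency(ideal, observed)
    print(f"{name}: {share:.0%} of the ideal gain, {ratio:.0%} of ideal throughput")
```

With those inputs, the 4090 realizes roughly 58% of its ideal gain over the 4080, and the 4080 roughly 65% of its ideal gain over the 4060, so in both cases a large part of the extra SMs isn't turning into frames.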
Civilization 7 is most likely using a very simple rendering pipeline, considering it runs on a Nintendo Switch with 2 Maxwell SMs at ~750 MHz. The explanation for why Blackwell seems to run the game "better" might be as simple as a few select instructions executing significantly faster on Blackwell than on Ada and Ampere. Intel's B580 is also performing unusually well here, while RDNA2 seems to run especially poorly.