This is ridiculous and stupid. RDNA3's register file is 192 KB, while RDNA2 has 128 KB and RTX 30/40/50 have 64 KB. The one that really needs improvement here is NV :) That's why we see 0% per-SM IPC improvement from RTX 40 to RTX 50.
As for L2, NV spent that much die area on one function (maybe not the only one): SER, which reloads all register files via L2. But they misjudged how widely this feature would be used. One game in two years is nothing.
Your claims about no changes are not true. The recommendations for configuring the last-level cache and memory are also incompetent.
A calm(er) discussion would be appreciated, and I do appreciate your response apart from the, erm, aggressive part. The reason I said I expect changes to L2/registers isn't necessarily about capacity. I was thinking more about flexibility: two operands cannot read from the same register bank, and the destination registers can't both be even or both be odd. As of now, that massive increase in FP32 throughput isn't really realized in games and the like, because dual-issue capability is limited and relies on hand-optimized code.
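To make the dual-issue restriction concrete, here is a rough sketch of the kind of check a compiler has to pass before it can pair two FP32 ops. It is a simplified model, not the actual ISA rule set: the function name `can_dual_issue`, the four-bank layout (bank = register index mod 4), and the assumption that only matching operand slots conflict are all mine.

```python
# Simplified, hypothetical model of the pairing rules described above.
# Assumption: VGPRs are spread over 4 banks, bank = register index mod 4.
NUM_BANKS = 4


def bank(vgpr: int) -> int:
    """Return the (assumed) register bank for a VGPR index."""
    return vgpr % NUM_BANKS


def can_dual_issue(x_src0: int, x_src1: int, x_dst: int,
                   y_src0: int, y_src1: int, y_dst: int) -> bool:
    """Check whether ops X and Y could be paired under the simplified rules."""
    # Matching source operands of the two ops must come from different banks.
    if bank(x_src0) == bank(y_src0):
        return False
    if bank(x_src1) == bank(y_src1):
        return False
    # One destination must be even and the other odd.
    if x_dst % 2 == y_dst % 2:
        return False
    return True


# Pairable: sources sit in different banks, destinations are even/odd.
print(can_dual_issue(0, 1, 4, 1, 2, 7))
# Not pairable: v0 and v4 land in the same bank under the mod-4 assumption.
print(can_dual_issue(0, 1, 4, 4, 2, 7))
# Not pairable: both destinations are even.
print(can_dual_issue(0, 1, 4, 1, 2, 6))
```

Even in this toy version, register allocation has to cooperate on every pair, which is why the extra FP32 throughput is hard to reach without careful tuning.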
As for L2, Ada only has a two-tier cache subsystem, so its L2 also has to prevent memory bandwidth bottlenecks as compute capacity gets scaled up. RDNA2/3 use a four-tier hierarchy, though, so L2 plays a smaller role and the L3/Infinity Cache is what deals with bandwidth bottlenecks. I can see them rebalancing L2 size along with Infinity Cache; it's a given, because it'll depend on memory configuration and compute throughput, among other things. I didn't claim anything about no changes, though, nor did I recommend anything specific about L3/IC or memory, so I'm not sure what you meant?
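As a back-of-the-envelope illustration of why the balance between cache levels depends on the memory configuration, here is a toy model where each cache level filters out a fraction of the traffic before it reaches DRAM. The function name `dram_traffic` and every number in it are made up for illustration; none of them are measured hit rates or bandwidth figures for any real GPU.

```python
from typing import Sequence


def dram_traffic(requested_gbps: float, hit_rates: Sequence[float]) -> float:
    """Traffic that still reaches DRAM after passing through each cache level.

    hit_rates lists the (hypothetical) hit rate of each level, in order
    from the level closest to the shader cores to the last-level cache.
    """
    traffic = requested_gbps
    for hit_rate in hit_rates:
        # Only the misses at this level continue down the hierarchy.
        traffic *= (1.0 - hit_rate)
    return traffic


# Hypothetical case: one big last-level cache absorbing half the traffic.
print(dram_traffic(1000.0, [0.5]))
# Hypothetical case: a smaller L2 backed by an extra large cache level.
print(dram_traffic(1000.0, [0.5, 0.5]))
```

The point is only that the same DRAM traffic target can be hit with different splits between levels, so L2 size, Infinity Cache size, and the memory bus end up being tuned together.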