I’m guessing RDNA 3’s dual issue mode will have limited impact. It relies heavily on the compiler to find VOPD possibilities, and compilers are frustratingly stupid at seeing very simple optimizations. For example, the FMA test above uses one variable for two of the inputs, which should make it possible for the compiler to meet dual issue constraints. But obviously, the compiler didn’t make it happen. We also tested with clpeak, and see similar behavior there. Even when the compiler is able to emit VOPD instructions, performance will only improve if compute throughput is a bottleneck, rather than memory performance.
On the other hand, VOPD does leave potential for improvement. AMD can optimize games by replacing known shaders with hand-optimized assembly instead of relying on compiler code generation. Humans will be much better at seeing dual issue opportunities than a compiler can ever hope to. Wave64 mode is another opportunity. On RDNA 2, AMD seems to compile a lot of pixel shaders down to wave64 mode, where dual issue can happen without any scheduling or register allocation smarts from the compiler..