Oh people, you are making TPU uncool here. The main task a graphics card is supposed to do is to flop. A CPU is mostly used for barely parallel but heavily sequential code, which is mostly integer arithmetic (a good ALU). A CPU can also do floating point operations, but with so little parallelism it's not really optimal for them. Graphics cards, like some old math co-processors, are very good at floating point. Those operations are a lot rarer in general computing, but they dominate certain tasks: gaming is one of them, along with some productivity and scientific computing. Gaming is (relatively) low precision, so games usually use the single or half precision capabilities of cards, while productivity tasks like CAD work, scientific simulations and medical imaging need the same fundamental operations in a more precise form, so they often use double precision (the same floating point, but with far more digits after the point, so less rounding, more precision and usually less speed; on consumer cards a lot less speed, because nVidia and AMD want to milk enterprises with Quadros and Radeon Pros).

Obviously other aspects of a card matter, but flopping also matters a lot. Depending on the architecture, it can be hard to reach the maximum theoretical floating point performance, whether because the architecture is difficult to program for or because of various software overheads. A good example of a difficult-to-program architecture is Kepler: each SMX (streaming multiprocessor) had 192 CUDA cores, compared to Fermi's 32 per SM, but proportionally smaller scheduling logic. I won't get into details, but after a while it became clear that Kepler's SMX scheduler couldn't keep every CUDA core fed on its own and needed software trickery to work well; without it, the CUDA cores sat underutilized and a lot of performance was lost. Still, even with this unfortunate trait Kepler was a massive improvement over Fermi, so even less-than-ideal optimization still beat Fermi. The problem only became obvious once the architecture got old and devs stopped optimizing for it as much, which is when Radeons that were weaker at launch started to beat faster Kepler cards.

All I want to say is that floating point performance certainly matters, but for various reasons the maximum theoretical figure may not be achieved. That doesn't make the floating point spec useless; it's there, but how much of it is actually reached will inevitably vary. Games are made under all sorts of development constraints (time, money, team size, talent, goals and so on) and often don't extract everything from the cards. As long as they run well enough and a decent fraction of the floating point performance is reached, there's little reason to pour more R&D into optimization. Professional software, meanwhile, is often far more serious about squeezing out every last bit of performance, because certain tasks are so computationally heavy; those developers are much more motivated (and less limited by time and budget) to optimize for the hardware. That's why some cards that top game benchmarks get beaten by supposedly "less" powerful cards. Oh, and nVidia has historically gimped double precision a lot harder on consumer cards than AMD, which is why AMD cards dominated MilkyWay@Home for such a long time.
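To make the single vs double precision point concrete, here's a minimal C sketch (nothing GPU-specific, just ordinary CPU floats; the exact numbers you get will depend on compiler and platform, the rounding behaviour is the point):

```c
#include <stdio.h>

int main(void) {
    /* Add 0.1 ten million times in single and double precision.
       The exact answer is 1,000,000. Single precision keeps only
       about 7 significant digits, so the running sum accumulates
       rounding error and visibly drifts; double precision keeps
       about 15-16 digits and stays very close. */
    float  sum_f = 0.0f;
    double sum_d = 0.0;
    for (int i = 0; i < 10000000; i++) {
        sum_f += 0.1f;
        sum_d += 0.1;
    }
    printf("float  sum: %.3f\n", sum_f);  /* visibly off from 1000000 */
    printf("double sum: %.3f\n", sum_d);  /* very close to 1000000    */
    return 0;
}
```

Scientific and medical workloads care about exactly this kind of drift, which is why they pay for double precision even though it's slower.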
So there's really only one question: how much easier is it to tap into all those RDNA 2 teraflops compared to GCN? Sadly, that's hard to quantify, but it seems it should be substantially easier.
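For context on what the headline teraflop number even is: it's just shader count × clock × 2, since one fused multiply-add counts as two floating point operations, and it assumes every shader issues an FMA every single cycle, which is exactly the part real games rarely manage. A quick sketch using the commonly quoted RX 6800 XT figures (4608 stream processors, ~2.25 GHz boost; swap in your own card's numbers):

```c
#include <stdio.h>

int main(void) {
    /* Peak theoretical FP32 = shaders * clock * 2 (one FMA = 2 FLOPs).
       Numbers below are the commonly quoted RX 6800 XT specs; this is
       the marketing figure, which assumes every shader retires an FMA
       every cycle -- real workloads only ever get some fraction of it. */
    const double shaders   = 4608.0;   /* stream processors          */
    const double clock_ghz = 2.25;     /* boost clock, GHz           */
    const double fma_flops = 2.0;      /* FLOPs per shader per cycle */

    const double tflops = shaders * clock_ghz * fma_flops / 1000.0;
    printf("Peak FP32: %.2f TFLOPS\n", tflops);  /* ~20.7 TFLOPS */
    return 0;
}
```

How close GCN or RDNA 2 actually gets to that ceiling in practice is exactly the utilization question above.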