There is no application that only has a UI. Of course a lot of subcomponents are single-threaded, but even the simplest applications use more than one thread.
In practice, most code is multithreaded.
But in practice, most code is single-thread bound, due to Amdahl's law. That means the code gets faster when single-thread performance goes up, while the gain from more multithread performance is capped by whatever part of the code still runs serially.
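To put rough numbers on that, here is a minimal sketch of Amdahl's law for a fixed-size job. The function name and the 50% parallel fraction are made up for illustration, not measured from any real program.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup of a fixed-size job when only part of it can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time no matter how many cores exist;
    # only the parallel part is divided across the cores.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workload: half of the runtime is parallelizable.
for cores in (1, 2, 4, 8, 64):
    print(cores, "cores ->", round(amdahl_speedup(0.5, cores), 2), "x")
```

With half the work stuck on one thread, the speedup stays below 2x no matter how many cores you add, which is why faster single-thread performance helps this kind of code more.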
There are exceptions: 3D modeling renders are closer to Gustafson's law. That is, people aren't primarily interested in rendering times per se. A 3D render is "set" at 8 hours or ~72 hours per frame (in the case of Marvel / Pixar movies), which is the largest practical time for their workflow. What 3D modelers want is a better image at the end of those 72 hours, which follows Gustafson's law (you can do more work / more detailed modeling in the same timeframe).
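A minimal sketch of Gustafson's law, where the time budget stays fixed and the amount of work grows with the core count. The function name and the 95% parallel fraction are assumptions for illustration only.

```python
def gustafson_speedup(parallel_fraction: float, cores: int) -> float:
    """Scaled speedup: how much more work fits in the same wall-clock time."""
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is unchanged; the parallel part scales with the cores,
    # so the total work done in the fixed time budget grows almost linearly.
    return serial_fraction + parallel_fraction * cores


# Hypothetical render: 95% of the frame time is parallel work.
for cores in (1, 8, 64):
    print(cores, "cores ->", round(gustafson_speedup(0.95, cores), 2), "x the work")
```

The frame still takes the same number of hours; the extra cores go into a more detailed image rather than a shorter render.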
Video games are often multithread-programmed but single-thread bound on the physics thread. AI, sound, even graphical effects can all complete nearly immediately. But the physics work (collision detection, bullet detection, object-per-object updates) takes the most time, and is often only written in a single thread for maximum consistency. (It is hard to keep every player's copy of a multiplayer game updating its physics identically unless each one does it in a single thread, in a well-defined order, with well-defined floating-point rounding.)
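The floating-point part is easy to demonstrate. A minimal sketch, with made-up values standing in for per-object physics contributions: adding the same numbers in a different order gives a different result, which is exactly what happens when threads finish in a different order on different machines.

```python
def accumulate(values):
    """Add values strictly left to right, like a single-threaded update loop."""
    total = 0.0
    for value in values:
        total += value
    return total


# Hypothetical per-object contributions to one physics quantity.
impulses = [0.1, 0.2, 0.3]

in_order = accumulate(impulses)             # fixed, well-defined order
reordered = accumulate(reversed(impulses))  # same values, different order

# Floating-point addition is not associative, so the two totals differ
# in the last bits even though the inputs are identical.
print(in_order, reordered, in_order == reordered)
```

The difference is tiny, but in a lockstep multiplayer simulation a last-bit mismatch can compound over many frames until the clients disagree.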
------
Same game at higher FPS: Amdahl's law and single-thread bound.
Different game with more effects at the same FPS: Gustafson's law, probably can take advantage of multicore more.
Two different programming styles, two different results. It depends on the game engine, the game programming team, and their philosophy with regard to high-performance programming.
RISC is if anything more prone to pipeline stalls, which is actually the reason for Power having 4-way and 8-way SMT.
POWER9 has pipeline stalls because it was designed with (lol) 2-cycle latency on XOR and Add instructions.
The other "RISC" processors (and I hate that word...), ARM and RISC-V, do not suffer from this behavior. I think IBM intended POWER9 to be a 4-way or 8-way SMT design from the start. When you consider that most business-class code (i.e., databases) is sitting around waiting on cache stalls, it makes sense to go for higher SMT and higher-latency cores.