IBM's willingness to build the PC and Jobs' greediness are the reason x86 is a thing at all.
Oh, and IBM would never have picked up lol-who-has-even-heard-of-this-company Intel if not for AMD acting as a second source (which is why both co-existed in the x86 space in the first place).
This isn't a debate about how x86 came to be; it's about how x86 is part of the reason modern-day gaming consoles can be backwards compatible.
The instruction set doesn't matter at all; it's about sticking with one architecture, whatever it is.
What the heck are you talking about? The instruction set always influences the architecture. Certain extensions demand that things exist or work in a particular way: for example, the 64-bit extensions to x86 don't merely call for wider address registers, they call for additional registers as well (R8 through R15 on top of the original eight general-purpose registers). Instruction sets and the way they're designed directly impact CPU architecture, because those instructions are the very thing you need your CPU to do quickly and efficiently. Sure, there are some fundamental similarities between CPUs, like the need for a memory controller, at least one ALU, and a bus that lets those things communicate, but when it comes down to implementation, the instruction set dictates the design of the CPU. You're not writing an instruction set around a CPU; you're building a CPU around an instruction set.
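If you want to see the "more registers" part of that concretely, here's a toy C function I made up for this post (the function name and build commands are mine, not anything from the thread): compile it for 32-bit x86, where the ISA only gives the compiler eight general-purpose registers, then for x86-64, where it gets sixteen, and compare the assembly.

```c
/* Hypothetical illustration: the same C compiled against two variants of the
 * x86 ISA.  Assuming a gcc/clang toolchain with 32-bit support installed:
 *   cc -m32 -O2 -S regs.c -o regs32.s   (8 GPRs: eax..edi)
 *   cc -m64 -O2 -S regs.c -o regs64.s   (16 GPRs: rax..r15)
 * With -m32 the compiler typically has to spill some of these temporaries to
 * the stack; with -m64 the extra registers (r8-r15) show up in the output and
 * the spills largely disappear.  Same source, different hardware demands. */
long many_live_values(long a, long b, long c, long d,
                      long e, long f, long g, long h)
{
    long t1 = a * b, t2 = c * d, t3 = e * f, t4 = g * h;
    long t5 = a + c, t6 = b + d, t7 = e + g, t8 = f + h;
    /* Keep every temporary live until the end so the register
     * allocator genuinely needs a lot of registers at once. */
    return (t1 ^ t2) + (t3 ^ t4) + (t5 ^ t6) + (t7 ^ t8);
}
```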
So, now that you've thoroughly derailed where this was going: regardless of the history, x86 is still a reason the new consoles are backwards compatible, and how CPUs are designed is completely beside the point.
Cell was a major disaster for Sony: a weirdo CPU (1 normal core, 8 vector cores, good luck going multi-platform with that) whose R&D, by Sony's share alone, ran to $4 billion.
Cell was Sony's Bulldozer. "Here, have a bunch of parallel throughput, but when it comes to coordination and serial tasks, good luck, buddy." It was like forcing game developers to write their games mostly as GPGPU compute. That trade is great if hardware costs you more than devs do, but devs aren't exactly cheap either, and they're a recurring cost. The world is familiar with x86, which makes it that much easier to adopt and use.
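To make that concrete with a made-up example (not real PS3 code, just a sketch of the two kinds of work): the first loop below is what the SPEs, or any GPU, eat for breakfast, and the second is the branchy, pointer-chasing coordination code that still had to squeeze through the one general-purpose core.

```c
/* Illustrative only: hypothetical game-engine routines. */
#include <stddef.h>

typedef struct entity {
    float pos[3], vel[3];
    int   state;
    struct entity *target;   /* pointer-chasing, data-dependent access */
} entity;

/* Throughput-friendly: same math on every element, no branches,
 * no shared state -- trivially parallel and SIMD-able. */
void integrate_positions(entity *e, size_t n, float dt)
{
    for (size_t i = 0; i < n; i++) {
        e[i].pos[0] += e[i].vel[0] * dt;
        e[i].pos[1] += e[i].vel[1] * dt;
        e[i].pos[2] += e[i].vel[2] * dt;
    }
}

/* Coordination-heavy: unpredictable branches, serial data dependencies,
 * writes that touch other entities -- the part parallel hardware hates. */
void update_ai(entity *e, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (e[i].state == 0 && e[i].target)
            e[i].state = e[i].target->state + 1;
        else if (e[i].target)
            e[i].target->state = e[i].state;
    }
}
```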
Floating Point Operations Per Second (which GPUs can crunch plenty of, unlike CPUs). Which "per execution unit" did you read between which lines?
The workload has everything to do with it. What GPUs are intended to do and what CPUs are intended to do are very different. Sure, being general-purpose machines, they can both solve any Turing-complete problem, but that isn't to say a generic measurement accurately describes the behavior of the kind of workload that will primarily run on the device. FLOPS makes sense when we're talking about parallel compute, where response time isn't a factor and the code that's running is predictable, but not when we're trying to describe something that must be responsive in the real world, where several different things are happening in concert. Describing that requires some measure of latency, not strictly the bandwidth or throughput that FLOPS captures.
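Here's a crude toy benchmark to show the difference (entirely my own sketch; the function names, constants, and build line are made up for illustration): the first routine measures throughput, which is roughly what a FLOPS figure advertises, and the second measures latency along a dependent chain of memory accesses, which FLOPS says nothing about. A GPU-style design looks heroic on the first and miserable on the second, and a game frame is mostly made of the second kind of work.

```c
/* Toy throughput-vs-latency benchmark (illustrative only).
 * Build (assuming gcc/clang on a POSIX system): cc -O2 bench.c -o bench */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

/* Throughput: independent multiplies that keep the FP pipelines busy. */
static double flops_per_sec(size_t iters)
{
    float a0 = 1.0001f, a1 = 1.0002f, a2 = 1.0003f, a3 = 1.0004f;
    const float x = 1.0000001f;
    double t0 = now();
    for (size_t i = 0; i < iters; i++) {
        a0 *= x; a1 *= x; a2 *= x; a3 *= x;  /* 4 independent FLOPs per iteration */
    }
    double dt = now() - t0;
    volatile float sink = a0 + a1 + a2 + a3; /* keep the work from being optimized away */
    (void)sink;
    return 4.0 * (double)iters / dt;
}

/* Latency: a chain of dependent loads in a scrambled order.  Each step has to
 * wait for the previous one, so raw FLOPS is irrelevant; what matters is how
 * long one round trip takes. */
static double ns_per_dependent_load(size_t n, size_t hops)
{
    size_t *next = malloc(n * sizeof *next);
    for (size_t i = 0; i < n; i++)
        next[i] = (i * 7919 + 1) % n;        /* scrambled hop pattern */
    size_t p = 0;
    double t0 = now();
    for (size_t i = 0; i < hops; i++)
        p = next[p];                         /* serial dependency: no overlap possible */
    double dt = now() - t0;
    volatile size_t sink = p;                /* keep the chain from being optimized away */
    (void)sink;
    free(next);
    return dt / (double)hops * 1e9;
}

int main(void)
{
    printf("throughput: %.2e FLOP/s\n", flops_per_sec(200u * 1000u * 1000u));
    printf("latency:    %.1f ns per dependent access\n",
           ns_per_dependent_load((size_t)1 << 22, 20u * 1000u * 1000u));
    return 0;
}
```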
tl;dr: The world knows x86, so backwards compatibility is relatively easy regardless of the history; and comparing GPUs and CPUs with the same metric is dumb because they run very different workloads, so why would a number derived from one workload be a good way to describe how well they each do different things?