GPUs being unable to do anything but graphical tasks is years in the past; with today's architectures GPUs are very much capable of GPGPU (general-purpose computing). AMD designed GCN with compute in mind, and so did NVIDIA with Fermi.
However, you have a point, and these are pretty much the challenges both Intel and AMD are working through. It is still possible, though.
Also remember that we are not talking about using a GPU and forgetting the CPU; we are talking about both working together. Certain kinds of instructions, like the one you stated, would most likely be crunched on the CPU, while things like floating-point operations would be done by the GPU.
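To make that split concrete, here is a minimal sketch in CUDA C++ (my own illustrative example, not anything AMD has shipped): the CPU handles the serial setup and result handling, while the GPU gets the bulk floating-point work in one data-parallel launch.

```
// Minimal sketch of the CPU/GPU division of labor described above.
#include <cstdio>

// Data-parallel FP work: one thread per element, the GPU side of the split.
__global__ void scale_add(float *out, const float *in, float a, float b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a * in[i] + b;       // pure floating-point throughput work
}

int main() {
    const int n = 1 << 20;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = float(i);   // CPU: setup, branching, I/O

    // GPU: crunch the whole FP array in a single data-parallel launch.
    scale_add<<<(n + 255) / 256, 256>>>(out, in, 2.0f, 1.0f, n);
    cudaDeviceSynchronize();

    printf("out[123] = %f\n", out[123]);            // CPU: result handling
    cudaFree(in); cudaFree(out);
}
```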
Also note that there will be no PCI Express link between the GPU and CPU once both are integrated together.
It seems like you are talking about a discrete GPU and a CPU doing the method that Steevo mentioned, but that is NOT the case; we are talking about the GPU and CPU being integrated at the architecture level.
Okay, so the iGPU is linked to the IMC/north bridge in a Llano chip. The issue is that the CPU still only sees the iGPU as a GPU. The CPU has no instructions for telling the GPU what to do; everything still goes through the drivers at the software level.
I'm not saying that this is the way things are moving, but with how GPUs and CPUs exist now, there isn't enough coherency between the two architectures to really do what you describe. The CPU still has to dispatch GPU work at the software level, because there is no instruction that says "use the GPU cores to calculate this."
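For what it's worth, here is roughly what that software-level dispatch looks like today through the CUDA Driver API (a sketch, with error handling trimmed): every step is an ordinary function call into the driver's userspace library, not a CPU instruction.

```
// Sketch of "dispatching at the software level": each step below is a call into
// libcuda (the driver's userspace library). There is no x86 instruction that
// means "run this on the GPU".
#include <cuda.h>
#include <cstdio>

int main() {
    cuInit(0);                        // library call: initialise the driver
    CUdevice dev;
    cuDeviceGet(&dev, 0);             // library call: pick GPU 0
    char name[128];
    cuDeviceGetName(name, sizeof(name), dev);
    printf("Dispatching through the driver to: %s\n", name);

    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);        // library call: create a context on the GPU

    // Actually running a kernel continues the same way: cuModuleLoad() hands the
    // driver compiled GPU code, cuLaunchKernel() asks it to schedule the work.
    // The CPU never executes the kernel; it only asks the driver to queue it.

    cuCtxDestroy(ctx);
    return 0;
}
// Build (assuming a CUDA toolkit is installed): nvcc driver_dispatch.cu -lcuda
```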
Also keep in mind that for a single-threaded, non-optimized application, you have to wait for the last operation to complete before you can execute the next one. That isn't strictly true for modern superscalar processors, but if the next instruction requires the output of the last, then you have to wait; you can't just execute all the instructions at once. It's not a matter of whether it can be done, it's a matter of how practical it would be. Right now, with how GPUs are designed, handing them large sets of data to be calculated at once is the way to go.
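A small sketch of that difference (again CUDA C++, names made up for illustration): the first loop is a dependent chain that no amount of parallel hardware can speed up, while the kernel is exactly the "large set of data calculated at once" case GPUs are built for.

```
// Contrast: a dependent chain (must run serially) vs. independent, data-parallel work.
#include <cstdio>

// Each element depends only on its own input -> all threads can run "at once".
__global__ void square_all(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * x[i];
}

int main() {
    const int n = 1 << 20;
    float *x;
    cudaMallocManaged(&x, n * sizeof(float));
    for (int i = 0; i < n; ++i) x[i] = 1.0f + i * 1e-6f;

    // Dependent chain: iteration i needs the result of iteration i-1, so the next
    // operation has to wait for the last one -- parallel hardware doesn't help here.
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) acc = acc * 0.5f + x[i];

    // Independent work: hand the GPU the whole array in one go.
    square_all<<<(n + 255) / 256, 256>>>(x, n);
    cudaDeviceSynchronize();

    printf("acc = %f, x[0] = %f\n", acc, x[0]);
    cudaFree(x);
}
```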
What you're describing is some kind of computational "core" that can do GPU-like calculations on a CPU core. The closest thing to that is Bulldozer (and most likely its successors, which will be more like what you're describing), where a "module" contains a number of components for executing threads concurrently. In that kind of design, the CPU could have an instruction that, say, computes a Fast Fourier Transform on a matrix (an array of data) and takes control of a large number of ALUs to do the task at once; on the other hand, if several different ALU operations are being performed at the same time, only that many resources are used.
This is what AMD would like to do and BD was step 1.
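Just to show how far off that is: today the closest thing to a "compute an FFT" instruction is a library call. The sketch below (cuFFT, purely illustrative) is a software call that the library and driver expand into thousands of GPU ALU operations, which is exactly the gap the issues below point at.

```
// Today's closest equivalent to a hypothetical "FFT instruction": a library call
// that the FFT library plus the driver fan out across the GPU's ALUs. The CPU's
// instruction set knows nothing about it.
#include <cufft.h>
#include <cstdio>

int main() {
    const int N = 1024;
    cufftComplex *data;
    cudaMallocManaged(&data, N * sizeof(cufftComplex));
    for (int i = 0; i < N; ++i) { data[i].x = float(i % 8); data[i].y = 0.0f; }

    cufftHandle plan;
    cufftPlan1d(&plan, N, CUFFT_C2C, 1);            // library sets up the work
    cufftExecC2C(plan, data, data, CUFFT_FORWARD);  // library + driver spread it
    cudaDeviceSynchronize();                        // across the GPU's ALUs

    printf("bin 0: %f %+fi\n", data[0].x, data[0].y);
    cufftDestroy(plan);
    cudaFree(data);
}
// Build (assuming a CUDA toolkit is installed): nvcc fft_call.cu -lcufft
```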
The issues are:
No software supports it.
No compiler supports it.
No OS supports it.
Could it be fast? Absolutely.
Could it be feasible in the near future? I don't think so.
I think there will be a middle ground between a CPU core and a GPU core when all is said and done; we just haven't gotten there yet, and I think it will be some time before we do. I'm just saying that what you're describing, between a unique CPU and a unique GPU core, isn't feasible with how CPUs with an iGPU work today, and even GCN doesn't overcome these issues.