x86 goes out the window the moment you remove the front-end.
The benefit is that the front-end is defined as an accelerator. Just like upgrading a GPU (part of the system, but not part of the CPU), they can improve the front-end without modifying the cores.
The front-end for a monolithic dual-core can be statically partitioned between cores in multi-core fashion (CMP2).
The front-end for a monolithic dual-core can be competitively shared between cores in vertical multithreaded fashion (VMT2).
The front-end for a monolithic dual-core can be algorithmic-priority partitioned between cores in simultaneous multithreaded fashion (SMT2).
The front-end for a monolithic dual-core can be competitively partitioned between cores in clustered multithreaded fashion (CMT2).
Even in VMT, just because the front-end fetches/decodes/dispatches for a single core doesn't mean there aren't two cores. The FE isn't actually a defining feature of a core; it's an optional one. They could always skip the front-end and interconnect directly to the cores themselves.
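To make the four schemes concrete, here is a toy sketch of how a shared front-end might hand out its slots to two cores in a single cycle. Everything in it is an assumption for illustration: the `allocate` function, the slot width, and especially the exact rule used for each policy (round-robin for VMT2, strict priority for SMT2, demand-proportional for CMT2). It is not how any real front-end arbitrates, just one way to read the four sentences above.

```python
# Toy model: front-end slots handed to each of two cores in one cycle.
# The policy rules below are assumptions of this sketch, not vendor definitions.

def allocate(policy, width, demand, cycle=0, priority=0):
    """Return (slots_core0, slots_core1) for one cycle.

    width    -- total front-end slots per cycle (hypothetical number)
    demand   -- (d0, d1): slots each core could consume this cycle
    cycle    -- cycle counter, used by VMT2 to alternate between cores
    priority -- index of the core favoured by SMT2
    """
    d0, d1 = demand
    if policy == "CMP2":
        # Static partition: each core owns half, whether it uses it or not.
        half = width // 2
        return (min(d0, half), min(d1, half))
    if policy == "VMT2":
        # Whole front-end serves one core per cycle; alternate, but fall
        # through to the other core if the chosen one has nothing to fetch.
        pick = cycle % 2
        if demand[pick] == 0:
            pick = 1 - pick
        got = min(demand[pick], width)
        return (got, 0) if pick == 0 else (0, got)
    if policy == "SMT2":
        # Both cores served in the same cycle; the priority core goes first.
        first = min(demand[priority], width)
        second = min(demand[1 - priority], width - first)
        return (first, second) if priority == 0 else (second, first)
    if policy == "CMT2":
        # Cores compete: if oversubscribed, split in proportion to demand.
        total = d0 + d1
        if total <= width:
            return (d0, d1)
        s0 = width * d0 // total
        return (s0, width - s0)
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    for policy in ("CMP2", "VMT2", "SMT2", "CMT2"):
        print(policy, allocate(policy, width=4, demand=(4, 0)))
```

With one core idle, only the static CMP2 split leaves front-end slots unused; in every case there are still two cores behind the front-end, whichever one it happens to be feeding.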
---
As for the FPU, one would have to prove, against a similar design, that they made an FPU that is optimized only for single-core usage.
Husky per core; 1x 128-bit add + 1x 128-bit mul + 1x 128-bit fmisc // 84-entry flight window + 42-entry FPU scheduler + 120-entry PRF <== 32-nm node
Bobcat per core; 1x 64-bit add + 1x 64-bit mul // 40-entry flight window + 18-entry FPU scheduler + 88-entry PRF <== 40-nm node
Bulldozer per dual-core; 2x 128-bit fused multiply-add, 2x 128-bit packed integer vALUs // 2*128-entry flight window + 64-entry FPU scheduler + 160-entry PRF <== 32-nm node
Zen per core; 2x 128-bit FMUL, 2x 128-bit FADD // 192-entry flight window + 36-entry FPU scheduler + 160-entry PRF <== 14-nm node
Clearly, that is not the case.
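A quick way to sanity-check that comparison is to put the scheduler and PRF figures listed above side by side, both in total and per core. The sketch below only copies the numbers from this post; the `fpus` table layout and the `per_core` helper are made up for illustration, and dividing a shared structure evenly by two is a simplification, since a shared FPU does not have to split its entries evenly.

```python
# Figures copied from the list above:
# name -> (cores sharing the FPU, FPU scheduler entries, FPU PRF entries)
fpus = {
    "Husky": (1, 42, 120),
    "Bobcat": (1, 18, 88),
    "Bulldozer": (2, 64, 160),
    "Zen": (1, 36, 160),
}


def per_core(name):
    """Evenly divide the FPU scheduler and PRF by the cores sharing them."""
    cores, sched, prf = fpus[name]
    return sched / cores, prf / cores


if __name__ == "__main__":
    for name, (cores, sched, prf) in fpus.items():
        sched_pc, prf_pc = per_core(name)
        print(f"{name:10s} total sched={sched:3d} PRF={prf:3d} | "
              f"per core sched={sched_pc:5.1f} PRF={prf_pc:5.1f}")
```

On these numbers the Bulldozer FPU scheduler is larger in total than any of the single-core FPU schedulers listed, including Husky on the same node.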
40nm => 160nm CPP/120nm Mx, 130nm My; density is in the same league as Intel's 32nm, and thus within spitting distance of GloFo's 32nm.
32nm => 130nm CPP/104nm Mx; two cores here, Husky and Bulldozer.
14nm => 78nm CPP/64nm Mx, basically two full nodes from 32nm.
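For a rough feel of what those pitches mean, one can multiply CPP by Mx and compare the products between nodes. This is only a crude proxy that assumes cell area scales with CPP x Mx; it ignores track height, My, and design rules, so treat it as a sketch rather than a density figure. The `nodes` table and helper names are made up for illustration, and the pitch values are the ones quoted in this post.

```python
# Pitches quoted above, in nm: node -> (CPP, Mx)
nodes = {
    "40nm": (160, 120),
    "32nm": (130, 104),
    "14nm": (78, 64),
}


def pitch_area(node):
    """CPP x Mx in nm^2, used here as a crude stand-in for cell area."""
    cpp, mx = nodes[node]
    return cpp * mx


def scale(frm, to):
    """Ratio of the CPP x Mx products when moving from one node to another."""
    return pitch_area(to) / pitch_area(frm)


if __name__ == "__main__":
    for node in nodes:
        print(f"{node}: CPP x Mx = {pitch_area(node)} nm^2")
    print(f"40nm -> 32nm: {scale('40nm', '32nm'):.2f}x")
    print(f"32nm -> 14nm: {scale('32nm', '14nm'):.2f}x")
```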