Microsoft has also baked its own ARM-based chip, and I don't think they're reckless enough not to provide an OS that natively supports it.
If they're at all clever, they're now using that sur-snap-face-dragon as a great learning tool, and they'll have figured out what (mostly) proper scheduling looks like by 2022.
By software(?) emulation I presume you mean that the CPU frontend translates the instruction into different internal operations (hardware emulation), which is what modern x86 microarchitectures do already; FPU, MMX and SSE instructions are all converted to AVX-style operations internally. This is also how legacy instructions are implemented.
But there will be challenges when there isn't a bit-exact translation, e.g. FMA operations: doing the multiply and the add separately produces different rounding. There are also various shuffle/permute operations in AVX-512 that would take a lot of instructions to reproduce.
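To make the FMA point concrete, here's a tiny standalone C demo of my own (not from anyone's emulator), with inputs picked purely for illustration: the fused version rounds once, the split version rounds twice, so the last bits differ. Compile with something like `cc -O2 fma_demo.c -lm`.

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    double a = 1.0 + 0x1p-27;       /* chosen so a*b is not exactly */
    double b = 1.0 - 0x1p-27;       /* representable as a double    */
    double c = -1.0;

    volatile double prod = a * b;   /* force a separately rounded multiply */
    double split = prod + c;        /* two rounding steps in total         */
    double fused = fma(a, b, c);    /* a single rounding step              */

    printf("split (mul then add): %a\n", split);
    printf("fused (fma)         : %a\n", fused);
    return 0;
}
```

On this input the split version prints 0x0p+0 while the fused one prints -0x1p-54, which is exactly the kind of discrepancy an emulator would have to avoid.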
I mean emulation in software. The basic mechanism exists in all modern CPUs: if the decoder encounters an unknown instruction, an invalid-opcode exception is raised, and the exception handler (on Linux, ultimately a SIGILL handler) can perform the calculation in place of that instruction. Obviously, the AVX-512 registers would have to be replaced by data structures in memory. That's utterly inefficient, but it would prevent the thread and the process from dying.
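Here's roughly what that mechanism looks like from user space, as a minimal sketch of my own (Linux/x86-64, assuming glibc's ucontext layout). It "emulates" a deliberately invalid instruction (UD2) by writing a result and skipping it; a real AVX-512 emulator would additionally have to decode the faulting instruction, fetch its operands, compute the result with scalar/AVX2 code, and keep shadow ZMM state in memory.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <ucontext.h>

static volatile uint64_t emulated_result;

static void sigill_handler(int sig, siginfo_t *info, void *ctx)
{
    ucontext_t *uc = ctx;
    /* Pretend we decoded the instruction and computed its result. */
    emulated_result = 42;
    /* UD2 is 2 bytes long; skip it so execution resumes afterwards. */
    uc->uc_mcontext.gregs[REG_RIP] += 2;
    (void)sig; (void)info;
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = sigill_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGILL, &sa, NULL);

    __asm__ volatile ("ud2");   /* stands in for an unsupported opcode */

    printf("survived, emulated result = %llu\n",
           (unsigned long long)emulated_result);
    return 0;
}
```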
In such cases I do wonder if the CPU will just freeze the thread and ask the scheduler to move it, because this detection has to happen on the hardware side.
Is that possible in today's CPUs?
One additional aspect to consider is that Linux distributions are moving towards shipping repositories where all the software is compiled with e.g. AVX2 optimizations. If the weak cores lacked AVX2, virtually nothing could run on them, and Intel would have made a really foolish move here.
Small cores are supposed to have AVX2 (but it's still a guess).
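For what it's worth, an individual program can at least ask the CPU what it claims to support. A small sketch using the GCC/Clang builtins (which boil down to CPUID under the hood); whether a hybrid chip would ever report different answers on different cores is exactly the open question here.

```c
#include <stdio.h>

int main(void)
{
    /* Only strictly needed before constructors run, but harmless here. */
    __builtin_cpu_init();

    printf("AVX2    : %s\n", __builtin_cpu_supports("avx2")    ? "yes" : "no");
    printf("AVX-512F: %s\n", __builtin_cpu_supports("avx512f") ? "yes" : "no");
    return 0;
}
```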
Windows and the default Linux kernel have very little x86-specific code, and even less that is specific to particular microarchitectures. You certainly can compile your own Linux kernel with a different scheduler, different compile-time options and CPU optimizations, but that's something you have to do yourself and keep repeating every time you want kernel patches.
So with a few exceptions, the OS schedulers are running mostly generic code.
They do, however, as the dragon tamer said in your link, apply a lot of heuristics and runtime adjustments, including moving threads around to distribute heat. Whether these algorithms are "optimal" or not depends on the workload.
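If you want to watch that shuffling yourself, a throwaway sketch like this (Linux/glibc, using sched_getcpu()) prints a line every time the kernel migrates the thread to another logical CPU. The migration pattern is entirely up to the kernel and the load on the machine.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int last = -1;
    for (;;) {
        int cpu = sched_getcpu();   /* which logical CPU are we on now? */
        if (cpu != last) {
            printf("now running on CPU %d\n", cpu);
            last = cpu;
        }
        usleep(10 * 1000);          /* poll every 10 ms, keep load low  */
    }
}
```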
I certainly don't understand much of the description of the Linux CFS (this one or others), but it seems to be pretty configurable, with all those "domains" and "groups" and such. The code itself can be universal yet still account for specifics by means of parameters, like those that can be obtained from /proc/cpuinfo.
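For example, a trivial program (my own sketch) can pull those per-CPU parameters straight out of /proc/cpuinfo; comparing the "flags" lines across processors would show whether the big and small cores advertise the same feature set.

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (!f) { perror("/proc/cpuinfo"); return 1; }

    char line[4096];
    while (fgets(line, sizeof line, f)) {
        /* Print only the CPU number and its feature flags. */
        if (strncmp(line, "processor", 9) == 0 ||
            strncmp(line, "flags", 5) == 0)
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}
```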
We'll see if this changes when Intel and AMD release hybrid designs; you'd better prepare for a bumpy ride.
Yes sir. I just believe there will be a smooth ride after a bumpy period. (And later, some turbulent flight when MS inevitably issues a Windows update that's been thoroughly alpha-tested.)
I think you have nailed it; it seems like a marketing solution, not an engineering one.
The engineering (and business) decision that we see in every generation of CPUs is to have as few variants of the silicon as possible. Design, validation, photomask production and the like, the things that need to be redone for each variant and each stepping, are apparently horribly expensive. So Intel may decide to bake only two variants, for example 8 big + 8 small + IGP and 4 big + 8 small + IGP, and break those two down into a hundred different laptop and desktop chips.
I expect on the PC we will get people trying to disable the small cores as much as possible. It might even be affected by the power profile, so e.g. in the "high performance" profile nothing gets scheduled on the small cores.
Who knows, we may even get a BIOS option to disable them.
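Even without a BIOS switch, user space can already keep work off chosen cores with CPU affinity. A hedged sketch (Linux/glibc, sched_setaffinity); the assumption that CPUs 0-7 are the big cores is purely illustrative and would have to be checked against the actual topology, e.g. via /proc/cpuinfo or sysfs.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int cpu = 0; cpu < 8; cpu++)   /* assumed big cores: 0..7 */
        CPU_SET(cpu, &set);

    /* pid 0 = the calling process; children inherit the mask. */
    if (sched_setaffinity(0, sizeof set, &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("restricted to CPUs 0-7\n");
    /* ... run the actual workload here ... */
    return 0;
}
```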