Sorry, but you're quite mistaken here. Let's take Zen 3 as an example: increasing single-thread performance was the explicit main goal of that design (a whole new architecture with every part of the core changed from Zen 2), and it did increase ST performance in a very impressive way compared to its predecessors. Yet it still just barely beats the M1. AMD has a 105W TDP and ~144W total package power draw to work with. If there were more ST scaling to be found, they have all the headroom they need to exploit it.
Yet their cores individually max out at ~20W. Why? Because the architecture doesn't scale past that; it either grows unstable or overheats. Of course, Apple saves a lot of baseline power by basing this on a mobile architecture and by not having power-hungry parts like multiple Infinity Fabric (IF) links, PCIe controllers and external memory, which gives them a lot of baseline efficiency. But it's undeniable that the cores in the M1 are massively efficient and performant at the same time.
It's obvious that AMD could have made a wider core with tons of transistors and a higher-IPC, lower-clocking design like this. But could they have done so at the same level of efficiency? AnandTech suggests no. Of course, Apple has a major advantage here in being vertically integrated, and as such not caring that much about SoC costs as long as they can preserve their margins. Neither AMD nor Intel can operate that way, which pushes them towards smaller and more affordable core designs. But quite frankly, that isn't much of an argument against the M1 being a major achievement; it just shows that Apple's tactics are working. Too bad for us non-Apple users, really.
As for a 16-core M1 being doable? It would definitely be a gargantuan piece of silicon, likely comparable in area to the Xbox Series X SoC, though of course on 5nm rather than 7nm. I don't see Apple having a problem with that, given that it would go into >$2000 laptops and desktops at the low end (with heavily cut-down chips at that price, allowing them to salvage a lot of faulty dies), scaling to well above $5000 for top configurations. The margins are more than there to pay for a big chip. As for performance scaling: they'll of course need to change their memory architecture and design an interconnect that works for that many cores. But that isn't that hard. In terms of pure performance, if the M1 nearly matches the 5950X at <1/4 the per-core power, there's little reason why a bigger chip wouldn't at least keep that per-core performance. Heat density will definitely be an issue, but one that can be solved by spreading core clusters out across the SoC or adding a vapor chamber.
As for AMD or Intel on 5nm being more competitive: obviously to some degree, but I wouldn't expect current TSMC 5nm to clock anywhere near as high as current TSMC 7nm, so that move might actually cost them performance unless it also comes with a wider architecture. Would it allow them to catch up in perf/W? Not even close. A single node change doesn't get you 75% power savings.
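To put the node-shrink argument in rough numbers, here is a minimal Python sketch. It assumes the <1/4 per-core power figure above and assumes one node shrink cuts power by about 30% at the same performance, which is roughly what TSMC has advertised for N5 versus N7. Both inputs are illustrative assumptions, not measurements.

```python
# Illustrative inputs (assumptions, not measurements):
# - the M1 core reaches similar single-thread performance at ~1/4 the power
# - one node shrink saves ~30% power at iso-performance (TSMC's N5-vs-N7 ballpark)
M1_RELATIVE_POWER = 0.25
NODE_SHRINK_POWER_SAVING = 0.30


def required_power_saving(relative_power: float) -> float:
    """Fraction of its power a competitor must shed to match the target's power."""
    return 1.0 - relative_power


def power_after_shrinks(shrinks: int, saving: float = NODE_SHRINK_POWER_SAVING) -> float:
    """Remaining power fraction after `shrinks` node changes at iso-performance."""
    return (1.0 - saving) ** shrinks


def shrinks_needed(target: float, saving: float = NODE_SHRINK_POWER_SAVING) -> int:
    """Smallest number of node changes that brings power down to `target` or below."""
    n = 0
    while power_after_shrinks(n, saving) > target:
        n += 1
    return n


if __name__ == "__main__":
    print(f"Power saving needed: {required_power_saving(M1_RELATIVE_POWER):.0%}")
    print(f"Power left after one shrink: {power_after_shrinks(1):.0%}")
    print(f"Shrinks needed at 30% each: {shrinks_needed(M1_RELATIVE_POWER)}")
```

With these assumed numbers one shrink leaves the core at about 70% of its original power, far from the ~25% needed; it would take about four such shrinks to get there.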
Wasn't AMD's quote about not wanting hybrid architectures on the desktop, referring to Alder Lake? Hybrid for mobile makes perfect sense, and I don't doubt AMD could scale down Zen to a low-power design for that use quite easily. That being said, that patent doesn't describe a method for entirely obviating the need for an architecture-aware scheduler: allocating threads based on instruction set alone only works for workloads where just one set of cores supports that instruction set, such as power-hungry AVX loads. You'll still need the scheduler to know to move high-performance threads to high-performance cores even if they use instruction sets common to both clusters.
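The distinction can be sketched in a few lines: allocating by instruction set only decides placement when a thread needs an extension that a single cluster supports; for everything else the scheduler still needs to know which threads are performance-critical. This is a minimal Python sketch, not the method in AMD's patent; the cluster names, the `avx512` feature tag and the `high_priority` flag are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    isa_features: set[str]
    high_performance: bool


@dataclass
class Thread:
    name: str
    required_features: set[str] = field(default_factory=set)
    # Hypothetical hint that only an architecture-aware scheduler would use.
    high_priority: bool = False


def isa_only_candidates(thread: Thread, clusters: list[Cluster]) -> list[str]:
    """Clusters able to run the thread, judged purely by instruction set."""
    return [c.name for c in clusters if thread.required_features <= c.isa_features]


def aware_placement(thread: Thread, clusters: list[Cluster]) -> str:
    """Pick among ISA-compatible clusters using the performance hint."""
    candidates = [c for c in clusters if thread.required_features <= c.isa_features]
    if not candidates:
        raise ValueError(f"no cluster supports {thread.required_features}")
    preferred = [c for c in candidates if c.high_performance == thread.high_priority]
    return (preferred or candidates)[0].name


if __name__ == "__main__":
    big = Cluster("big", {"x86-64", "avx2", "avx512"}, high_performance=True)
    little = Cluster("little", {"x86-64", "avx2"}, high_performance=False)
    clusters = [big, little]
    avx_job = Thread("avx_job", {"x86-64", "avx512"})
    game = Thread("game", {"x86-64", "avx2"}, high_priority=True)
    print(isa_only_candidates(avx_job, clusters))  # ['big']: the ISA alone decides
    print(isa_only_candidates(game, clusters))     # ['big', 'little']: ambiguous
    print(aware_placement(game, clusters))         # big: needs the performance hint
```

The AVX-style job has exactly one valid cluster, so ISA-based allocation is enough; the game thread runs on either cluster, and only the extra hint sends it to the big cores.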
Oh dear lord no. The majority of applications today are 64-bit. That would mean the "little" chip couldn't run them at all. Windows 10 is AFAIK 64-bit only, so it couldn't even run the OS! No. Just no.
You really shouldn't be surprised that a ~20-24W hybrid 4c (big) + 4c (little) CPU lags significantly behind an 8c16t all-big-core 65W (88W under all-core loads) CPU. What makes this impressive is that they're managing half your score with less than a third of the power, half the high performance cores, and no SMT.
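The efficiency claim is simple arithmetic: half the score at under a third of the power means more than 1.5x the performance per watt. A minimal Python sketch; the scores are normalized placeholders rather than real benchmark results, and the power figures are the ~24W and 88W numbers quoted above.

```python
def perf_per_watt(score: float, watts: float) -> float:
    """Benchmark score delivered per watt of power draw."""
    return score / watts


def efficiency_ratio(score_a: float, watts_a: float, score_b: float, watts_b: float) -> float:
    """How many times more work per watt chip A does than chip B."""
    return perf_per_watt(score_a, watts_a) / perf_per_watt(score_b, watts_b)


# Normalized placeholder scores: the 8c16t desktop chip = 1.0, the M1 = 0.5.
# Power: upper end of the ~20-24 W M1 range vs. 88 W all-core for the desktop chip.
ratio = efficiency_ratio(0.5, 24.0, 1.0, 88.0)
print(f"M1 perf/W advantage: {ratio:.2f}x")  # 1.83x
```

Using the lower 20W figure instead would push the ratio to 2.2x; either way it clears the 1.5x floor implied by "half the score at a third of the power".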
20-24W is for the Mac mini, not the MacBook Air.