Watching what Apple did in such a short time (sure, it takes lots of money, but Intel has that too), I think it's inevitable.
Short time? They've been selling devices with their own ARM cores since, what, 2012? They've likely been working on the M1 series since 2016-2017-ish, as that's about how long a ground-up CPU architecture design cycle is.
My impression from everything is that Zen3 does not go higher due to process limitations.
Not only that - different processes have different clock scaling characteristics, but clock headroom is also highly architecture-dependent. Both need to align for clocks to scale well, and AMD seems to be held back by a combination of the two.
It is simply up against the very, very steep efficiency curve at 5GHz.
That is absolutely true, but it doesn't change the absolute power draw characteristics of the chip.
Intel does a bit better here (in clock capability, not efficiency), but not by much - leaked 12900K OC results (which reviews seem to confirm) showed 330W at 5.2GHz and 400W at 5.3GHz (and still unstable). That's +20% power for +100MHz (+2% clocks).
Yes, but again, same thing.
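Those 12900K numbers track the classic dynamic-power relation P ≈ C·V²·f: power grows linearly with frequency but quadratically with the voltage needed to sustain it. A rough sketch of the arithmetic (the voltage figures here are hypothetical, not from the leak):

```python
# Dynamic CPU power roughly follows P = C * V^2 * f.
# The voltages below are illustrative assumptions - real 12900K V/f
# points at these clocks are not public.
def dyn_power(base_power, f_base, f_new, v_base, v_new):
    """Scale dynamic power from one frequency/voltage point to another."""
    return base_power * (f_new / f_base) * (v_new / v_base) ** 2

# 5.2 GHz -> 5.3 GHz (+1.9% clock) with a hypothetical voltage bump
# from 1.35 V to 1.45 V (+7.4%):
p = dyn_power(330, 5.2, 5.3, 1.35, 1.45)
print(round(p))  # ~388 W - roughly an 18% power jump for ~2% more clock
```

The quadratic voltage term is why the last few hundred MHz are so expensive: a small clock bump needs a disproportionate voltage increase to stay stable.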
M1 runs at 3.2GHz; I wonder what it would do at, for example, 4GHz - both in terms of performance and power consumption - or whether it would even be capable of that.
I'm reasonably sure that the M1 can't clock that much higher - an execution pipeline that wide is likely very, very limited in how high it can scale.
My point is that the M1's efficiency lead falls practically within these same margins when CPUs are run at their optimal efficiency point - which the M1 is, but AMD/Intel CPUs generally are not (EPYCs and Xeons, maybe).
But that's missing the point. The point is: they are likely architecturally limited to the mid-to-low 3GHz range,
yet they still manage to match the best x86 CPUs in ST. Yes, they spend tons of transistors to do so, have massive caches and an extremely wide core design, but they still manage to match a 5GHz Zen3 core
at the power levels of a ~4.2GHz Zen3 core. That clearly indicates that, as you say, Ryzen 5000 is well out of its efficiency sweet spot at ~5GHz, but it also shows just how significant Apple's IPC and efficiency advantage is. If AMD had to clock down to match their efficiency, they would be significantly slower.
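To put a rough number on that gap: with single-thread performance approximated as IPC × frequency and the ST scores treated as equal, the implied IPC ratio is simply the clock ratio (a back-of-the-envelope sketch, not a measured figure):

```python
# Single-thread performance ~= IPC * frequency. If an M1 core at 3.2 GHz
# matches a Zen3 core at 5.0 GHz, the implied IPC advantage is the clock ratio:
m1_ghz, zen3_ghz = 3.2, 5.0
ipc_ratio = zen3_ghz / m1_ghz
print(f"~{ipc_ratio:.2f}x")  # M1 needs ~1.56x Zen3's IPC to match it at 3.2 GHz
```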
They tried getting an x86 license at one point, and they have tried creating (high-performance) ARM cores, both with limited success. They have been trying to get CPUs to go along with the whole GPU/HPC thing, but fairly unsuccessfully. As for software support, if someone pushes an ISA to wide enough adoption in servers - or automotive, for that matter - that tends to trickle down to other segments as well.
IMO, that is highly, highly doubtful. How many RISC or POWER chips do you see in consumer applications? Also, the ARM ISA is already ubiquitous in consumer mobile spaces, so that's not the issue. The issue is getting a sufficiently high performance core design out there - and one made for automotive and server tasks is likely to have performance characteristics quite unsuited for consumer applications, or a bunch of features that simply aren't used, eating up die area. And as you said, Nvidia already tried - and crucially, gave up on - building consumer SoCs. Yes, that was in part due to gross anticompetitive behaviour from Qualcomm and Intel (bribing or "sponsoring" manufacturers to not use Tegra4, among other things), but they're quite unlikely to get back into that game. They've even been reluctant to produce a new, bespoke SoC for a new Switch, despite that being a higher-margin product (for them, not necessarily Nintendo) that is guaranteed to sell in tens of millions of units. Nvidia has shown zero interest in being an end-user-friendly custodian of ARM.
Well, Apple can control the M1's power consumption better because they control everything, whereas on PC you have to use a standard board with standard memory. The M1 uses soldered LPDDR4/5, whereas a PC most of the time has to use DIMMs, which require longer traces, higher power to be stable, etc.
AMD and Intel can specify and package literally whatever RAM they want in whatever way they want. The only question is cost and whether any OEMs are willing to pay for it and put it to use. HBM, on-package LPDDR, whatever, they can do it if they want to. There is no system limitation for this. Also, most laptops today use soldered RAM, whether regular DDR4 or LPDDR4X, as most designs are thin-and-lights these days.
Also, all CPUs are designed with a goal in mind. x86 cores are still mainly designed as server/desktop CPUs first, whereas the M1 was designed for laptops. There are no perfect designs, only designs adapted to the end goal.
Intel has been "mobile-first" in their CPU designs since at least Skylake. That's what sells the most (by an order of magnitude if not more), so that's the main focus.
The main problem with ARM or RISC-V, for me, is that they get excited when they get very good initial performance at lower clocks with smaller chips. They think that if they scale it up, the gains will be linear. The reality is the first 80% seems easy to get with low power consumption and a low transistor count. When they want to get the last 20% to reach current top CPUs like x86, that's where things start to get hard.
It's mainly down to the willingness to pay for a sufficiently substantial design. Most ARM SoCs cost
well below $100 for phone or chromebook manufacturers, while AMD and Intel CPUs/APUs easily cost $300-400 if not more for higher end parts. It stands to reason that AMD and Intel can then afford to make bigger designs with larger caches and more substantial core designs with better performance.
They have to implement complex mechanisms like out-of-order execution, prefetching, SIMD, etc. to feed larger and larger sets of execution ports. This costs power and transistors. In the end the CPU becomes so complex that all the advantages of the ISA are negated; what matters is the design choices, the process, and the V/f curve.
There isn't a single high-performance core design on the market today that isn't OoO, prefetchers are equally ubiquitous, as is SIMD hardware and ISAs. I fail to see how this would be a disadvantage for ARM and somehow not x86.
And these days, when AMD and Intel are pushing things hard and are no longer stuck on failed architectures (AMD's Bulldozer) or milking the market (Intel's last 10 years), ARM will have a hard time getting competitive, since they also have the ISA incompatibility problem.
Apple has demonstrated clearly that an ARM design can compete with the fastest x86 designs. ARM, Qualcomm, Samsung and the rest just need to get their collective thumbs out of their collective rear ends and catch up. The problem seems to be a conservative and overly cost-conscious design approach, more than anything else.
But this is something Microsoft is actively working on, even more so now that Apple has its own CPU in house and could push performance way ahead. They cannot let AMD and Intel slow down and milk the x86 market. They are trying to make Windows ISA-agnostic. How long will it take? I don't know, but all their decisions point toward this end goal.
I don't think either Intel or AMD are in a position where they could milk anything. Chipmaking is - thankfully - an extremely competitive business once again.
What we would need is the ability to ship both binaries for most applications and have the OS just run them transparently. Failing that, it would be nice to have offline translation, and only as a last resort, real-time translation. With that, the experience would be mostly transparent.
That sounds like a recipe for disaster IMO. Not only would application install sizes (and download sizes) balloon, but potentially having an active thread swap over to a core of an entirely incompatible ISA, being swapped on the fly including whatever data it's working on? That sounds like BSOD hell.
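The workable version of the dual-binary idea is purely load-time selection, which is essentially what macOS universal binaries do: the loader picks one slice at launch and never swaps mid-execution. A toy Python sketch of that dispatch (the slice contents and alias table are illustrative, not any real package format):

```python
import platform

# Toy model of a "fat" application: one slice per ISA, and a loader
# that transparently picks the slice matching the host CPU at launch.
fat_binary = {
    "x86_64": "code compiled for x86_64",
    "arm64": "code compiled for arm64",
}

def load_slice(bundle, machine=None):
    """Pick the slice matching the host ISA, like a fat-binary loader."""
    machine = machine or platform.machine().lower()
    aliases = {"amd64": "x86_64", "aarch64": "arm64"}  # common ISA aliases
    return bundle.get(aliases.get(machine, machine))

print(load_slice(fat_binary, "aarch64"))  # -> "code compiled for arm64"
```

Since the choice is made once at process launch, no running thread ever crosses an ISA boundary; the cost is purely the doubled binary size.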
But in the end, nobody will buy a Qualcomm laptop running Windows 11 if the experience is bad or if the performance is much lower for the same price. The thing is, the CPU is just a fraction of a laptop's cost, so even deep cuts to the CPU price won't make those laptops very good deals if the performance isn't there.
That's true. But Qualcomm's SoCs are dirt cheap compared to Intel/AMD CPUs/APUs - that's why those ARM Chromebooks get so cheap. There isn't much left to cut. They need to step up their performance game, period.
But if they manage to deliver low power and reasonable performance at a fair price (instead of the super high prices Intel/AMD charge for their most efficient SKUs), they might have a chance to make a dent. But that is a big if. It's something they have promised but that has never happened yet.
Yeah, something like a 4xX1+4xA78 design that was also cheap could be interesting for a laptop. But there's no way such a design would be cheap, which lands us back to square one. Anything cheap and ARM inevitably means a bunch of A53 cores, and they just aren't even remotely competitive today.