The performance of the M1 on 5nm is where I would expect it to be. I don't find it game-breaking, and they have already implemented most of the tricks that x86 uses. The architecture is maturing, and it isn't really that much more performant than x86.
Modern processors are so much more complex than just the instruction set that, in the end, the ISA doesn't really matter, at least for pure performance. For low power, x86 still seems to have a bit more overhead...
But the thing is, ARM is getting more powerful by using more transistors and more power. It's not the 4-watt CPU in your phone that is doing that...
I am not an Apple fan, but I am glad they are making something powerful, because we need competition. AMD is competing again, and the performance gains each year are much bigger than when Intel and Nvidia had zero competition.
Good stuff indeed, good stuff...
I have to say, if this is in line with your expectations, you must be the most optimistic person I've seen when it comes to ARM. Up until now, even large, high-powered ARM server chips have only competed with x86 on absolute performance in scenarios where they had a core-count advantage and the workload was heavily multithreaded, and that includes chips on 7nm. So if you expected 5nm to suddenly let an ARM SoC take the lead in single-threaded performance, that's quite the jump you were expecting!
This obviously isn't the most revolutionary thing ever, but it's a much bigger achievement than you're giving them credit for.
Clock speed and voltage, it's that simple. L1 caches basically have to run close to the core's clock speed, so their power scales right along with it (badly). That said, at 3 GHz it's kept in check, for now. It's still inefficient, though, considering the performance gained from having up to 6 times more L1 cache: you can bet the hit rate didn't go up by 600%. That's why L3 caches have grown so much larger over the years, since they don't have to scale alongside the cores themselves, and why large L1 caches are avoided like the plague, even outside x86.
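A rough way to see why a 6x larger L1 cannot mean 6x more hits is the empirical "square-root rule" of thumb, under which miss rate scales roughly with the inverse square root of cache capacity. This sketch is only that rule of thumb, not a model of the M1 or any real workload; the function name and the 5% baseline miss rate are assumptions chosen for illustration.

```python
def miss_rate_after_resize(base_miss_rate: float, size_ratio: float, exponent: float = 0.5) -> float:
    """Estimate the miss rate after scaling cache capacity by size_ratio.

    Assumes the empirical power law miss_rate ~ size ** -exponent
    (exponent 0.5 is the classic "square-root rule"; real workloads vary widely).
    """
    if not 0.0 <= base_miss_rate <= 1.0:
        raise ValueError("miss rate must be in [0, 1]")
    if size_ratio <= 0:
        raise ValueError("size ratio must be positive")
    # Cap at 1.0 so shrinking a cache can never yield a miss rate above 100%.
    return min(1.0, base_miss_rate * size_ratio ** -exponent)


# Hypothetical baseline: an L1 with a 5% miss rate (95% hit rate), grown 6x.
base_miss = 0.05
new_miss = miss_rate_after_resize(base_miss, 6.0)
print(f"hit rate: {1 - base_miss:.1%} -> {1 - new_miss:.1%}")
print(f"misses reduced by {base_miss / new_miss:.2f}x")
```

Under these assumptions the hit rate only moves from about 95% to about 98%, even though misses drop by roughly 2.4x, which is why whether the extra area pays off depends so heavily on the workload.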
It's definitely an open question whether such an architecture can scale to higher clocks and power levels at all. I'm rather skeptical of that, at least for this design, though I'd be surprised if whatever they whip up for the Mac Pro doesn't hit ~4GHz, at least in lightly threaded boosts; there's definitely power and cooling to spare for that in those cases. As for L3 caches, AT reports the LLC on the M1 as 16MB, which is half the size of Zen 3's on the desktop, though also 4x the size of a Renoir CCX's. The more interesting thing is how Apple shares its L2 cache between cores, which of course makes comparisons difficult. (Not to mention that the LLC is shared across all parts of the SoC, further complicating comparisons with current x86 SoCs and CPUs.) You're likely entirely right that the increase in cache hits from the 6x increase in L1 cache is nowhere near 1:1, but it's obviously still worth it in enough workloads for Apple to be willing to go that route, and also clearly efficient enough not to hurt them.
Of course, because again, Intel and AMD design their cores with the goal of fitting as many as possible on a single package, hence smaller but higher-clocked cores, which are also less power efficient.
But there is another company that proves my point more than anyone else: ARM themselves. They're also close to extracting similar performance out of cores that are even smaller and use far less power and area than Apple's. I'll maintain my opinion that Apple's approach is the wrong one in the long term.
ARM is nowhere close to the performance of this, or even of the mobile A14. Even the X1 cores will be way, way behind. Sure, AT's comparison numbers are just from A77 cores, but look at those performance differences! Sure, peak power is higher, but we've seen plenty of examples of how poorly ARM's A-series cores scale upwards in power in various poorly optimized phones ("gaming" phones with high-clocked SoCs, etc.). I'm optimistic that the X1 will be a first step towards non-Apple ARM cores that are at least in the same ballpark as Apple's, but current options are nowhere close to what Apple delivers. Also, the X1 is supposedly a much bigger core than ARM's A-series cores.
Let's also not forget that the memory is shared between all the parts inside the SoC, which might affect performance negatively in some scenarios as well.
Why would Apple ever go as high as 90W? We might see some 15-25W parts next, but I doubt we'll ever see anything in the 90W range from Apple.
Shared memory is definitely going to be a severe bottleneck, as is the measly iGPU bandwidth. That might explain a lot of the delta in AT's numbers between synthetic/compute workloads (and very light gaming like 3DMark Ice Storm) and real-world gaming such as SotTR (Shadow of the Tomb Raider). It'll definitely be interesting to see how their chips for the MBP 16" and iMacs look in this regard. Will they go with some sort of dual memory interface? Will they go with stupidly wide LPDDR4X? DDR4 would frankly shock me at this point.
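For a sense of scale on the memory-interface question, theoretical peak DRAM bandwidth is simply bus width in bytes times transfer rate. The configurations below are illustrative assumptions: the 128-bit LPDDR4X-4266 entry matches what has been widely reported for the M1, the 256-bit entry is a hypothetical "stupidly wide" bus, and the function name is made up for this sketch. In a shared-memory SoC, this single number has to feed the CPU, the GPU and everything else.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal units).

    Ignores refresh, bus turnaround and protocol overhead, so real
    sustained bandwidth will be noticeably lower.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts / 1000  # MB/s -> GB/s


# Illustrative configurations (assumptions for comparison, not confirmed Apple parts).
configs = {
    "128-bit LPDDR4X-4266": (128, 4266),
    "256-bit LPDDR4X-4266 (hypothetical wider bus)": (256, 4266),
    "dual-channel DDR4-3200 (128-bit)": (128, 3200),
}

for name, (width, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(width, rate):.1f} GB/s")
```

Doubling the bus width doubles the peak, which is why a wider LPDDR4X interface is the obvious lever for a bigger GPU that shares memory with the CPU.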
As for seeing 15-25W parts next... isn't that what this is? ~10W or less in the MBA, probably ~15-20W in the MBP, ~20-24W in the MM. And seemingly at 3.0/3.1/3.2GHz, which doesn't bode well for this design's frequency scaling, though of course we don't have low-level enough access to know for sure. I would expect the next part to be for the MBP 16", in the 30-50W range, and likely with a much bigger GPU. 8 big cores, 4 small ones, 16-64GB of RAM and ~32 GPU cores?
I think people are forgetting that this 10W chip is competing with chips that have TDPs as high as 35-45W. Come on, let's gain a little perspective here. It's not the best, but it's pretty damn good for what it is. If this is what Apple can do with a 10W power budget, imagine what they can do with 45W.
~20-25W, not 10W.
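The "imagine what they can do with 45W" extrapolation can be sanity-checked with the standard dynamic power relation P ≈ αCV²f. This sketch assumes dynamic power dominates and, as a rough simplification, that voltage has to rise in proportion to frequency, so per-core power grows roughly with f³; leakage and minimum-voltage floors are ignored, and the function name is hypothetical.

```python
def frequency_scale_for_power(power_ratio: float, voltage_tracks_frequency: bool = True) -> float:
    """Estimate per-core clock scaling from a power-budget ratio, using P = a*C*V^2*f.

    Assumption: dynamic power dominates. If voltage rises in proportion to
    frequency (a rough simplification), P ~ f**3, so f ~ P**(1/3).
    If voltage is held fixed, P ~ f, so f scales linearly with P.
    """
    if power_ratio <= 0:
        raise ValueError("power ratio must be positive")
    exponent = 1 / 3 if voltage_tracks_frequency else 1.0
    return power_ratio ** exponent


# The thread's figures: ~10W (or ~20-25W, midpoint 22.5W) today vs. a hypothetical 45W budget.
for current_watts in (10, 22.5):
    ratio = 45 / current_watts
    clock = frequency_scale_for_power(ratio)
    print(f"{current_watts}W -> 45W: {ratio:.2f}x power, ~{clock:.2f}x clock per core")
```

Under these assumptions, 4.5x the power buys only about 1.65x the clock on a single core, so a bigger budget is more likely to go into more cores and a bigger GPU than into dramatically higher clocks.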