
Apple M1 Beats Intel "Willow Cove" in Cinebench R23 Single Core Test?

Impressive, especially considering R23 is a continuous, throttling-inducing load designed to bypass short-term performance boosts. 3.1GHz isn't particularly impressive on its own, but the IPC on show clearly demonstrates that, at least in some workloads, you can make up for lower clock speeds with a wider core. AnandTech's recent A14 article shows how Apple is managing things that current X86 designs aren't even close to in terms of architectural width and caches, so it'll be really interesting to see how this in turn affects future X86 development.
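To make the "IPC making up for clock speed" point concrete, here's a minimal back-of-the-envelope sketch. The M1 score and both clock figures are illustrative assumptions (only the 3.1 GHz figure and the 1532 result are quoted in this thread), not measurements:

```python
# Back-of-the-envelope per-clock comparison. The scores and clocks below are
# illustrative assumptions for this sketch, not figures measured in this thread.
def points_per_ghz(score: float, clock_ghz: float) -> float:
    """Cinebench R23 points normalised to clock speed."""
    return score / clock_ghz

m1 = points_per_ghz(score=1500, clock_ghz=3.1)   # hypothetical M1 1T result
tgl = points_per_ghz(score=1532, clock_ghz=4.7)  # i7-1165G7 at its boost clock

print(f"M1:        {m1:.0f} pts/GHz")
print(f"i7-1165G7: {tgl:.0f} pts/GHz")
print(f"Per-clock advantage: {m1 / tgl - 1:.0%}")  # roughly 48% with these example inputs
```

With those inputs the per-clock gap is large even if the absolute scores end up nearly identical, which is the whole "wider core at lower clocks" argument.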

Also, of course, it kind of demonstrates what can happen if you give a chip development company unlimited resources. No wonder they're doing things nobody else can.
 
The top score in the screenshot is the Intel i7-1165G7, with a single-core score of 1532.
 
True. I tend to forget that this isn't a classic 8-core, but 4+4. So there's a bit of "cheating" there. It's a design philosophy that works well for those machines, but AMD has already said that they are not interested in a hybrid design.

Well, on paper AMD has the tech to make a great SoC: Zen 3 cores (low-power Zen 2 was already impressive in its own right), RDNA 2 with AV1 acceleration... Machine learning for the consumer is the one thing they don't do yet, but that's a whole mess on Windows anyway (Intel with their odd Deep Link, and Nvidia's tensor cores are only for "beefy" laptops). But then there is the software side. OpenCL being what it is, CUDA is the only thing close to the Metal API on Windows. Adobe and Nvidia have a love story where AMD is a third wheel at best.
Well, progress is finally being made on your latter points, with Windows now supporting OpenCL and OpenGL.
GPUs can be used for ML, though dedicated hardware will remain an advantage for the M1's specific units. To be fair, both Intel with oneAPI and AMD with God knows what (they already have ML on-chip on Ryzen for process optimization) could incorporate better AI/ML hardware, catching up to the M1's main advantage.
 
Also, of course, it kind of demonstrates what can happen if you give a chip development company unlimited resources.

Really ? Apparently everyone said the same about Intel, that they also have an unlimited budget.

No wonder they're doing things nobody else can.

They are doing things nobody else has an interest in doing. The segments in which Apple and Intel/AMD operate actually have little overlap: Apple is basically an exclusively mobile silicon company, and Intel/AMD are... not. Their designs are first and foremost meant for servers, where the goal is to fit as much compute as possible on a single package; what we get on desktops and even on mobile is a cut-down version of whatever that is, and at the core these are basically the same architectures, just configured differently.
 
"Way to go Dall..uhhh Apple"

Not surprised here, Apple seems on course to truly separate itself. Next up, their own discrete card? Maybe?
 
Really ? Apparently everyone said the same about Intel, that they also have an unlimited budget.

I would add "and not being beholden to an open ecosystem burdened by legacy stuff" to that.

Obviously, but then they're not going to replace the more powerful systems some people need, like some are implying.
And being the fastest is exactly what Apple claimed to be just days ago, without proof.

I will never understand why people take PR statements at face value. "Fastest" obviously comes with a fistful of asterisks.
 
If the M1 does anything, it might finally get Microsoft to hurry the hell up and ditch all the legacy crap that is bogging down x86 Windows.

With x86 and Windows playing a chicken-and-egg game, it's never going to become a more streamlined architecture until someone takes the first step. AMD and Intel can't afford to cull features in hardware until they are dropped from the OS, and Microsoft is still unwilling to completely let go of 32-bit OS even in this day and age....

Well, the world is not always a locked-down, gated BS environment, right, so they are taking their time to do it, piece by piece. Making the OS look and feel like a mobile OS, then removing power features like Control Panel, then adding a UWP store with UWP drivers and MSIX to make exes go away; same stuff by a different method. Add the WaaS model: feels amazing, right, the latest and greatest on your desktop every day, and a big OS release every six months. Wonderful, right? Versus the old "crappy" Win7 RTM release, which doesn't even need any sort of NT kernel updates to run 2020 software.

"Legacy Crap bogging down Windows"
Wonder how the legacy crap is bogging down Windows. Windows got its strength from software compatibility, not gated BS like Apple. In Apple's utopia, everything is locked down and only gets through when Apple almighty says so. Do you even realize how big a 32-bit axe would be for applications? Win10 32-bit is already being phased out, and so are 32-bit GPU drivers; application support is very important for an OS to be robust and not restrictive to users. Did Linux get bogged down by all its legacy crap?

The M1 won't do anything. Apple users will buy their BGA-riddled walled-garden Macs no matter what, and Windows machines will be sold as they are; people need GPUs and an OS that supports their software requirements, and until Apple catches up to AMD or Nvidia, that day is not going to come.
 
Luckily we now have proper tests.
Yes, it's very fast in single threaded workloads, no it's not what Apple claims overall.
No, you still don't want to game on Apple hardware.

[Attached: Cinebench R23 and gaming benchmark charts]

 
Really ? Apparently everyone said the same about Intel, that they also have an unlimited budget.
Apple is AFAIK the highest-valued company on the planet, and the one with the biggest cash hoard too. Intel has never been even close to that. So while Intel's R&D budgets might have been "unlimited" in terms of the tech industry at the time, Apple's R&D budgets are likely only limited by how many people it's possible for them to hire, how many concurrent projects it's possible for them to run, and what they are interested in doing.

They are doing things nobody else has an interest in doing. The segments in which Apple and Intel/AMD operate actually have little overlap: Apple is basically an exclusively mobile silicon company, and Intel/AMD are... not. Their designs are first and foremost meant for servers, where the goal is to fit as much compute as possible on a single package; what we get on desktops and even on mobile is a cut-down version of whatever that is, and at the core these are basically the same architectures, just configured differently.
Intentionally or not, you are completely misunderstanding what I said. I was referring to the A14/M1 Firestorm "big core" microarchitecture and its features, and not mobile-specific features at that, just ones that massively increase the throughput of the core. An 8-wide decode block compared to 4-wide in x86; a re-order buffer 2-3x the size of Intel and AMD's newest architectures, 4x/2x the FP/clock throughput of current Intel architectures/Zen 3, dramatically deeper load/store queues than any other architecture, L1 caches 6x the size of current X86 architectures (seemingly without a latency penalty!) ....and the list goes on. These aren't mobile-specific features, these are common features across all modern CPU architectures. And somehow Apple is able to massively beat the competition on several fronts, doing things that seem to be impossible for the others. Some of it can be blamed on the X86 ISA (decode block width, for example), but not everything. And if you're trying to tell me that AMD and Intel could grow their L1 caches 6x without increasing latency by a single cycle, yet are choosing not to, then you need to provide some proof for that, because that's an outlandish thing to even suggest. If AMD and/or Intel could do what Apple is doing here to increase IPC without killing power efficiency, introducing massive latency, or similar trade-offs, they would very clearly do so.

The thing is, this design is likely to scale extremely well for servers and workstations, as the clock speeds they are reaching are the same as the ones hit by current top-end multi-core server chips, just at much lower power. They'd obviously need to tweak the architecture in various ways and find a way to couple together more than 4+4 cores efficiently without introducing bottlenecks or massive latency, but ... given what they've already done, that should be doable. Whatever silicon Apple makes for the next Mac Pro, it's looking like it'll be extremely impressive.

Luckily we now have proper tests.
Yes, it's very fast in single threaded workloads, no it's not what Apple claims overall.
No, you still don't want to game on Apple hardware.

[Attached: Cinebench R23 and gaming benchmark charts]

I would really, really like to see what RotTR performance would look like if it was compiled for this architecture rather than run through a translation layer. Even if gaming performance is lacklustre overall, that is damn impressive.
 
interesting comments to read, so passionate.

Personally I could not care less, by their own design/philosophy I dont have anything to do with Apple and probably never will.
They could make a low power mobile chip that is as fast as an RTX3090 and it would not affect me in the slightest, I have nothing to do with Apple.
 
It not only outperforms Intel by a large margin, it does it with massive power efficiency. And just a note to everyone... don't choose sides. If you think Apple is any worse than Google or Microsoft as far as culture goes, then you are sorely mistaken. They are all equally terrible.
 
I would really, really like to see what RotTR performance would look like if it was compiled for this architecture rather than run through a translation layer. Even if gaming performance is lacklustre overall, that is damn impressive.

Agreed ...

This is version 1 of a new/unsupported arch, in some instances BEATING the next-gen best. For a first attempt it's quite insane how fast this is.
 
An 8-wide decode block compared to 4-wide in x86; a re-order buffer 2-3x the size of Intel and AMD's newest architectures, 4x/2x the FP/clock throughput of current Intel architectures/Zen 3, dramatically deeper load/store queues than any other architecture, L1 caches 6x the size of current X86 architectures (seemingly without a latency penalty!) ....and the list goes on.

All of which make it horrendously inefficient in terms of area and transistor budget in order to extract the same performance Intel and AMD do with much smaller cores, and on a bigger node by the way. Am I really the only one noticing that? There is a very good reason practically nobody is making cores this wide: it's a scalability dead end, and everyone figured this out in the late 90s.

And if you're trying to tell me that AMD and Intel could grow their L1 caches 6x without increasing latency by a single cycle, yet are choosing not to, then you need to provide some proof for that, because that's an outlandish thing to even suggest. If AMD and/or Intel could do what Apple is doing here to increase IPC without killing power efficiency, introducing massive latency, or similar trade-offs, they would very clearly do so.

Of course they can, and they will gradually make wider cores. The reason Apple can use a 128KB cache (which is 4 times larger, not 6, and that is the L1 data cache, not the instruction cache) is that they use a minimum 16KB page and not 4KB, hence a cache that is 4 times larger with 8-way associativity. That's all there is to it, and I don't have to explain why a cache that is 4 times bigger with the same associativity is pretty terrible and inefficient. I have no idea why everyone thinks Apple is using some sort of magic fairy dust to make these things.

Edit: I got confused; that was for the 128KB data cache. For the 192KB instruction cache that is 6 times larger, it's basically the same explanation: they can do it because of the 16KB page.
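For anyone who wants the arithmetic behind the page-size point: a virtually indexed, physically tagged (VIPT) L1 can only grow to page size × associativity before aliasing becomes a problem. A quick sketch; the associativity in the last line is an assumption, the rest are the commonly cited figures:

```python
# Maximum aliasing-free VIPT L1 size = page size * associativity.
def max_vipt_l1_kb(page_kb: int, ways: int) -> int:
    return page_kb * ways

print(max_vipt_l1_kb(page_kb=4, ways=8))    # 32 KB  -> the classic x86 L1 ceiling
print(max_vipt_l1_kb(page_kb=4, ways=12))   # 48 KB  -> e.g. Ice Lake's 12-way L1D
print(max_vipt_l1_kb(page_kb=16, ways=8))   # 128 KB -> M1's 8-way L1D with 16 KB pages
print(max_vipt_l1_kb(page_kb=16, ways=12))  # 192 KB -> matches the L1I size (12-way assumed)
```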
 
Agreed ...

This is version 1 of a new/unsupported arch, in some instances BEATING the next-gen best. For a first attempt it's quite insane how fast this is.
It's easy for Apple to squeeze out some extra performance though, as they control the compilers, the OS, the driver stack and now the hardware.
No other company in the world has a complete in-house platform that can be tuned to this degree for optimal performance.
It gives them what one could almost call an unfair advantage.

However, we still have to see them scale this, as right now we're looking at an iPad with a keyboard, on steroids.
This is not going to be a hardware solution that will be competitive in all aspects and so far we're barely scratching the surface, as all the benchmarks so far are somewhat limited.
Single core performance is no longer as important as it once was and judging by the benchmarks, it's running into problems keeping up once we go beyond the performance cores.
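A rough sketch of why a 4+4 design won't scale like a classic 8-core; the efficiency-core throughput ratio here is an assumption for illustration, not a measured figure:

```python
# Rough upper bound on multi-threaded scaling for a 4+4 hybrid design.
# The 0.25 efficiency-core throughput ratio is an assumption for illustration.
def hybrid_scaling(big: int, little: int, little_ratio: float) -> float:
    return big + little * little_ratio

print(hybrid_scaling(4, 4, 0.25))  # ~5.0x one performance core, not 8x
print(hybrid_scaling(8, 0, 1.00))  # a classic 8-core trends toward 8x (before shared limits)
```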

Not saying Apple did a bad job, I'm just not buying into all the hype, as Apple clearly oversold this when it was announced.
Yes, it's going to be good enough of a work computer for a lot of people, but it's clearly not for everyone.
 
All of which make it horrendously inefficient in terms of area and transistor budget in order to extract the same performance Intel and AMD do with much smaller cores, and on a bigger node by the way. Am I really the only one noticing that? There is a very good reason practically nobody is making cores this wide: it's a scalability dead end, and everyone figured this out in the late 90s.
Nope, not the only one noticing that, and there's no doubt that these chips are really expensive compared to the competition. Sizeable silicon, high transistor counts, and a very expensive node should make for quite a high silicon cost. There's another factor to the equation though: Intel (not really, but not that far behind) and AMD are delivering the same performance with much smaller cores and on a bigger node but with several times the power consumption. That's a notable difference. Of course comparing a ~25W mobile chip to a 105W desktop chip is an unfair metric of efficiency, but even if Renoir is really efficient for X86, this still beats it.
Of course they can, and they will gradually make wider cores. The reason Apple can use a 128KB cache is that they use a minimum 16KB page and not 4KB, hence a cache that is 4 times larger with 8-way associativity. That's all there is to it, and I don't have to explain why a cache that is 4 times bigger with the same associativity is pretty terrible and inefficient. I have no idea why everyone thinks Apple is using some sort of magic fairy dust to make these things.
Intel grew their L1 cache from 32 KB to 48 KB with Ice Lake, which caused its latency to increase from 4 to 5 cycles. Apple manages a cache roughly 3x the size with 3/5 the latency. Regardless of associativity, Apple is managing something that nobody else is. Also, if it's such an inefficient design, how come they're beating every other architecture in efficiency, even when factoring in the node advantage? One would expect a "pretty terrible and inefficient" L1 cache to be rather harmful to overall SoC efficiency, right?

As for making wider cores: yes, they likely will, but this is actually an area where x86 is a real problem. To quote AT:
AnandTech said:
Other contemporary designs such as AMD’s Zen(1 through 3) and Intel’s µarch’s, x86 CPUs today still only feature a 4-wide decoder designs (Intel is 1+4) that is seemingly limited from going wider at this point in time due to the ISA’s inherent variable instruction length nature, making designing decoders that are able to deal with aspect of the architecture more difficult compared to the ARM ISA’s fixed-length instructions.
So they can, but they would need to take a significant efficiency penalty, or find some way to mitigate this.
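A toy illustration of that fixed- vs variable-length decode problem (this is not how any real decoder works, just the dependency structure it has to deal with): with fixed 4-byte instructions every instruction boundary is known up front, while with x86's 1-15 byte instructions each boundary depends on having decoded the previous instruction's length.

```python
# Toy model: finding instruction start offsets in a code buffer.

def fixed_length_starts(buf_len: int, width: int = 4) -> list[int]:
    # Fixed-length ISA (e.g. AArch64): every start is known immediately,
    # so N decoders can all begin work in the same cycle.
    return list(range(0, buf_len, width))

def variable_length_starts(lengths: list[int]) -> list[int]:
    # Variable-length ISA (e.g. x86, 1-15 bytes): the start of instruction i
    # is only known after the length of instruction i-1 has been determined,
    # which is the serial dependency that makes wide decode hard.
    starts, offset = [], 0
    for length in lengths:
        starts.append(offset)
        offset += length
    return starts

print(fixed_length_starts(32))                      # [0, 4, 8, ..., 28]
print(variable_length_starts([1, 3, 7, 2, 5, 15]))  # [0, 1, 4, 11, 13, 18]
```

Real x86 decoders work around this with length pre-decode and µop caches, but that's exactly the extra machinery (and power) the AnandTech quote is alluding to.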
 
The performance of the M1 on 5 nanometer is where I would expect it to be. I don't find it game-breaking, and they have already implemented most of the tricks that x86 uses. The architecture is getting mature, and they aren't really getting way more performance than x86.

Modern processors are just so much more complex than the instruction set alone that in the end it doesn't really matter, at least for pure performance. For low power, x86 still seems to have a bit higher overhead...

But the thing is, ARM is getting more powerful by using more transistors and more power. It's not the 4-watt CPU in your phone that is doing that...

I am not an Apple fan, but I am glad they're doing something powerful, because we need competition. AMD is starting to compete again, and there is a lot more performance gained each year than when Intel and Nvidia had zero competition.

Good stuff indeed, good stuff...
 
It's easy for Apple to squeeze out some extra performance though, as they control the compilers, the OS, the driver stack and now the hardware.
This is an important point that I think people are overlooking.

The closest thing to compare that to in the PC space is game consoles vs PC gaming; Last Gen XBox hardware is pitiful by modern standards but if you took the equivalent Radeon R7 260 DDR3 that the XBox One has in it and tried to run the PC version of an Xbox One game on that R7 260, you'd be greeted by a low-res, low-quality slideshow.

Meanwhile the same hardware in the XBone is getting 1080p30 with improved graphics quality. That's the power of optimising and compiling for a single-purpose, single-spec platform.
 
I really want to see what they can pull off in a 30W, 60W, and 90W package. This is some cool stuff.
 
This is an important point that I think people are overlooking.

The closest thing to compare that to in the PC space is game consoles vs PC gaming; Last Gen XBox hardware is pitiful by modern standards but if you took the equivalent Radeon R7 260 DDR3 that the XBox One has in it and tried to run the PC version of an Xbox One game on that R7 260, you'd be greeted by a low-res, low-quality slideshow.

Meanwhile the same hardware in the XBone is getting 1080p30 with improved graphics quality. That's the power of optimising and compiling for a single-purpose, single-spec platform.
Exactly, and the ~10 watts the M1 pulls is also shared across the entire SoC, so CPU and GPU performance will be heavily compromised during actual gaming, for example, or any heavy use of both at the same time. Where's the performance then?
 
Also, if it's such an inefficient design, how come they're beating every other architecture in efficiency, even when factoring in the node advantage? One would expect a "pretty terrible and inefficient" L1 cache to be rather harmful to overall SoC efficiency, right?

Clock speed and voltage, it's that simple. L1 caches basically have to run close to the clock speed of the core, so their power scales right along with it (badly); that being said, at 3 GHz it's kept in check, for now. It's still inefficient though, considering the performance gains from having up to 6 times more memory; you can bet the hit rate didn't go up by 600%. That's why L3 caches grew so much larger over the years, because they don't have to scale alongside the cores themselves, and why large L1 caches are avoided like the plague even outside x86.
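For reference, the relation being invoked here is the usual CMOS dynamic-power approximation, P ≈ activity × capacitance × V² × f. A tiny sketch with made-up operating points, just to show how quickly frequency-plus-voltage scaling adds up:

```python
# Dynamic power approximation: P ~ activity * capacitance * V^2 * f.
# The voltages and clocks are placeholder assumptions; only relative scaling matters.
def dynamic_power(v: float, f_ghz: float, activity: float = 1.0, cap: float = 1.0) -> float:
    return activity * cap * v * v * f_ghz

low = dynamic_power(v=0.85, f_ghz=3.0)   # e.g. an M1-class operating point
high = dynamic_power(v=1.25, f_ghz=5.0)  # e.g. a high-clocked x86 operating point

print(f"Relative dynamic power: {high / low:.1f}x")  # ~3.6x for ~1.7x the clock, with these inputs
```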

There's another factor to the equation though: Intel (not really, but not that far behind) and AMD are delivering the same performance with much smaller cores and on a bigger node but with several times the power consumption.

Of course because again, Intel and AMD design their cores with the goal of being able to fit as many as possible on a single package, hence smaller but higher clocked cores which are also less power efficient.

But there is another company which proves my point more than anyone else: ARM themselves. They're also close to extracting similar performance out of cores that are even smaller and consume way less power and area than Apple's. I'll maintain my opinion that Apple's approach is the wrong one long term.
 
Exactly, and the ~10 watts the M1 pulls is also shared across the entire SoC, so CPU and GPU performance will be heavily compromised during actual gaming, for example, or any heavy use of both at the same time. Where's the performance then?
Let's also not forget that the memory is shared between all the parts inside the SoC, which might affect performance negatively in some scenarios as well.

I really want to see what they can pull off in a 30W, 60W, and 90W package. This is some cool stuff.
Why would Apple ever go as high as 90W? We might see some 15-25W parts next, but I doubt we'll ever see anything in the 90W range from Apple.
 
The M1 won't do anything. Apple users will buy their BGA-riddled walled-garden Macs no matter what, and Windows machines will be sold as they are; people need GPUs and an OS that supports their software requirements, and until Apple catches up to AMD or Nvidia, that day is not going to come.

Just to share another side of the story, I have ordered an M1 MacBook Pro as my first Mac; I have otherwise used Windows/Ubuntu at home and work for almost three decades.
What interested me is its promising performance (if software transitions well), coupled with impressive-looking battery life (which I guess needs to be tested).
While I need my GPU to play ray-traced games, that's on my desktop; my laptop can be something else, and now I'm motivated to give it a shot.

Let's also not forget that the memory is shared between all the parts inside the SoC, which might affect performance negatively in some scenarios as well.


Why would Apple ever go as high as 90W? We might see some 15-25W parts next, but I doubt we'll ever see anything in the 90W range from Apple.
Maybe one day they scale up M1 to something that can compete with a 5950X?
Not sure if it is technically possible with their design, but ARM definitely is picking up some momentum lately.
 
Maybe one day they scale up M1 to something that can compete with a 5950X?
Not sure if it is technically possible with their design, but ARM definitely is picking up some momentum lately.
Oh, I have no doubt they'll improve the performance, but wattage doesn't equal performance.
Depending on what Apple's plan is, they're obviously going to have to scale the performance upwards.
How they'll do this, I guess we're going to have to wait and see.
 