Tuesday, November 17th 2020

Apple M1 Beats Intel "Willow Cove" in Cinebench R23 Single Core Test?

Maxon has ported its latest Cinebench R23 benchmark to the macOS "Big Sur" Apple M1 platform, and the performance results are groundbreaking. An Apple M1-powered MacBook Pro allegedly scored 1498 points in the single-core Cinebench R23 test, beating the 1382 points of the Core i7-1165G7 reference score as tested by Maxon. These scores were posted to Twitter by an M1 MacBook Pro owner who goes by "@mnloona48_". The M1 chip was clocked at 3.10 GHz for the test. The i7-1165G7 uses Intel's latest "Willow Cove" CPU cores. In the same run, the M1 scored 7508 points in the multi-core test. If these numbers hold up, we can begin to see why Apple chose to dump Intel's x86 machine architecture in favor of its own Arm-powered custom silicon, as the performance on offer holds up against the highest-IPC mobile processors on the market.
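For a quick sense of scale, the quoted scores work out as follows (a back-of-the-envelope calculation on the numbers above, not a new measurement):

```python
# Quick arithmetic on the scores quoted above; nothing here is a new
# measurement, just ratios of the numbers in the post.

m1_1t, intel_1t = 1498, 1382   # Cinebench R23 single-core scores
m1_nt = 7508                   # Cinebench R23 multi-core score (4+4 cores)

print(f"M1 single-core lead over i7-1165G7: {(m1_1t / intel_1t - 1) * 100:.1f}%")
print(f"M1 multi-core / single-core scaling: {m1_nt / m1_1t:.2f}x")
```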
Sources: mloona48_ (Twitter), via Hexus.net Forums

96 Comments on Apple M1 Beats Intel "Willow Cove" in Cinebench R23 Single Core Test?

#26
OGoc
The top score in the screenshot is an Intel i7-1165G7 with a single-core score of 1532.
Posted on Reply
#27
TheoneandonlyMrK
dyonoctisTrue. I tend to forget that this isn't a classic 8-core, but 4+4. So there's a bit of "cheating" there. It's a design philosophy that works well for those machines, but AMD has already said that they are not interested in a hybrid design.

Well, on paper AMD has the tech to make a great SoC: a Zen 3 core (low-power Zen 2 was already impressive in its own right), RDNA 2 with AV1 acceleration... machine learning for the consumer is the thing they don't do yet, but it's a whole mess on Windows (Intel with their odd Deep Link, and NVIDIA Tensor cores are only for "beefy" laptops). But then there is the software side. OpenCL being what it is, CUDA is the only thing close to the Metal API on Windows. Adobe and NVIDIA have a love story where AMD is a third wheel at best.
Well, progress is finally being made on your latter points, with Windows now supporting OpenCL and OpenGL.
GPUs can be used for ML, though dedicated units will remain an advantage of the M1. To be fair, both Intel with oneAPI and AMD with God knows what (they already have on-chip ML on Ryzen for process optimization) could incorporate better AI/ML hardware and catch up to the M1's main advantage.
Posted on Reply
#28
Vya Domus
ValantarAlso, of course, it kind of demonstrates what can happen if you give a chip development company unlimited resources.
Really? Apparently everyone said the same about Intel, that they also have an unlimited budget.
ValantarNo wonder they're doing things nobody else can.
They are doing things nobody else has an interest in doing. The segments in which Apple and Intel/AMD operate actually have little overlap: Apple is basically an exclusively mobile silicon company and Intel/AMD are... not. Their designs are first and foremost meant for servers, where the goal is to fit as much compute as possible on a single package; what we get on desktops and even on mobile is a cut-down version of whatever that is, and at the core they are basically the same architectures, just configured differently.
Posted on Reply
#29
DeathtoGnomes
"Way to go Dall..uhhh Apple"

Not surprised here, Apple seems on course to truly separate itself. Next up, their own discrete card? Maybe?
Posted on Reply
#30
Frick
Fishfaced Nincompoop
Vya DomusReally? Apparently everyone said the same about Intel, that they also have an unlimited budget.
I would add "and not being beholden to an open ecosystem burdened by legacy stuff" to that.
theoneandonlymrkObviously, but then they're not going to replace those needing more powerful systems, like some are implying.
And being the fastest is exactly what Apple claimed just days ago, without proof.
I will never understand why people take PR statements at face value. "Fastest" obviously comes with a fistful of asterisks.
Posted on Reply
#31
Ashtr1x
Chrispy_If the M1 does anything, it might finally get Microsoft to hurry the hell up and ditch all the legacy crap that is bogging down x86 Windows.

With x86 and Windows playing a chicken-and-egg game, it's never going to become a more streamlined architecture until someone takes the first step. AMD and Intel can't afford to cull features in hardware until they are dropped from the OS, and Microsoft is still unwilling to completely let go of 32-bit OS even in this day and age....
Well, the world is not always a locked-down, gated BS environment, right? So they are taking their time to do it, piece by piece: making the OS look and feel like a mobile OS, then removing power features like Control Panel, then adding a UWP store with UWP drivers and MSIX to make EXEs go away; same stuff by a different method. Add the WaaS model: feels amazing, right? The latest and greatest right on your desktop every day, plus a big OS release every six months. Wonderful, right? Versus the old "crappy" Win7 RTM release, which doesn't even need any sort of NT kernel updates to run 2020 software.

"Legacy Crap bogging down Windows"
I wonder how the legacy crap is bogging down Windows. Windows gets its strength from software compatibility, not gated BS like Apple. In the Apple utopia, everything is locked down and only passes when the Apple almighty says so. Do you even realize how big an axe dropping 32-bit would be for applications? 32-bit Win10 is already being phased out, and so are 32-bit GPU drivers; application support is very important for an OS to be robust and not user-restrictive. Did Linux get bogged down by all its legacy crap?

The M1 won't do anything; Apple users will buy their BGA-riddled walled-garden Macs no matter what, and Windows machines will be sold as they are. People need GPUs and an OS that supports their software requirements, and until Apple catches up to AMD or NVIDIA, that day is not going to come.
Posted on Reply
#33
Valantar
Vya DomusReally? Apparently everyone said the same about Intel, that they also have an unlimited budget.
Apple is AFAIK the highest-valued company on the planet, and the one with the biggest cash hoard too. Intel has never been even close to that. So while Intel's R&D budgets might have been "unlimited" in terms of the tech industry at the time, Apple's R&D budgets are likely only limited by how many people it's possible for them to hire, how many concurrent projects it's possible for them to run, and what they are interested in doing.
Vya DomusThey are doing things nobody else has an interest in doing. The segments in which Apple and Intel/AMD operate actually have little overlap: Apple is basically an exclusively mobile silicon company and Intel/AMD are... not. Their designs are first and foremost meant for servers, where the goal is to fit as much compute as possible on a single package; what we get on desktops and even on mobile is a cut-down version of whatever that is, and at the core they are basically the same architectures, just configured differently.
Intentionally or not, you are completely misunderstanding what I said. I was referring to the A14/M1 Firestorm "big core" microarchitecture and its features, and not mobile-specific features at that, just ones that massively increase the throughput of the core. An 8-wide decode block compared to 4-wide in x86; a re-order buffer 2-3x the size of Intel and AMD's newest architectures, 4x/2x the FP/clock throughput of current Intel architectures/Zen 3, dramatically deeper load/store queues than any other architecture, L1 caches 6x the size of current X86 architectures (seemingly without a latency penalty!) ....and the list goes on. These aren't mobile-specific features, these are common features across all modern CPU architectures. And somehow Apple is able to massively beat the competition on several fronts, doing things that seem to be impossible for the others. Some of it can be blamed on the X86 ISA (decode block width, for example), but not everything. And if you're trying to tell me that AMD and Intel could grow their L1 caches 6x without increasing latency by a single cycle, yet are choosing not to, then you need to provide some proof for that, because that's an outlandish thing to even suggest. If AMD and/or Intel could do what Apple is doing here to increase IPC without tanking performance through either killing power efficiency, introducing massive latency, etc., they would very clearly do so.

The thing is, this design is likely to scale extremely well for servers and workstations, as the clock speeds they are reaching are the same as the ones hit by current top-end multi-core server chips, just at much lower power. They'd obviously need to tweak the architecture in various ways and find a way to couple together more than 4+4 cores efficiently without introducing bottlenecks or massive latency, but ... given what they've already done, that should be doable. Whatever silicon Apple makes for the next Mac Pro, it's looking like it'll be extremely impressive.
TheLostSwedeLuckily we now have proper tests.
Yes, it's very fast in single threaded workloads, no it's not what Apple claims overall.
No, you still don't want to game on Apple hardware.
www.anandtech.com/show/16252/mac-mini-apple-m1-tested/
I would really, really like to see what RotTR performance would look like if it was compiled for this architecture rather than run through a translation layer. Even if gaming performance is lacklustre overall, that is damn impressive.
Posted on Reply
#34
ZoneDymo
interesting comments to read, so passionate.

Personally I could not care less, by their own design/philosophy I dont have anything to do with Apple and probably never will.
They could make a low power mobile chip that is as fast as an RTX3090 and it would not affect me in the slightest, I have nothing to do with Apple.
Posted on Reply
#36
Easy Rhino
Linux Advocate
It not only outperforms Intel by a large margin, it does so with massive power efficiency. And just a note to everyone... don't choose sides. If you think Apple is any worse than Google or Microsoft as far as culture goes, then you are sorely mistaken. They are all equally terrible.
Posted on Reply
#37
phanbuey
ValantarI would really, really like to see what RotTR performance would look like if it was compiled for this architecture rather than run through a translation layer. Even if gaming performance is lacklustre overall, that is damn impressive.
Agreed ...

This is version 1 of a new/unsupported arch, in some instances BEATING the next-gen best. For a first attempt, it's quite insane how fast this is.
Posted on Reply
#38
Vya Domus
ValantarAn 8-wide decode block compared to 4-wide in x86; a re-order buffer 2-3x the size of Intel and AMD's newest architectures, 4x/2x the FP/clock throughput of current Intel architectures/Zen 3, dramatically deeper load/store queues than any other architecture, L1 caches 6x the size of current X86 architectures (seemingly without a latency penalty!) ....and the list goes on.
All of which make it horrendously inefficient in terms of area and transistor budget in order to extract the same performance Intel and AMD do with much smaller cores, and on a bigger node by the way. Am I really the only one noticing that? There is a very good reason practically nobody is making cores this wide; it's a scalability dead end, everyone figured this out in the late 90s.
ValantarAnd if you're trying to tell me that AMD and Intel could grow their L1 caches 6x without increasing latency by a single cycle, yet are choosing not to, then you need to provide some proof for that, because that's an outlandish thing to even suggest. If AMD and/or Intel could do what Apple is doing here to increase IPC without tanking performance through either killing power efficiency, introducing massive latency, etc., they would very clearly do so.
Of course they can, and they will gradually make wider cores. The reason Apple can use a 128KB cache (which is 4 times larger, not 6, and that is the L1 data cache, not the instruction cache) is that they use a minimum 16KB page rather than 4KB, hence a cache that is 4 times larger with 8-way associativity. That's all there is to it, and I don't have to explain why a cache that is 4 times bigger with the same associativity is pretty terrible and inefficient. I have no idea why everyone thinks Apple is using some sort of magic fairy dust to make these things.

Edit: I got confused, that was for a 128KB cache; for the 192KB one that is 6 times larger it's basically the same explanation, they can do it because of the 16KB page.
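To put rough numbers on the page-size point, here is a minimal sketch of the usual VIPT (virtually indexed, physically tagged) limit the argument rests on, assuming 8-way associativity for both designs:

```python
# Minimal sketch: a VIPT L1 cache avoids aliasing only if its index bits fit
# inside the page offset, i.e. max size = page_size * associativity.
# Assumed values: 4 KiB pages (typical x86) vs. 16 KiB pages (Apple), 8 ways.

def max_vipt_l1_bytes(page_size_bytes, ways):
    # Each way can be at most one page in size.
    return page_size_bytes * ways

print(f"4 KiB pages, 8-way:  {max_vipt_l1_bytes(4 * 1024, 8) // 1024} KiB max L1")
print(f"16 KiB pages, 8-way: {max_vipt_l1_bytes(16 * 1024, 8) // 1024} KiB max L1")
```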
Posted on Reply
#39
TheLostSwede
News Editor
phanbueyAgreed ...

This is version 1 of a new/unsupported arch, in some instances BEATING the next-gen best. For a first attempt, it's quite insane how fast this is.
It's easy for Apple to squeeze out some extra performance though, as they control the compilers, the OS, the driver layer, the drivers and now the hardware.
No other company in the world has a complete in-house platform that can be tuned to this degree for optimal performance.
It gives them what one could almost call an unfair advantage.

However, we still have to see them scale this, as right now we're looking at an iPad with a keyboard, on steroids.
This is not going to be a hardware solution that is competitive in all aspects, and we're barely scratching the surface, as the benchmarks so far are somewhat limited.
Single-core performance is no longer as important as it once was, and judging by the benchmarks, it's running into problems keeping up once we go beyond the performance cores.

Not saying Apple did a bad job, I'm just not buying into all the hype, as Apple clearly oversold this when it was announced.
Yes, it's going to be good enough of a work computer for a lot of people, but it's clearly not for everyone.
Posted on Reply
#40
Valantar
Vya DomusAll of which make it horrendously inefficient in terms of area and transistor budget in order to extract the same performance Intel and AMD do with much smaller cores, and on a bigger node by the way. Am I really the only one noticing that? There is a very good reason practically nobody is making cores this wide; it's a scalability dead end, everyone figured this out in the late 90s.
Nope, not the only one noticing that, and there's no doubt that these chips are really expensive compared to the competition. Sizeable silicon, high transistor counts, and a very expensive node should make for quite a high silicon cost. There's another factor to the equation though: Intel (not really, but not that far behind) and AMD are delivering the same performance with much smaller cores and on a bigger node but with several times the power consumption. That's a notable difference. Of course comparing a ~25W mobile chip to a 105W desktop chip is an unfair metric of efficiency, but even if Renoir is really efficient for X86, this still beats it.
Vya DomusOf course they can and they will gradually make wider cores. The reason Apple can use a 128KB cache is because they use a minimum 16KB page and not 4KB hence a cache that is 6 times larger with 8-way associativity, that's all there is to it and I don't have to explain why a cache that is 6 time bigger with the same associativity is pretty terrible and inefficient. I have no idea why everyone thinks Apple is using some sort of magic fairy dust to make these things.
Intel grew their L1 cache from 32k to 48k with Ice Lake, which caused its latency to increase from 4 to 5 cycles. Apple manages a cache 3x the size with 3/5 the latency. Regardless of associativity, Apple is managing something that nobody else is. Also, if it's such an inefficient design, how come they're beating every other architecture in efficiency, even when factoring in the node advantage? One would expect a "pretty terrible and inefficient" L1 cache to be rather harmful to overall SoC efficiency, right?

As for making wider cores: yes, they likely will, but this is actually an area where x86 is a real problem. To quote AT:
AnandTechOther contemporary designs such as AMD’s Zen(1 through 3) and Intel’s µarch’s, x86 CPUs today still only feature a 4-wide decoder designs (Intel is 1+4) that is seemingly limited from going wider at this point in time due to the ISA’s inherent variable instruction length nature, making designing decoders that are able to deal with aspect of the architecture more difficult compared to the ARM ISA’s fixed-length instructions.
So they can, but they would need to take a significant efficiency penalty, or find some way to mitigate this.
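To make the decode-width point concrete, here is a toy sketch (invented instruction lengths, nothing like a real decoder): with fixed-length instructions every decode slot knows where its instruction starts, while with variable lengths each start offset depends on the lengths of everything before it.

```python
# Toy illustration of why fixed-length ISAs are easier to decode in parallel.
# Instruction lengths below are invented purely for illustration.

def fixed_length_offsets(code_bytes, width=4):
    # Arm-style: every instruction is 4 bytes, so all start offsets are
    # known up front and a wide decoder can grab N instructions at once.
    return list(range(0, len(code_bytes), width))

def variable_length_offsets(lengths):
    # x86-style: each start offset depends on the length of everything
    # before it, a serial dependency the decoder has to work around.
    offsets, pos = [], 0
    for length in lengths:
        offsets.append(pos)
        pos += length
    return offsets

print(fixed_length_offsets(bytes(32)))           # [0, 4, 8, ..., 28]
print(variable_length_offsets([3, 5, 2, 7, 1]))  # [0, 3, 8, 10, 17]
```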
Posted on Reply
#41
Punkenjoy
The performance of the M1 on 5 nanometer is about where I would expect it to be. I don't find it game-breaking, and they have already implemented most of the tricks that x86 uses. The architecture is getting mature, and they aren't really getting way more performance than x86.

Modern processors are just so much more complex than the instruction set alone that in the end it doesn't really matter, at least for pure performance. For low power, x86 still seems to have a bit higher overhead...

But the thing is, ARM is getting more powerful by using more transistors and more power. It's not the 4-watt CPU in your phone that is doing that...

I am not an Apple fan, but I am glad they are doing something powerful, because we need competition. AMD is starting to compete again, and there are much bigger performance gains each year than when Intel and NVIDIA had zero competition.

Good stuff indeed, good stuff...
Posted on Reply
#42
Chrispy_
TheLostSwedeIt's easy for Apple to squeeze out some extra performance though, as they control the compilers, the OS, the driver layer, the drivers and now the hardware.
This is an important point that I think people are overlooking.

The closest thing to compare that to in the PC space is game consoles vs PC gaming; Last Gen XBox hardware is pitiful by modern standards but if you took the equivalent Radeon R7 260 DDR3 that the XBox One has in it and tried to run the PC version of an Xbox One game on that R7 260, you'd be greeted by a low-res, low-quality slideshow.

Meanwhile the same hardware in the XBone is getting 1080p30 with improved graphics quality. That's the power of optimising and compiling for a single-purpose, single-spec platform.
Posted on Reply
#43
Nordic
I really want to see what they can pull off in a 30w, 60w, and 90w package. This is some cool stuff.
Posted on Reply
#44
TheoneandonlyMrK
Chrispy_This is an important point that I think people are overlooking.

The closest thing to compare that to in the PC space is game consoles vs PC gaming; Last Gen XBox hardware is pitiful by modern standards but if you took the equivalent Radeon R7 260 DDR3 that the XBox One has in it and tried to run the PC version of an Xbox One game on that R7 260, you'd be greeted by a low-res, low-quality slideshow.

Meanwhile the same hardware in the XBone is getting 1080p30 with improved graphics quality. That's the power of optimising and compiling for a single-purpose, single-spec platform.
Exactly, and the ~10 watts the M1 pulls is also shared across the entire SoC, so CPU and GPU performance will be heavily compromised during actual gaming, for example, or any heavy use of both at the same time. Where's the performance then?
Posted on Reply
#45
Vya Domus
ValantarAlso, if it's such an inefficient design, how come they're beating every other architecture in efficiency, even when factoring in the node advantage? One would expect a "pretty terrible and inefficient" L1 cache to be rather harmful to overall SoC efficiency, right?
Clock speed and voltage, it's that simple. L1 caches basically have to run close to the clock speed of the core, so their power scales right along with it (badly); that being said, at 3 GHz it's kept in check, for now. It's still inefficient, though, considering the performance gains from having up to 6 times more memory: you can bet the hit rate didn't go up by 600%. That's why L3 caches have grown so much larger over the years, because they don't have to scale alongside the cores themselves, and why large L1 caches are avoided like the plague even outside x86.
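As a very rough illustration of that point, the old empirical "square-root" rule of thumb for miss rates (an approximation only; real workloads vary enormously) would suggest:

```python
# Rough sketch of the empirical rule of thumb that miss rate scales roughly
# with 1/sqrt(capacity). Purely illustrative; real workloads vary enormously.

from math import sqrt

def relative_miss_rate(base_kib, new_kib):
    # Estimated miss rate of the larger cache relative to the baseline.
    return sqrt(base_kib / new_kib)

for size in (32, 48, 128, 192):
    print(f"{size:>3} KiB L1: ~{relative_miss_rate(32, size):.2f}x "
          f"the miss rate of a 32 KiB cache")
# 6x the capacity (32 -> 192 KiB) cuts misses by roughly 2.4x, not 6x.
```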
ValantarThere's another factor to the equation though: Intel (not really, but not that far behind) and AMD are delivering the same performance with much smaller cores and on a bigger node but with several times the power consumption.
Of course, because again, Intel and AMD design their cores with the goal of fitting as many as possible on a single package, hence smaller but higher-clocked cores which are also less power-efficient.

But there is another company which proves my point more than anyone else: ARM themselves. They're also close to extracting similar performance out of cores that are even smaller and consume far less power and area than Apple's. I'll maintain my opinion that Apple's approach is the wrong one long term.
Posted on Reply
#46
TheLostSwede
News Editor
theoneandonlymrkExactly, and the ~10 watts the M1 pulls is also shared across the entire SoC, so CPU and GPU performance will be heavily compromised during actual gaming, for example, or any heavy use of both at the same time. Where's the performance then?
Let's also not forget that the memory is shared between all the parts inside the SoC, which might affect performance negatively in some scenarios as well.
NordicI really want to see what they can pull off in a 30w, 60w, and 90w package. This is some cool stuff.
Why would Apple ever go as high as 90W? We might see some 15-25W parts next, but I doubt we'll ever see anything in the 90W range from Apple.
Posted on Reply
#47
hurakura
The marketing is strong in this one
Posted on Reply
#48
Sandbo
Ashtr1xThe M1 won't do anything; Apple users will buy their BGA-riddled walled-garden Macs no matter what, and Windows machines will be sold as they are. People need GPUs and an OS that supports their software requirements, and until Apple catches up to AMD or NVIDIA, that day is not going to come.
Just to share another side of the story: I have ordered an M1 MacBook Pro as my first Mac, having otherwise used Windows/Ubuntu at home and work for almost three decades.
What interested me is its promising performance (if software transitions well), coupled with the impressive-looking battery life (which I guess needs to be tested).
While I need my GPU to play ray-traced games, that's on my desktop; my laptop could be something else, and now I am motivated to give it a shot.
TheLostSwedeLet's also not forget that the memory is shared between all the parts inside the SoC, which might affect performance negatively in some scenarios as well.


Why would Apple ever go as high as 90W? We might see some 15-25W parts next, but I doubt we'll ever see anything in the 90W range from Apple.
Maybe one day they scale up M1 to something that can compete with a 5950X?
Not sure if it is technically possible with their design, but ARM definitely is picking up some momentum lately.
Posted on Reply
#49
TheLostSwede
News Editor
SandboMaybe one day they scale up M1 to something that can compete with a 5950X?
Not sure if it is technically possible with their design, but ARM definitely is picking up some momentum lately.
Oh, I have no doubt they'll improve the performance, but wattage doesn't equal performance.
Depending on what Apple's plan is, they're obviously going to have to scale the performance upwards.
How they'll do this, I guess we're going to have to wait and see.
Posted on Reply
#50
Aquinus
Resident Wat-man
I think people are forgetting that this 10w chip is competing with chips that have TDPs as high as 35-45 watts. Come on, let's gain a little bit of perspective here. It's not the best, but it's pretty damn good for what it is. If this is what Apple can do with a 10w power budget, imagine what they can do with 45 watts.
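As a purely hypothetical scaling exercise using only the figures in this thread (points per watt would not actually hold constant as clocks and core counts rise):

```python
# Hypothetical scaling of the M1's multi-core score (7508 at roughly 10 W,
# per this thread) to the 35-45 W budgets mentioned above, assuming constant
# points per watt -- which real silicon would not deliver.

m1_nt_score, m1_watts = 7508, 10

for budget_watts in (35, 45):
    projected = m1_nt_score * budget_watts / m1_watts
    print(f"{budget_watts} W at the same points/W: ~{projected:.0f} "
          f"Cinebench R23 nT points (hypothetical)")
```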
Posted on Reply