# Riding on the Success of the M1, Apple Readies 32-core Chip for High-end Macs



## btarunr (Dec 7, 2020)

Apple's M1 SoC is possibly the year's biggest semiconductor success story. The chip has helped Apple begin its transition away from Intel's x86 architecture toward its own silicon, optimized for its software and devices much like the A-series SoCs powering iOS devices. The company now plans to scale up this silicon with a new 32-core version designed for high-performance Macs, such as the fastest MacBook Pro models, and possibly even iMac Pros and Mac Pros. The new silicon could debut in a next-generation Mac Pro in 2022. Bloomberg reports that the new silicon will allow this workstation to be half the size of the current-gen Mac Pro, while letting Apple maintain its generational performance growth trajectory.

In addition, Apple is reportedly developing a 16 "big" + 4 "small" core version of the M1, which could power middle-of-the-market Macs, such as the iMac desktop and the bulk of the MacBook Pro lineup. The 16B+4S chip could debut as early as Spring 2021. Elsewhere, the company is reportedly stepping up efforts to develop its own high-end professional-visualization GPU for its iMac Pro and Mac Pro workstations, replacing the AMD Radeon Pro solutions found in the current generation. This graphics architecture will be built from the ground up for the Metal 3D graphics API, and will double as a parallel compute accelerator. The 2022 debut of the Arm-powered Mac Pro could perhaps feature this GPU.





*View at TechPowerUp Main Site*


----------



## Vya Domus (Dec 7, 2020)

They're likely gonna need a node beyond 5 nm; the big-core cluster on their current chip is huge, and 32 cores would mean a ridiculously large chip, not to mention that they would also probably need to increase the size of that system cache. Of course, some of those cores could be like the ones inside the small-core cluster, so it would be "32-core" in name only, really.


----------



## DeathtoGnomes (Dec 7, 2020)

Vya Domus said:


> They're likely gonna need a node beyond 5nm, the big core cluster on their current chip is huge, 32 cores would mean a ridiculously large chip not to mention that they would probably need to increase the size of that system cache a lot. Of course some of those cores could be like the ones inside the small core cluster so it would be "32 core" just in name really.



As long as they can cool it efficiently (incoming pun), the size doesn't matter.


----------



## Vya Domus (Dec 7, 2020)

DeathtoGnomes said:


> the size doesnt matter.



In this case it does; it might literally not be feasible on 5 nm.


----------



## ThrashZone (Dec 7, 2020)

Hi,
Apple saying high-end Mac? Well, we're looking at $20k US here, or what?


----------



## Nordic (Dec 7, 2020)

I am excited to see what performance they get.


----------



## Mats (Dec 7, 2020)

DeathtoGnomes said:


> As long as they can cool it efficiently, ( incoming pun  ) the size doesnt matter.


It does, otherwise those dies will become more expensive, because production will have much lower yields.

Splitting up into multiple dies makes production easier in many ways, like Epyc/Threadripper/Ryzen.
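To put rough numbers on that, here's a minimal sketch using the textbook Poisson yield model; the defect density is an illustrative assumption, not foundry data:

```python
# Textbook Poisson yield model: the fraction of defect-free dies falls off
# exponentially with die area. The defect density is an illustrative
# assumption, not foundry data.

import math

DEFECTS_PER_CM2 = 0.1  # assumed defect density, defects per cm^2

def die_yield(area_cm2: float, d0: float = DEFECTS_PER_CM2) -> float:
    """Expected fraction of good dies for a given die area."""
    return math.exp(-d0 * area_cm2)

print(f"600 mm^2 monolithic die: {die_yield(6.0):.0%} yield")
print(f" 80 mm^2 chiplet:        {die_yield(0.8):.0%} yield")
```

Under those made-up but plausible numbers, a big monolithic die loses nearly half its candidates to defects while a small chiplet yields over 90%, which is exactly why AMD splits things up.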


----------



## dragontamer5788 (Dec 7, 2020)

Vya Domus said:


> They're likely gonna need a node beyond 5nm, the big core cluster on their current chip is huge, 32 cores would mean a ridiculously large chip not to mention that they would also probably need to increase the size of that system cache. Of course some of those cores could be like the ones inside the small core cluster so it would be "32 core" just in name really.



Yeah. For the record: the M1 is 16 billion transistors for 4 big cores + 4 little cores + iGPU + Neural Engine + SoC. Renoir (Zen 2) is 8 cores + iGPU + SoC for about 10 billion transistors.

big.LITTLE doesn't seem to do much for high-end compute: you pretty much always want more big cores if you have a difficult task (like CPU-based Raytracing) running. big.LITTLE style has advantages in long-running compute however: maybe the LITTLE cores can feed a GPU in a GPU-heavy situation (GPU-based raytracing??).

-------

Hmmmm... I think I'm pretty open about my fanboyism towards SIMD compute. The M1 is very disappointing from the SIMD perspective: just 128 bits wide. Even if they extend up to 256 bits, those M1 cores are utterly huge compared to Zen's. It seems unlikely that they can offer as much parallelism in a SIMD situation. But I'm willing to be proven wrong on this front.
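To put the vector-width comparison in rough numbers, here's a back-of-the-envelope sketch. The pipe counts below are assumptions taken from third-party microarchitecture analysis (four 128-bit NEON pipes for the M1's big core, two 256-bit FMA pipes for Zen 2), not official specs; the point is the arithmetic, not the exact figures.

```python
# Back-of-the-envelope peak-SIMD comparison. The per-core pipe counts are
# assumptions, not official specs.

def fp32_lanes(vector_bits: int) -> int:
    """32-bit float lanes per SIMD vector."""
    return vector_bits // 32

def peak_fp32_per_cycle(vector_bits: int, units: int, fma: bool = True) -> int:
    """Peak FP32 FLOPs per core per cycle; one FMA counts as 2 FLOPs."""
    return fp32_lanes(vector_bits) * units * (2 if fma else 1)

m1  = peak_fp32_per_cycle(128, units=4)  # assumed: 4x 128-bit NEON pipes
zen = peak_fp32_per_cycle(256, units=2)  # assumed: 2x 256-bit AVX2 FMA pipes

print(f"M1-style core:  {m1} FLOPs/cycle")
print(f"Zen-style core: {zen} FLOPs/cycle")
```

Under these assumptions the narrow vectors are compensated by having more pipes, so peak per-cycle throughput can come out similar; whether that holds for real SIMD workloads is the open question.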


----------



## qcmadness (Dec 7, 2020)

They will probably need to fix the transistor count issue for a 32-core M1.


----------



## xkm1948 (Dec 7, 2020)

First an in-house CPU, now an in-house GPU. Apple is looking to ditch the entire backbone of the current PC industry. Feels like going back in time, TBH, to when Apple closed off everything. If the performance is there, it can totally justify Apple doing so. Weird that consoles from MS and Sony are looking like PCs while the Mac is turning console-like and closing off.

Maybe that is what Intel / AMD / Nvidia need to come out with even better products. Apple has the advantage of positive consumer perception (among its fans).


----------



## ThrashZone (Dec 7, 2020)

xkm1948 said:


> First in house CPU now in house GPU. Apple is looking to ditch the entire backbone of current PC industry. Feels like going back in time TBH when Apple closed off everything. If the performance is there it can totally justify Apple doing so. Weird when consoles from MS and Sony look like PCs and Mac turning into Console like closing off.
> 
> Maybe that is what Intel  / AMD / Nvidia need to come out with ever better products. Apple has the advantage of positive consumer perception (among its fans).


Hi,
Unfortunately apple products are stupid expensive.


----------



## Vya Domus (Dec 7, 2020)

ThrashZone said:


> Unfortunately apple products are stupid expensive.



And they're confined to a specific market; that's why it won't matter much to Intel/AMD/Nvidia/ARM.


----------



## xkm1948 (Dec 7, 2020)

ThrashZone said:


> Hi,
> Unfortunately apple products are stupid expensive.




Well, they haven't raised prices for their M1-based SKUs over their Intel offerings.


----------



## bug (Dec 7, 2020)

> Bloomberg reports that the new silicon will allow this workstation to be half the size of the current-gen Mac Pro workstation in form...


How clueless must you be not to realize the CPU doesn't eat half the space in the chassis? Remove it altogether, remove the cooling that goes with it, and you're still not looking at 50% space savings.


----------



## techisfun (Dec 7, 2020)

AMD Ryzen 5000 = Evolutionary
Apple M1 = Revolutionary

The M1 has made it obvious that ARM will dominate the desktop within the next 10 years. The average desktop user doesn't need an x86 CPU. x86 users are about to become a niche market. Efficiency has clearly won out over flexibility.

In terms of efficiency, the M1 is currently the best CPU and GPU on the market. Apple's more powerful chips will also be more efficient than any competing products.

A lot of PC users are already in denial over the M1's superiority, and they'll stay that way for a long time because they're stupid.


----------



## kapone32 (Dec 7, 2020)

techisfun said:


> AMD Ryzen 5000 = Evolutionary
> Apple M1 = Revolutionary
> 
> The M1 has made it obvious that ARM will dominate the desktop within the next 10 years. The average desktop user doesn't need an x86 CPU. x86 users are about to become a niche market. Efficiency has clearly won out over flexibility.
> ...


You have an interesting way of glossing over the fact that Apple will price their tech to the point of refusal for any sane PC user.


----------



## dyonoctis (Dec 7, 2020)

techisfun said:


> AMD Ryzen 5000 = Evolutionary
> Apple M1 = Revolutionary
> 
> The M1 has made it obvious that ARM will dominate the desktop within the next 10 years. The average desktop user doesn't need an x86 CPU. x86 users are about to become a niche market. Efficiency has clearly won out over flexibility.
> ...


Hello my friend, how was your day? Nothing like name-calling on a Monday, am I right? 

On a side note, Apple is currently the only one doing high-performance mainstream ARM CPUs. Qualcomm doesn't have the money to follow up on Apple, and Windows on ARM is still a dodgy OS. You should really avoid comparing a closed, vertically integrated system with an open system where every actor needs to do their part for it to "work". 

Right now, Windows on ARM doesn't look good because nobody seems to have a solid plan for a smooth transition, on either the software or the hardware side. Qualcomm, AMD, or whoever else would need to invest heavily in ARM research. Don't forget that Apple is the most profitable company in the world, with a gigantic R&D budget.


----------



## ThrashZone (Dec 7, 2020)

Hi,
Yeah, it's not like Apple is going to open up the OS, so performance aside, as long as you can do what you want through the Apple Store you're all set lol


----------



## SamuelL (Dec 7, 2020)

techisfun said:


> AMD Ryzen 5000 = Evolutionary
> Apple M1 = Revolutionary
> 
> The M1 has made it obvious that ARM will dominate the desktop within the next 10 years. The average desktop user doesn't need an x86 CPU. x86 users are about to become a niche market. Efficiency has clearly won out over flexibility.
> ...



Something being revolutionary or not can only be determined in hindsight. The M1 may end up as that, but I have doubts, and its impact on the market won't be known for some time. It's impressive from an efficiency-vs-performance standpoint and could be argued to be the "best" ARM-based CPU + GPU silicon (claims beyond that require some convoluted categorization).

On the other hand, I think the larger chips (like the hypothetical 32-core) could be revolutionary if they scale linearly and best the existing HEDT CPUs at real tasks. In that situation I could see Apple workstations (and maybe servers too!) and ARM becoming mainstream instead of inhabiting relative niches in computing.

Anyway, exciting possibilities, and I'm looking forward to seeing what Apple delivers in the future. In the meantime, let's all remember that Apple's marketing is just that while we wait.


----------



## Hattu (Dec 7, 2020)

Are there any good reviews of those recent M1 laptops? Or are they not available yet? I saw a few benchmarks, but I haven't really been searching lately.

I'm not sure my next laptop will be a MacBook, but my MBP from late 2013 was the best decision at the time for my use. I got fed up with Windows; I lost a lot of data when it crashed. And yes, I had backups, but not all the folders had been marked manually. But that's another story...

Interesting to see where all this goes.


----------



## techisfun (Dec 7, 2020)

kapone32 said:


> You have an interesting way of putting that Apple will price their tech to the point of refusal for any sane PC user.



The M1 Macs have made PCs look like overpriced junk, IMHO. You can't buy a PC that's as efficient as an M1 Mac, because they don't exist yet.

I think M-based Macs are going to be more efficient and better values than PCs for a long time.


----------



## sepheronx (Dec 7, 2020)

Holy moly, techisfun is copy-pasting comments from his last tirade.

Anyway, I wonder what the end cost will be. Macs are usually way overpriced. But I am enticed to buy a Mac Mini with one of these new M1 processors.


----------



## TumbleGeorge (Dec 7, 2020)

techisfun said:


> AMD Ryzen 5000 = Evolutionary
> Apple M1 = Revolutionary


LoL. Moore's Law isn't dead for Apple, and ever-thinner lithography doesn't matter? I think this "revolution" is just a storm in a teacup. Furious and... short.


----------



## Lew Zealand (Dec 7, 2020)

This SoC is useless for most of the computing world until someone licenses it from Apple to be used in a PC, MS writes a full version of Windows for it, *and* includes a good Rosetta-like emulator for x86/x64. Apple's advantage is that they have experience doing hardware and code-based transitions, having done three of them now. MS has sort of done one, with Win98 to XP.

It's pretty clear that the M1 outcompetes any similar-tier Intel and likely Ryzen CPUs, as the many reviews online note that most Intel apps run as fast or notably faster in bloody _emulation_ on these M1s. That's down to both the SoC design and good programming in Rosetta. This can also be done on the PC side, as MS also has the deep pockets for it, but it'll take a long time because they lack the experience. 

And let's be honest, they also need the motivation to do so. I doubt Apple will win many new customers even with notably superior hardware; how many people will switch from a low-price laptop/small PC to a more expensive Mac, and that's even before considering the sunk-cost fallacy?


----------



## Punkenjoy (Dec 7, 2020)

techisfun said:


> AMD Ryzen 5000 = Evolutionary
> Apple M1 = Revolutionary
> 
> The M1 has made it obvious that ARM will dominate the desktop within the next 10 years. The average desktop user doesn't need an x86 CPU. x86 users are about to become a niche market. Efficiency has clearly won out over flexibility.
> ...



The "revolution" factor is not the CPU architecture itself, or the CPU performance, which is excellent in single-thread but behind in multi-thread. The real change is that it's the first ARM CPU designed for desktop/laptop use, rather than a phone SoC that someone tried to upscale into a laptop. This shows the industry and the public that an ARM CPU can compete with x86. 

The performance, single-thread mostly, is impressive, but it's the first 5 nm CPU on the market. We will see how it compares to 5 nm AMD and 7 nm Intel CPUs, if those get out. Apple had to suffer Intel's fab issues, and now they've found their way out of those and are able to use the top process in the world.

Still, the revolution is the image that ARM can be a top CPU (some other ARM architectures have already shown they can be really performant, like the Fujitsu A64FX). Even rumors now have AMD resurrecting K12 to benefit from the trend. 

There are many impactful things about the M1, but the most interesting one is that they added, in the core front end, hardware support for many x86 behaviors to improve the speed at which it runs x86 code in Rosetta 2. This could have a very big impact on the future, as we might see other manufacturers doing similar things and even, at some point, becoming able to natively run different ISAs. 

But it's still unclear whether Intel will let that go, or if Apple got a license in their deals with Intel. Other vendors tried to do similar things and got lawsuits from Intel, so time will tell. But AMD, or what's left of VIA, could do it without too much trouble.


----------



## v12dock (Dec 7, 2020)

techisfun said:


> AMD Ryzen 5000 = Evolutionary
> Apple M1 = Revolutionary
> 
> The M1 has made it obvious that ARM will dominate the desktop within the next 10 years. The average desktop user doesn't need an x86 CPU. x86 users are about to become a niche market. Efficiency has clearly won out over flexibility.
> ...



I agree 100%. I think x86 is here to stay, but it will be for gaming/HPC. Everyday ARM CPUs will have more than enough horsepower for your day-to-day tasks. I just got a 13" 11th-gen Intel laptop, and I was so tempted to get a MacBook Air with the M1, simply because the battery life is OUTSTANDING and it can do anything I need to do on the x86 platform. It's also not just Apple who is making massive strides in ARM. Qualcomm may be a year behind, but they have no problem parrying Apple when it comes to ARM. I will be very interested in an ARM-based Windows platform in the future (assuming Windows is ready).


----------



## Vya Domus (Dec 7, 2020)

v12dock said:


> Every day ARM CPUs will have more than enough horsepower for your day to day task.



I have literally heard this for over a decade now; it's always in conjunction with something else. Netbooks were supposed to be enough for everyday tasks, then tablets, Chromebooks, etc., all supposedly powered by cheap, efficient SoCs, yet God knows how many millions of laptops with dedicated x86 CPUs and GPUs continue to ship every year.


----------



## dragontamer5788 (Dec 7, 2020)

So... I know that audio professionals are super-interested in low-latency "big cores". (Mostly because, I believe, those DSP programmers don't know how to take advantage of GPUs quite yet. GPUs are being used in 5 GHz software-defined radios; I'm pretty sure your 44.1 kHz audio filters are easier to process... but I digress.) Under *current* audio-programming paradigms, a core like the M1's is really, really good. You've got a HUGE L1 cache to hold all sorts of looped effects / reverb / instruments / blah blah blah, and you don't have any complicated latency issues to deal with. (A GPU would have microseconds of delay per kernel. It'd take an all-GPU design to negate the latency issue: more effort than current audio engineers seem willing to spend.)

So I think a 32-core M1 probably would be a realtime audio engineer's best platform. At least until software teams figure out that 10 TFLOPS of GPU compute is a really good system to perform DSP math on, and rejig their kernels to work with the GPU's latency. (Microseconds of latency per kernel: small enough that it's good for real-time audio, but you don't really have much room to play around with. It'd have to be optimized down to just a few dozen kernel invocations to meet the ridiculous latency requirements that musicians have.)
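For the curious, the buffer-size math behind those latency constraints is simple (assuming a 44.1 kHz sample rate; these are definitions, not measurements):

```python
# Buffer-size vs. latency arithmetic for real-time audio, assuming a
# 44.1 kHz sample rate: latency grows linearly with the host buffer size.

SAMPLE_RATE_HZ = 44_100

def buffer_latency_ms(buffer_samples: int, rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """How much time one audio buffer represents, in milliseconds."""
    return buffer_samples / rate_hz * 1000.0

# Smaller buffers mean a tighter deadline for the DSP code on each callback.
for n in (64, 128, 256, 512):
    print(f"{n:4d}-sample buffer -> {buffer_latency_ms(n):5.2f} ms of audio")
```

A 64-sample buffer gives the engine under a millisecond and a half to produce the next chunk, which is why fixed per-call overheads (like a GPU kernel launch) matter so much here.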


----------



## InVasMani (Dec 7, 2020)

Vya Domus said:


> I have literally heard this for over a decade now, it's always in conjunction with something else. Netbooks were supposed to be enough for everyday tasks, tablets, chromebooks, etc, all supposedly powered by cheap efficient SoCs yet God knows how many millions of laptops with dedicated x86 CPU and GPUs continue being shipped every year.


Very true. Someone will always want desktop-like performance in a portable device, even though it's never quite achieved in the same sense. Still, today's desktop is tomorrow's laptop.


----------



## z1n0x (Dec 7, 2020)

Assuming this is true, Apple isn't playing around. This could have consequences for the entire industry.
Of course, big changes don't happen overnight, and a big software ecosystem is one of those things that's particularly slow to change.
Windows on ARM, Nvidia's bid for ARM, rumors of AMD working on an ARM design. Intel?
x86 vs. ARM. Interesting times ahead.


----------



## Vya Domus (Dec 7, 2020)

dragontamer5788 said:


> So... I know that audio-professionals are super-interested in low-latency "big-cores". (Mostly because, I believe, those DSP programmers don't know how to take advantage of GPUs quite yet. GPUs are being used in 5GHz software-defined radios, I'm pretty sure your 44.1kHz audio-filters are easier to process... but I digress). Under *current* audio-programming paradigms, a core like the M1 is really, really good. You've got HUGE L1 cache to hold all sorts of looped-effects / reverb / instruments / blah blah blah, and you don't have any complicated latency issues to deal with. (A GPU would have microseconds of delay per kernel. It'd take an all-GPU design to negate the latency issue: more effort than current audio-engineers seem to have).



I don't know if that's quite true; the larger the core, the worse the latency, because of all that front-end pre-processing to figure out the best scheme for executing the micro-ops. If you want low latency, you need as basic a processor as possible with a short pipeline. The M1 is, or will be, good probably because of dedicated DSPs.


----------



## dragontamer5788 (Dec 7, 2020)

Vya Domus said:


> I don't know if that's quite true, the larger the core, the worse the latency is because of all that front end pre-processing to figure out the best scheme to execute the micro-ops. If you want low latency you need a processor as basic as possible with a short pipeline, M1 is or will be good probably because of dedicated DSPs.



Assuming 44.1 kHz, you have ~22 microseconds to generate a sample. That's your hard limit: 22 microseconds per sample. A CPU task switch is on the order of ~10 microseconds. Reading from an SSD is ~10 microseconds (aka: 100,000 IOPS). Talking with a GPU is ~5 µs. Etc., etc. You must deliver the sample, otherwise the audio will "pop", and DJs don't like that. You can batch samples up into 44-to-88-sample chunks (1 ms to 2 ms "delivered" to the audio driver) at a time, but if you go too far beyond that you'll start to incur latency, and DJs also don't like that.

So we're not talking about nanosecond-level latency (where microarchitecture decisions matter). There are still 22,000 nanoseconds per sample, after all. But it does mean that whether you fit inside L1 vs. L2, or maybe L2 vs. L3... those sorts of things really matter inside the hundreds-of-microseconds timeframe.

Audio programs live within that area: the ~20-microsecond to 1000-microsecond range. Some things (ex: micro-op scheduling) are too fast: micro-op scheduling changes things at the 0.0005-microsecond (or half-a-nanosecond) level. That's not going to actually affect audio systems. Other things (ex: 5 µs per GPU kernel invocation) are serious uses of time and need to be seriously considered and planned around. (Which is probably why no GPU-based audio software exists yet: that's cutting it close, and it'd be a risk.)
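The budget above, in rough numbers (all the costs are the ballpark figures from this post; treat them as assumptions, not measurements):

```python
# Sketch of the per-sample time budget described above. The costs are the
# post's own ballpark figures, in microseconds; treat them as assumptions.

BUDGET_US = 1_000_000 / 44_100   # ~22.7 us between samples at 44.1 kHz

COSTS_US = {
    "CPU task switch":   10.0,
    "GPU kernel launch":  5.0,
}

for name, cost in COSTS_US.items():
    share = cost / BUDGET_US * 100
    print(f"{name}: {cost:4.1f} us = {share:2.0f}% of one sample budget")

# Batching amortizes the fixed costs: an 88-sample chunk gives a ~2 ms
# budget, so a single 5 us GPU launch shrinks from ~22% of the budget
# to a fraction of a percent.
batch_budget_us = 88 * BUDGET_US
print(f"88-sample batch budget: {batch_budget_us / 1000:.1f} ms")
```

This is why batching is the standard trick: the per-call overheads don't shrink, but the deadline they're measured against grows.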

-------

The 128 kB L1 cache of Apple's M1 means that L1 fits the most "instrument data" (or so I've been told). I'm neither an audio engineer, nor an audio programmer, nor an audio user / musician / DJ or whatever. But when I talk to audio users, these are the issues they talk about.


----------



## Camm (Dec 8, 2020)

If they can, that's cool. But I don't know many users (including many ex-Mac users) who want to pay Apple prices for that tier of performance (and I highly doubt Apple wants to give up their margins).


----------



## Mats (Dec 8, 2020)

Here's a review:

Apple MacBook Air 2020 M1 Entry Review: Apple M1 CPU humbles Intel and AMD
The M1-based entry-level MacBook in our extensive technical test.
www.notebookcheck.net


----------



## timta2 (Dec 8, 2020)

And another:

Apple's M1 MacBook Air has that Apple Silicon magic
Review: No piece of hardware is perfect, but man, this is a hell of a laptop.
arstechnica.com


----------



## InVasMani (Dec 8, 2020)

dragontamer5788 said:


> Assuming 44.1kHz, you have 22-microseconds to generate a sample. That's your hard limit: 22-microseconds per sample. A CPU task-switch is on the order of ~10-microseconds. Reading from SSD is ~1-microsecond (aka: 100,000 IOPS). Talking with a GPU is ~5 uS. Etc. etc. You must deliver the sample otherwise the audio will "pop", and DJ's don't like that. You can batch-samples up together into 44 to 88 sample chunks (1ms to 2ms "delivered" to the audio driver) at a time, but if you go too far beyond that, you'll start to incur latency and DJ's also don't like that.
> 
> So we're not talking about nanosecond-level latency (where microarchitecture decisions matter). There's still 22,000 nanoseconds per sample after all. But it does mean that if you fit inside of L1 vs L2, or maybe L2 vs L3... those sorts of things really matter inside  the hundreds-of-microseconds timeframe.
> 
> ...


All the sampling I need. The only thing I'd gripe about is that the sample rate is a mere 44.1 kHz CD quality; I'm not sure how I can live with such lo-fi sound.


----------



## z1n0x (Dec 8, 2020)

timta2 said:


> and another:
> 
> 
> 
> ...


Is this a review or a commercial? I'm having a hard time figuring out which.


----------



## R0H1T (Dec 8, 2020)

For 32 "big" cores, heck, 16 cores and above, what they'll need is something close to, if not better than, *IF*. As Intel has found out, they don't grow that *glue* on trees anymore. 
Apple had better have something, really anything, similar; otherwise it's going to be a major issue no matter how or where their top-of-the-line chips end up!


----------



## PowerPC (Dec 8, 2020)

Vya Domus said:


> They're likely gonna need a node beyond 5nm, the big core cluster on their current chip is huge, 32 cores would mean a ridiculously large chip not to mention that they would also probably need to increase the size of that system cache. Of course some of those cores could be like the ones inside the small core cluster so it would be "32 core" just in name really.


Why don't you go away? Every time there's news or a leak about Apple's progress, you're in the first post sh***ing on it. I have never seen you give them even an inch of slack without mocking them in the same sentence. So I have to assume you must be a troll, or just an extremely irrational Apple hater.

Apple has already proved that their own cores can stand up to Intel's and AMD's, and even surpass them in some applications. This successor to the M1 will definitely give Apple computers an enormous performance boost. This new chip is planned for the high end of the desktop, mind you. So this is almost definitely for the new Mac Pro that is supposed to come out in 2022. Bloomberg already talked about it the day after Apple revealed the M1 for the first time last month.

Furthermore: the GPU core counts in these future high-end MacBooks and iMacs are supposed to go up massively. The iGPU of the M1 has just 7 or 8 cores, depending on the model. Now they're talking about an increase to 64 or even 128 GPU cores. These chips are going to beat all the AMD and Nvidia dedicated GPUs that Apple is currently offering. This is written in the actual source for this news post.

"Apple Preps Next Mac Chips With Aim to *Outclass Top-End PCs*"
https://www.bloomberg.com/news/arti...hest-end-pcs?srnd=technology-vp&sref=51oIW18F

And let's not ignore what's going to happen. Even if they have to slash all their old prices to win over hearts, they will do it. Apple is already doing it with the first iteration of notebooks. They want to increase demand as much as possible, and generating hype with low prices and deals is like Apple's joker card, one they have never really needed to use. Until now. They have everything lined up for the big win here. All Apple. Total domination and control of the market by one company. This is Apple's dream. If they want demand to go up by as much as I think they're aiming for here, prices will go down. It's that simple. I don't think Apple is trying to just compete with Intel and AMD. The goal here is obviously to crush them. The first iteration of the M1 already showed that in some sense. And they would be stupid to let up now.


----------



## R0H1T (Dec 8, 2020)

You mean a 32-core CPU + *128-core GPU* will beat, say, the *A100* 80 GB outright? Yeah, even if it's Apple, I doubt they'll pull that off. First of all, the cooling on such a chip will have to be extreme, unless they're clocking and actively limiting both components to unrealistically low levels!


----------



## PowerPC (Dec 8, 2020)

R0H1T said:


> You mean a 32 core CPU+*128 core GPU* will beat say the *A100* 80GB outright? Yeah even if its Apple I doubt they'll pull that off, first of all the cooling on such a chip will have to be extreme level unless they're clocking & actively limiting both the components to unrealistically low levels!


I was talking about consumer cards. I'm hoping for maybe a 16-core CPU + 64-core GPU iMac with better cooling. That should easily deliver 4K performance above 60 fps on most games, even at high settings. The 128-core GPU in just one or two more years almost sounds too good to be true.


----------



## goodeedidid (Dec 13, 2020)

xkm1948 said:


> First in house CPU now in house GPU. Apple is looking to ditch the entire backbone of current PC industry. Feels like going back in time TBH when Apple closed off everything. If the performance is there it can totally justify Apple doing so. Weird when consoles from MS and Sony look like PCs and Mac turning into Console like closing off.
> 
> Maybe that is what Intel  / AMD / Nvidia need to come out with ever better products. Apple has the advantage of positive consumer perception (among its fans).



Apple has the advantage of optimizing almost every aspect of their systems vertically, which no other company does, and this is why they're going to make the best-performing AI systems, regardless of their fans. The M1 is a fact.


----------



## Vya Domus (Dec 13, 2020)

goodeedidid said:


> this is why they are going to make the best performing AI systems, regardless of their fans.



That's particularly hilarious, because apparently not even Apple themselves think their machine-learning accelerators are good. Check this out: https://developer.apple.com/documentation/coreml/mlcomputeunits 

You can use the GPU and CPU explicitly, but not the NPU; you can only vaguely let the API "decide". If it were that good, why don't they let people use it? Hint: it probably isn't that good.


----------



## Aquinus (Dec 13, 2020)

Vya Domus said:


> That's particularity hilarious because apparently not even Apple themselves think their machine learning accelerators are good. Check this out : https://developer.apple.com/documentation/coreml/mlcomputeunits
> 
> You can use the GPU and CPU explicitly but not the NPU, you can only vaguely let the API "decide". If it was that good, why don't they let people use it ? Hint : It probably isn't that good.


Looks pretty decent with TensorFlow, even though support is only in alpha. Maybe we need the software to mature a bit, but it sounds capable enough.









M1 Mac Mini Scores Higher Than My NVIDIA RTX 2080Ti in TensorFlow Speed Test.
The two most popular deep-learning frameworks are TensorFlow and PyTorch. Both of them support NVIDIA GPU acceleration via the CUDA…
medium.com

Accelerating TensorFlow Performance on Mac
Accelerating TensorFlow 2 performance on Mac
blog.tensorflow.org


----------



## Vya Domus (Dec 13, 2020)

Aquinus said:


> Looks pretty decent with TensorFlow, even though support is only in Alpha. Maybe we need the software to mature a bit, but it sounds capable enough.
> 
> 
> 
> ...



You'd get the same results on any half-decent integrated GPU (apart from Intel's, I guess, but that shouldn't surprise anyone). The only reason it runs fast when the data is small is not that the GPU itself is amazing; it's simply that you don't need to wait for the data to be transferred across the PCIe connection, since it's using the same pool of memory as the rest of the system (and some pretty large caches). When the data set grows in size, that becomes less and less important and the M1 GPU gets crushed, not to mention that the 2080 Ti isn't even the fastest card around anymore. Anyway, GPUs are GPUs; not much differs between them. I'm sure a dedicated GPU of theirs with a million billion cores would be faster; it's really just a matter of who can make the biggest GPU.

I was talking about the actual ML accelerator, which Apple chose not to explicitly expose; that's a sign they're not that confident in the one thing that could really set them apart. If you can't choose the NPU in their own API, I don't think TensorFlow will get support for it any time soon.


This guy is trying to get it to run arbitrary code, and let's just say Apple goes out of their way to make that really, really hard.
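To put toy numbers on the unified-memory point, here's some transfer-time arithmetic; the bandwidth is an assumed rough peak for PCIe 3.0 x16, not a benchmark:

```python
# Toy data-movement arithmetic: time to copy a batch to a discrete GPU over
# PCIe, a cost a unified-memory SoC like the M1 avoids. The bandwidth is an
# assumed rough peak, not a measurement.

PCIE_GBPS = 16.0  # assumed effective bandwidth of PCIe 3.0 x16, GB/s

def pcie_copy_ms(megabytes: float, bw_gbps: float = PCIE_GBPS) -> float:
    """Milliseconds to move `megabytes` of data at `bw_gbps` GB/s."""
    return megabytes / 1024 / bw_gbps * 1000.0

for mb in (8, 256, 4096):
    print(f"{mb:5d} MB batch -> {pcie_copy_ms(mb):8.3f} ms copy time")
```

On tiny batches the copy is a large fraction of a short job, so skipping it flatters the iGPU; on big workloads the compute time dwarfs the copy, and raw GPU throughput decides the result.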


----------



## Aquinus (Dec 14, 2020)

Vya Domus said:


> was talking about the actual ML accelerator which Apple chose to not explicitly expose, that's a sign they're not that confident in the one thing that could really set them apart. If you can't chose the NPU in their own API, I don't think TensorFlow will get support for that any time soon.


Maybe there is a reason why they haven't exposed it. Maybe they've exposed the parts you need to know. I watched some of this video, and the thing that strikes me is that the guy doesn't know Swift and is trying to use disassembled APIs to interact with it; a lot of the calls he was looking at (at least in the part I watched) were things I would expect the kernel to handle. With that said, I get the distinct impression that this is four hours of a guy trying to figure out the platform he's working on.

I've taken a brief look at Apple's documentation, and he seems to be making it way harder than it has to be. Apple has simplified a lot of model processing, which is why the API is so thin. I suspect that, between not understanding the platform or Swift and trying to reverse-engineer system-level calls, he is probably going down the wrong rabbit hole.


Vya Domus said:


> You'd get the same results on any half-decent integrated GPU (apart from Intel's, I guess, but that shouldn't surprise anyone). The only reason it runs fast when the data is small is not because the GPU itself is amazing; it's simply because you don't need to wait for the data to be transferred across the PCIe connection, since it's using the same pool of memory as the rest of the system (and some pretty large caches). As the data set grows, that matters less and less and the M1 GPU gets crushed, not to mention that the 2080 Ti isn't even the fastest card around anymore.


Do I need to remind you that the M1 is literally Apple's entry-level chip for the laptop/desktop market? It seems to do pretty well for an entry-level product.


----------



## Vya Domus (Dec 14, 2020)

Aquinus said:


> Maybe there is a reason why they haven't exposed it. Maybe they've exposed the parts you need to know. I watched some of this video, and the thing that strikes me is that the guy doesn't know Swift and is trying to use disassembled APIs to interact with it; a lot of the calls he was looking at (at least in the part I watched) were things I would expect the kernel to handle. With that said, I get the distinct impression that this is four hours of a guy trying to figure out the platform he's working on.



They've exposed nothing, that's the point. He's trying to get the NPU to always execute the code he wants, which Apple does not allow; that's the problem he's trying to solve. Using the API calls is useless, since it will always fall back to the GPU or CPU and you have no control over that.


----------



## Aquinus (Dec 14, 2020)

Vya Domus said:


> They've exposed nothing, that's the point. *He's trying to get the NPU to always execute the code he wants, which Apple does not allow*, that's the problem he's trying to solve; using the API calls is useless, since it will always fall back to the GPU or CPU and you have no control over that.


Without knowing more about how Apple implemented the hardware it's hard to say, but there could well be reasons for that. It's plausible that the AI circuitry consumes a lot more power than the CPU or GPU, and that power management dictates where a task runs. Perhaps there are thermal reasons, or memory-pressure reasons, or task-complexity reasons. Maybe multiple tasks are running. Maybe it's a laptop in a low-power mode forcing work onto the low-power CPU cores, whereas a Mac Mini, with fewer power and thermal limitations, would schedule the same work differently. Apple probably suspects it can choose where the code runs better than the developer can, and that software shouldn't be tied to a particular hardware implementation either.

As a software engineer, when I see something like this, it makes me think it was done for a reason, not just for the sake of blackboxing everything. I know Apple tends to do that, but they do it when they think they can do it better for you. Honestly, that's not a bad thing. I don't want to have to think about which part of the SoC will best run my code given the state the machine is currently in. That's a decision best made by the OS, in my opinion, particularly when you're tightly integrating all the parts of a pretty complicated SoC like Apple is doing with its chips.
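That scheduling argument can be sketched as a toy dispatch policy. Everything below (the function name, the inputs, the thresholds, the policy itself) is hypothetical, invented only to illustrate the idea of the OS, rather than the developer, choosing a compute unit from the machine's current state; Apple's real policy is opaque.

```python
def pick_compute_unit(on_battery: bool, soc_temp_c: float, task_flops: float) -> str:
    """Toy OS-side scheduler for an ML task.

    All thresholds are made up for illustration; the point is only that
    the decision depends on machine state the developer can't see.
    """
    if on_battery and task_flops < 1e9:
        return "efficiency-cpu"   # tiny task on battery: not worth waking the NPU
    if soc_temp_c > 90.0:
        return "gpu"              # thermally constrained: spill work to the GPU
    return "npu"                  # default: send ML work to the accelerator

print(pick_compute_unit(on_battery=True, soc_temp_c=45.0, task_flops=1e6))
# -> efficiency-cpu
```

The same model on the same machine can land on three different units depending on power and thermal state, which is exactly why an API that lets the developer pin work to one unit would fight the scheduler.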


----------



## Atnevon (Dec 31, 2020)

Great news, but it also makes me glad I got the last of the Windows-supported MB Pros right when the 5600M launched.

It's no gaming powerhouse but a nice way to have a little near-all-in-one of both worlds and flexibility on the go.

While I wish Windows support were there, I bet GPU support would be there too. It'll be interesting to see where the scene is in a year's time. I was skeptical of the M1 but glad to see positive traction.

(And before folks ask: there's no other way to work in Sketch and flip over to relaxing by blasting-faces in Borderlands 3.)


----------

