
Riding on the Success of the M1, Apple Readies 32-core Chip for High-end Macs

v12dock

Block Caption of Rainey Street
Supporter
Joined
Dec 18, 2008
Messages
1,980 (0.34/day)
AMD Ryzen 5000 = Evolutionary
Apple M1 = Revolutionary

The M1 has made it obvious that ARM will dominate the desktop within the next 10 years. The average desktop user doesn't need an x86 CPU. x86 users are about to become a niche market. Efficiency has clearly won out over flexibility.

In terms of efficiency, the M1 is currently the best CPU and GPU on the market. Apple's more powerful chips will also be more efficient than any competing products.

A lot of PC users are already in denial over the M1's superiority, and they'll stay that way for a long time because they're stupid.

I agree 100%. I think x86 is here to stay, but it will be for gaming/HPC. Everyday ARM CPUs will have more than enough horsepower for your day-to-day tasks. I just got a 13" 11th-gen Intel laptop, and I was so tempted to get a MacBook Air with the M1 simply because the battery life is OUTSTANDING and it can do anything I need to do on the x86 platform. And it's not just Apple making massive strides in ARM: Qualcomm may be a year behind, but they have no problem parrying Apple when it comes to ARM. I will be very interested in an ARM-based Windows platform in the future (assuming Windows is ready).
 
Joined
Jan 8, 2017
Messages
9,499 (3.27/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
Everyday ARM CPUs will have more than enough horsepower for your day-to-day tasks.

I have literally heard this for over a decade now, and it's always in conjunction with something else. Netbooks were supposed to be enough for everyday tasks, then tablets, Chromebooks, etc., all supposedly powered by cheap, efficient SoCs, yet God knows how many millions of laptops with dedicated x86 CPUs and GPUs continue to be shipped every year.
 
Joined
Apr 24, 2020
Messages
2,721 (1.60/day)
So... I know that audio professionals are super-interested in low-latency "big cores". (Mostly because, I believe, those DSP programmers don't know how to take advantage of GPUs quite yet. GPUs are being used in 5 GHz software-defined radios; I'm pretty sure your 44.1 kHz audio filters are easier to process... but I digress). Under current audio-programming paradigms, a core like the M1 is really, really good. You've got a HUGE L1 cache to hold all sorts of looped effects / reverb / instruments / blah blah blah, and you don't have any complicated latency issues to deal with. (A GPU would add microseconds of delay per kernel. It'd take an all-GPU design to negate the latency issue: more effort than current audio engineers seem willing to invest).

So I think a 32-core M1 probably would be a realtime audio engineer's best platform. At least, until software teams figure out that 10 TFLOPS of GPU compute is a really good system to perform DSP math on, and rejig their kernels to work with the GPU's latency. (Microseconds of latency per kernel: small enough that it's good for real-time audio, but you don't really have much room to play around with. It'd have to be optimized down to just a few dozen kernel invocations to meet the ridiculous latency requirements that musicians have.)
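For a rough sense of that budget, here's a back-of-the-envelope sketch (my own illustration; the ~5 µs launch overhead and the 1 ms buffer deadline are assumptions taken from the discussion further down, not measurements):

Code:
// How many sequential GPU kernel launches fit in one real-time audio buffer?
let kernelLaunchOverhead = 5e-6   // ~5 microseconds per kernel invocation (assumed)
let bufferDeadline = 1e-3         // 1 ms buffer, i.e. ~44 samples at 44.1 kHz

// Upper bound if launch overhead were the only cost:
let maxLaunches = Int((bufferDeadline / kernelLaunchOverhead).rounded())
print("Absolute ceiling: \(maxLaunches) launches per buffer")    // 200

// Leave most of the deadline for the actual DSP math (fraction is assumed)
// and you land right around the "few dozen" invocations mentioned above.
let launchBudget = Int(Double(maxLaunches) * 0.25)
print("Practical budget: ~\(launchBudget) launches per buffer")  // ~50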
 
Joined
Mar 21, 2016
Messages
2,508 (0.79/day)
I have literally heard this for over a decade now, and it's always in conjunction with something else. Netbooks were supposed to be enough for everyday tasks, then tablets, Chromebooks, etc., all supposedly powered by cheap, efficient SoCs, yet God knows how many millions of laptops with dedicated x86 CPUs and GPUs continue to be shipped every year.
Very true. Someone will always want something as close as possible to desktop performance in a portable device, even though they never quite achieve it in the same sense. Still, today's desktop is tomorrow's laptop.
 
Joined
Jul 3, 2019
Messages
322 (0.16/day)
Location
Bulgaria
Processor 6700K
Motherboard M8G
Cooling D15S
Memory 16GB 3k15
Video Card(s) 2070S
Storage 850 Pro
Display(s) U2410
Case Core X2
Audio Device(s) ALC1150
Power Supply Seasonic
Mouse Razer
Keyboard Logitech
Software 22H2
Assuming this is true, Apple isn't playing around. This could have consequences for the entire industry.
Of course, big changes don't happen overnight, and a big software ecosystem is one of those things that's particularly slow to change.
Windows on ARM, Nvidia's bid for ARM, rumors of AMD working on an ARM design. Intel?
x86 vs ARM. Interesting times ahead.
 
Joined
Jan 8, 2017
Messages
9,499 (3.27/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
So... I know that audio professionals are super-interested in low-latency "big cores". (Mostly because, I believe, those DSP programmers don't know how to take advantage of GPUs quite yet. GPUs are being used in 5 GHz software-defined radios; I'm pretty sure your 44.1 kHz audio filters are easier to process... but I digress). Under current audio-programming paradigms, a core like the M1 is really, really good. You've got a HUGE L1 cache to hold all sorts of looped effects / reverb / instruments / blah blah blah, and you don't have any complicated latency issues to deal with. (A GPU would add microseconds of delay per kernel. It'd take an all-GPU design to negate the latency issue: more effort than current audio engineers seem willing to invest).

I don't know if that's quite true; the larger the core, the worse the latency, because of all that front-end pre-processing to figure out the best scheme to execute the micro-ops. If you want low latency you need a processor as basic as possible with a short pipeline. The M1 is, or will be, good probably because of dedicated DSPs.
 
Joined
Apr 24, 2020
Messages
2,721 (1.60/day)
I don't know if that's quite true; the larger the core, the worse the latency, because of all that front-end pre-processing to figure out the best scheme to execute the micro-ops. If you want low latency you need a processor as basic as possible with a short pipeline. The M1 is, or will be, good probably because of dedicated DSPs.

Assuming 44.1 kHz, you have ~22.7 microseconds to generate a sample. That's your hard limit: ~22.7 microseconds per sample. A CPU task switch is on the order of ~10 microseconds. Reading from an SSD is ~10 microseconds (aka: 100,000 IOPS). Talking with a GPU is ~5 µs. Etc., etc. You must deliver the sample, otherwise the audio will "pop", and DJs don't like that. You can batch samples up together into 44-to-88-sample chunks (1 ms to 2 ms "delivered" to the audio driver at a time), but if you go too far beyond that, you'll start to incur latency, and DJs also don't like that.

So we're not talking about nanosecond-level latency (where microarchitecture decisions matter). There are still ~22,000 nanoseconds per sample, after all. But it does mean that whether you fit inside L1 vs. L2, or maybe L2 vs. L3... those sorts of things really matter inside the hundreds-of-microseconds timeframe.

Audio programs live within that area: the ~20-microsecond to 1000-microsecond range. Some things (ex: micro-op scheduling) are too fast to matter: micro-op scheduling changes things at the 0.0005-microsecond (or half-a-nanosecond) level. That's not going to actually affect audio systems. Other things (ex: 5 µs per GPU kernel invocation) are serious uses of time and need to be seriously considered and planned around. (Which is probably why no GPU-based audio software exists yet: that's cutting it close, and it'd be a risk.)

-------

The 128 kB L1 cache of Apple's M1 means its L1 fits the most "instrument data" (or so I've been told). I'm neither an audio engineer, nor an audio programmer, nor an audio user / musician / DJ or whatever. But when I talk to audio users, those are the issues they talk about.
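(If you want to sanity-check the numbers above, a quick sketch of the arithmetic, nothing more:)

Code:
let sampleRate = 44_100.0                   // Hz
let samplePeriodUs = 1e6 / sampleRate       // ~22.68 microseconds per sample
print("Per-sample deadline: \(samplePeriodUs) us")

for samples in [44.0, 88.0] {               // common small buffer sizes
    let chunkMs = samples / sampleRate * 1e3
    print("\(Int(samples))-sample buffer = \(chunkMs) ms")  // ~1 ms and ~2 ms
}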
 
Joined
Sep 26, 2012
Messages
871 (0.19/day)
Location
Australia
System Name ATHENA
Processor AMD 7950X
Motherboard ASUS Crosshair X670E Extreme
Cooling ASUS ROG Ryujin III 360, 13 x Lian Li P28
Memory 2x32GB Trident Z RGB 6000Mhz CL30
Video Card(s) ASUS 4090 STRIX
Storage 3 x Kingston Fury 4TB, 4 x Samsung 870 QVO
Display(s) Acer X38S, Wacom Cintiq Pro 15
Case Lian Li O11 Dynamic EVO
Audio Device(s) Topping DX9, Fluid FPX7 Fader Pro, Beyerdynamic T1 G2, Beyerdynamic MMX300
Power Supply Seasonic PRIME TX-1600
Mouse Xtrfy MZ1 - Zy' Rail, Logitech MX Vertical, Logitech MX Master 3
Keyboard Logitech G915 TKL
VR HMD Oculus Quest 2
Software Windows 11 + Universal Blue
If they can, that's cool. But I don't know many users (including many ex-Mac users) who want to pay Apple prices for that tier of performance (and I highly doubt Apple wants to give up their margins).
 
Joined
Jan 16, 2008
Messages
1,349 (0.22/day)
Location
Milwaukee, Wisconsin, USA
Processor i7-3770K
Motherboard Biostar Hi-Fi Z77
Cooling Swiftech H20 (w/Custom External Rad Enclosure)
Memory 16GB DDR3-2400Mhz
Video Card(s) Alienware GTX 1070
Storage 1TB Samsung 850 EVO
Display(s) 32" LG 1440p
Case Cooler Master 690 (w/Mods)
Audio Device(s) Creative X-Fi Titanium
Power Supply Corsair 750-TX
Mouse Logitech G5
Keyboard G. Skill Mechanical
Software Windows 10 (X64)
Joined
Mar 21, 2016
Messages
2,508 (0.79/day)
Assuming 44.1 kHz, you have ~22.7 microseconds to generate a sample. That's your hard limit: ~22.7 microseconds per sample. A CPU task switch is on the order of ~10 microseconds. Reading from an SSD is ~10 microseconds (aka: 100,000 IOPS). Talking with a GPU is ~5 µs. Etc., etc. You must deliver the sample, otherwise the audio will "pop", and DJs don't like that. You can batch samples up together into 44-to-88-sample chunks (1 ms to 2 ms "delivered" to the audio driver at a time), but if you go too far beyond that, you'll start to incur latency, and DJs also don't like that.

So we're not talking about nanosecond-level latency (where microarchitecture decisions matter). There are still ~22,000 nanoseconds per sample, after all. But it does mean that whether you fit inside L1 vs. L2, or maybe L2 vs. L3... those sorts of things really matter inside the hundreds-of-microseconds timeframe.

Audio programs live within that area: the ~20-microsecond to 1000-microsecond range. Some things (ex: micro-op scheduling) are too fast to matter: micro-op scheduling changes things at the 0.0005-microsecond (or half-a-nanosecond) level. That's not going to actually affect audio systems. Other things (ex: 5 µs per GPU kernel invocation) are serious uses of time and need to be seriously considered and planned around. (Which is probably why no GPU-based audio software exists yet: that's cutting it close, and it'd be a risk.)

-------

The 128 kB L1 cache of Apple's M1 means its L1 fits the most "instrument data" (or so I've been told). I'm neither an audio engineer, nor an audio programmer, nor an audio user / musician / DJ or whatever. But when I talk to audio users, those are the issues they talk about.
All the sampling I need. The only thing I'd gripe about is that the sampling rate is a mere 44.1 kHz CD quality; I'm not sure how I can live with such lo-fi sound.
 
Joined
Jul 3, 2019
Messages
322 (0.16/day)
Location
Bulgaria
Processor 6700K
Motherboard M8G
Cooling D15S
Memory 16GB 3k15
Video Card(s) 2070S
Storage 850 Pro
Display(s) U2410
Case Core X2
Audio Device(s) ALC1150
Power Supply Seasonic
Mouse Razer
Keyboard Logitech
Software 22H2
Joined
Apr 12, 2013
Messages
7,563 (1.77/day)
For 32 "big" cores, heck 16 cores & above, what they'll need is something closer if not better than IF. As Intel have found out they don't grow them glue on trees anymore :slap:
Apple better have something, really anything similar otherwise it's going to be a major issue no matter how or where their top of the line chips end up!
 
Joined
Sep 11, 2015
Messages
624 (0.18/day)
They're likely gonna need a node beyond 5 nm; the big-core cluster on their current chip is huge, and 32 cores would mean a ridiculously large chip, not to mention that they would also probably need to increase the size of that system cache. Of course, some of those cores could be like the ones inside the small-core cluster, so it would be "32-core" just in name, really.
Why don't you go away? Every time there is news or leaks about Apple's progress, you're in the first post sh***ing on it. I have never seen you give them even an inch of slack without mocking them in the same sentence. So I have to assume you must be a troll or just an extremely irrational Apple hater.

Apple already proved that their own cores can stand up to those of Intel and AMD and even surpass them in some applications. This successor to the M1 will definitely give Apple computers an enormous performance boost. This new chip is planned for the high-end desktop, mind you. So this is almost definitely for the new Mac Pro that is supposed to come out in 2022. Bloomberg already talked about it the day after Apple revealed the M1 for the first time last month.

Furthermore: the GPU core counts in these future high-end MacBooks and iMacs are supposed to go up massively. The iGPU of the M1 has just 7 or 8 cores, depending on the model. Now they're talking about an increase to 64 or even 128 GPU cores. These chips are going to beat all the AMD and Nvidia dedicated GPUs that Apple is currently offering. This is written in the actual source for this news post.

"Apple Preps Next Mac Chips With Aim to Outclass Top-End PCs"
https://www.bloomberg.com/news/arti...hest-end-pcs?srnd=technology-vp&sref=51oIW18F

And let's just not ignore what's going to happen. Even if they have to slash all their old prices to win over hearts, they will do it. Apple is already doing it with the first iteration of notebooks. They want to increase demand as much as possible, and generating hype with low prices and deals is like Apple's joker card that they have never really needed to use. Until now. They have everything lined up for the big win here. All Apple. Total domination and control of the market by one company. This is Apple's dream. If they want demand to go up by as much as I think they are aiming for here, prices will go down. It's that simple. I don't think Apple is trying to just compete with Intel and AMD. The goal here is obviously to crush them. The first iteration of the M1 already showed that in some sense. And they would be stupid to let up now.
 
Joined
Apr 12, 2013
Messages
7,563 (1.77/day)
You mean a 32-core CPU + 128-core GPU will beat, say, the A100 80GB outright? Yeah, even if it's Apple, I doubt they'll pull that off; first of all, the cooling on such a chip will have to be extreme, unless they're clocking & actively limiting both components to unrealistically low levels!
 
Joined
Sep 11, 2015
Messages
624 (0.18/day)
You mean a 32-core CPU + 128-core GPU will beat, say, the A100 80GB outright? Yeah, even if it's Apple, I doubt they'll pull that off; first of all, the cooling on such a chip will have to be extreme, unless they're clocking & actively limiting both components to unrealistically low levels!
I was talking about consumer cards. I hope for maybe a 16-core CPU + 64-core GPU iMac with better cooling. That should easily manage 4K performance above 60 fps on most games, even on high settings. A 128-core GPU in just 1 or 2 more years almost sounds too good to be true.
 
Joined
Jun 27, 2016
Messages
294 (0.09/day)
System Name MacBook Pro 16"
Processor M1 Pro
Memory 16GB unified memory
Storage 1 TB
First an in-house CPU, now an in-house GPU. Apple is looking to ditch the entire backbone of the current PC industry. Feels like going back in time, TBH, to when Apple closed off everything. If the performance is there, it can totally justify Apple doing so. Weird when the consoles from MS and Sony look like PCs while the Mac is turning console-like in its closing-off.

Maybe that is what Intel / AMD / Nvidia need to come out with ever-better products. Apple has the advantage of positive consumer perception (among its fans).

Apple has the advantage of optimizing almost every aspect of their systems vertically, which no other company does, and this is why they are going to make the best-performing AI systems, regardless of their fans. The M1 is a fact.
 
Joined
Jan 8, 2017
Messages
9,499 (3.27/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
this is why they are going to make the best-performing AI systems, regardless of their fans.

That's particularly hilarious, because apparently not even Apple themselves think their machine-learning accelerators are good. Check this out: https://developer.apple.com/documentation/coreml/mlcomputeunits

You can use the GPU and CPU explicitly, but not the NPU; you can only vaguely let the API "decide". If it was that good, why don't they let people use it? Hint: it probably isn't that good.
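For anyone who doesn't want to click through, this is the entire choice that API gives you (a minimal sketch; "MyModel.mlmodelc" is a hypothetical placeholder path, not a real model):

Code:
import CoreML
import Foundation

let config = MLModelConfiguration()
// The only MLComputeUnits options exposed:
//   .cpuOnly     -- CPU, explicitly
//   .cpuAndGPU   -- CPU and GPU, explicitly
//   .all         -- CPU, GPU, and *maybe* the Neural Engine; CoreML decides
config.computeUnits = .all   // note: there is no ".neuralEngineOnly" case

// "MyModel.mlmodelc" is a placeholder compiled model, for illustration only.
let model = try? MLModel(
    contentsOf: URL(fileURLWithPath: "MyModel.mlmodelc"),
    configuration: config
)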
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,171 (2.80/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
That's particularly hilarious, because apparently not even Apple themselves think their machine-learning accelerators are good. Check this out: https://developer.apple.com/documentation/coreml/mlcomputeunits

You can use the GPU and CPU explicitly, but not the NPU; you can only vaguely let the API "decide". If it was that good, why don't they let people use it? Hint: it probably isn't that good.
Looks pretty decent with TensorFlow, even though support is only in Alpha. Maybe we need the software to mature a bit, but it sounds capable enough.


 
Joined
Jan 8, 2017
Messages
9,499 (3.27/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
Looks pretty decent with TensorFlow, even though support is only in Alpha. Maybe we need the software to mature a bit, but it sounds capable enough.



You'd get the same results on any half-decent integrated GPU (apart from Intel's, I guess, but that shouldn't surprise anyone). The only reason it runs fast when the data is small is not because the GPU itself is amazing; it's simply because you don't need to wait for the data to be transferred across the PCIe connection, since it's using the same pool of memory as the rest of the system (and some pretty large caches). When the data set grows in size, that becomes less and less important and the M1 GPU gets crushed, not to mention that the 2080 Ti isn't even the fastest card around anymore. Anyway, GPUs are GPUs; not much differs between them. I am sure that a dedicated GPU of theirs with a million billion cores will be faster; it's really just a matter of who can make the biggest GPU.
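(A rough sketch of that amortization argument; the ~16 GB/s figure is an assumed effective rate for a PCIe 3.0 x16 link, and the dataset sizes are made up for illustration:)

Code:
// Discrete GPU: data must cross PCIe. Unified memory: it doesn't.
let pcieBandwidth = 16e9                  // ~16 GB/s effective, PCIe 3.0 x16 (assumed)
for datasetBytes in [1e6, 1e8, 1e10] {    // 1 MB, 100 MB, 10 GB
    let transferMs = datasetBytes / pcieBandwidth * 1e3
    print("\(datasetBytes) bytes -> \(transferMs) ms over PCIe")
}
// Tiny batches: the copy (~0.06 ms for 1 MB) dominates, so unified memory
// looks great. Big batches: the copy is amortized over far more compute,
// and raw GPU throughput decides the race.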

I was talking about the actual ML accelerator, which Apple chose not to explicitly expose; that's a sign they're not that confident in the one thing that could really set them apart. If you can't choose the NPU in their own API, I don't think TensorFlow will get support for it any time soon.


This guy is trying to get it to run arbitrary code, and let's just say Apple goes out of their way to make that really, really hard.
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,171 (2.80/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
I was talking about the actual ML accelerator, which Apple chose not to explicitly expose; that's a sign they're not that confident in the one thing that could really set them apart. If you can't choose the NPU in their own API, I don't think TensorFlow will get support for it any time soon.
Maybe there is a reason why they haven't exposed it. Maybe they've exposed the parts you need to know about. I watched some of this video, and the thing that strikes me is that the guy doesn't know Swift, that he's trying to use disassembled APIs to interact with it, and that a lot of the calls he was looking at (at least in the part I watched) were things I would expect the kernel to handle. With that said, I get the distinct impression that this is 4 hours of a guy trying to figure out the platform he's working on.

I've taken a brief look at Apple's documentation, and he seems to be making it way harder than it has to be. Apple has simplified a lot of the model-processing pipeline, which is why the API is so thin. I suspect that, between not understanding the platform or Swift while trying to reverse-engineer system-level calls, he's probably going down the wrong rabbit hole.
You'd get the same results on any half-decent integrated GPU (apart from Intel's, I guess, but that shouldn't surprise anyone). The only reason it runs fast when the data is small is not because the GPU itself is amazing; it's simply because you don't need to wait for the data to be transferred across the PCIe connection, since it's using the same pool of memory as the rest of the system (and some pretty large caches). When the data set grows in size, that becomes less and less important and the M1 GPU gets crushed, not to mention that the 2080 Ti isn't even the fastest card around anymore.
Do I need to remind you that the M1 is literally Apple's entry-level chip for the laptop/desktop market? Seems to do pretty well for an entry-level product.
 
Joined
Jan 8, 2017
Messages
9,499 (3.27/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
Maybe there is a reason why they haven't exposed it. Maybe they've exposed the parts you need to know about. I watched some of this video, and the thing that strikes me is that the guy doesn't know Swift, that he's trying to use disassembled APIs to interact with it, and that a lot of the calls he was looking at (at least in the part I watched) were things I would expect the kernel to handle. With that said, I get the distinct impression that this is 4 hours of a guy trying to figure out the platform he's working on.

They've exposed nothing; that's the point. He's trying to get the NPU to always execute the code he wants, which Apple does not allow; that's the problem he's trying to solve. Using the API calls is useless, since execution will always fall back to the GPU or CPU and you have no control over that.
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,171 (2.80/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
They've exposed nothing; that's the point. He's trying to get the NPU to always execute the code he wants, which Apple does not allow; that's the problem he's trying to solve. Using the API calls is useless, since execution will always fall back to the GPU or CPU and you have no control over that.
Without knowing more about how Apple implemented the hardware, it's hard to say, but there very well could be reasons for that. It's plausible that the AI circuitry consumes a lot more power than the CPU or GPU. It could be power management that dictates where a task runs. Perhaps there are thermal reasons for it, or memory-pressure reasons, or task-complexity reasons. Maybe multiple tasks are running. Maybe it's a laptop in a low-power mode that forces work onto the low-power CPU cores, compared to, say, a Mac Mini, which would schedule the work differently with fewer power and thermal limitations. Apple probably suspects that they can choose where the code needs to run better than the developer can, and that the software shouldn't be specifically tied to a particular hardware implementation either.

As a software engineer, when I see something like this, it makes me think that it was done for a reason, not just for the sake of blackboxing everything. I know that Apple tends to do that, but they do it when they think they can do it better for you. Honestly, that's not a bad thing. I don't want to have to think about which part of the SoC is best suited to run my code given the state the machine is currently in. That's a decision best made by the OS, in my opinion, particularly when you're tightly integrating all the parts of a pretty complicated SoC, like Apple is doing with their chips.
 
Joined
Sep 5, 2007
Messages
512 (0.08/day)
System Name HAL_9017
Processor Intel i9 10850k
Motherboard Asus Prime z490-A
Cooling Corsair H115i
Memory GSkill 32GB DDR4-3000 Trident-Z RGB
Video Card(s) NVIDIA 1080GTX FE w/ EVGA Hybrid Water Cooler
Storage Samsung EVO 960 M.2 SSD 500Gb
Display(s) Asus XG279
Case In Win 805i
Audio Device(s) EVGA NuAudio
Power Supply CORSAIR RM750
Mouse Logitech Master 2s
Keyboard Keychron K4
Great news, but it also makes me glad I got the last of the Windows-supported MacBook Pros right when the 5600M launched.

It's no gaming powerhouse, but it's a nice way to have a little near-all-in-one of both worlds and flexibility on the go.

While I wish Windows support were there, sadly I doubt GPU support would be there too. It'll be interesting to see where the scene is in a year's time. I was skeptical of the M1 but glad to see positive traction.

(And before folks ask: there's no other way to work in Sketch and then flip over to relaxing by blasting faces in Borderlands 3.)
 