Monday, October 18th 2021

Apple Introduces M1 Pro and M1 Max: the Most Powerful Chips Apple Has Ever Built
Apple today announced M1 Pro and M1 Max, the next breakthrough chips for the Mac. Scaling up M1's transformational architecture, M1 Pro offers amazing performance with industry-leading power efficiency, while M1 Max takes these capabilities to new heights. The CPU in M1 Pro and M1 Max delivers up to 70 percent faster CPU performance than M1, so tasks like compiling projects in Xcode are faster than ever. The GPU in M1 Pro is up to 2x faster than M1, while M1 Max is up to an astonishing 4x faster than M1, allowing pro users to fly through the most demanding graphics workflows.
M1 Pro and M1 Max introduce a system-on-a-chip (SoC) architecture to pro systems for the first time. The chips feature fast unified memory, industry-leading performance per watt, and incredible power efficiency, along with increased memory bandwidth and capacity. M1 Pro offers up to 200 GB/s of memory bandwidth with support for up to 32 GB of unified memory. M1 Max delivers up to 400 GB/s of memory bandwidth (2x that of M1 Pro and nearly 6x that of M1) and support for up to 64 GB of unified memory. And while the latest PC laptops top out at 16 GB of graphics memory, having this huge amount of memory enables graphics-intensive workflows previously unimaginable on a notebook. The efficient architecture of M1 Pro and M1 Max means they deliver the same level of performance whether MacBook Pro is plugged in or using the battery. M1 Pro and M1 Max also feature enhanced media engines with dedicated ProRes accelerators specifically for pro video processing. M1 Pro and M1 Max are by far the most powerful chips Apple has ever built.
"M1 has transformed our most popular systems with incredible performance, custom technologies, and industry-leading power efficiency. No one has ever applied a system-on-a-chip design to a pro system until today with M1 Pro and M1 Max," said Johny Srouji, Apple's senior vice president of Hardware Technologies. "With massive gains in CPU and GPU performance, up to six times the memory bandwidth, a new media engine with ProRes accelerators, and other advanced technologies, M1 Pro and M1 Max take Apple silicon even further, and are unlike anything else in a pro notebook."
M1 Pro: A Whole New Level of Performance and Capability
Utilizing the industry-leading 5-nanometer process technology, M1 Pro packs in 33.7 billion transistors, more than 2x the amount in M1. A new 10-core CPU, including eight high-performance cores and two high-efficiency cores, is up to 70 percent faster than M1, resulting in unbelievable pro CPU performance. Compared with the latest 8-core PC laptop chip, M1 Pro delivers up to 1.7x more CPU performance at the same power level and achieves the PC chip's peak performance using up to 70 percent less power. Even the most demanding tasks, like high-resolution photo editing, are handled with ease by M1 Pro.
M1 Pro has an up-to-16-core GPU that is up to 2x faster than M1 and up to 7x faster than the integrated graphics on the latest 8-core PC laptop chip. Compared to a powerful discrete GPU for PC notebooks, M1 Pro delivers more performance while using up to 70 percent less power. And M1 Pro can be configured with up to 32 GB of fast unified memory, with up to 200 GB/s of memory bandwidth, enabling creatives like 3D artists and game developers to do more on the go than ever before.
M1 Max: The World's Most Powerful Chip for a Pro Notebook
M1 Max features the same powerful 10-core CPU as M1 Pro and adds a massive 32-core GPU for up to 4x faster graphics performance than M1. With 57 billion transistors, 70 percent more than M1 Pro and 3.5x more than M1, M1 Max is the largest chip Apple has ever built. In addition, the GPU delivers performance comparable to a high-end GPU in a compact pro PC laptop while consuming up to 40 percent less power, and performance similar to that of the highest-end GPU in the largest PC laptops while using up to 100 watts less power. This means less heat is generated, fans run quietly and less often, and battery life is amazing in the new MacBook Pro. M1 Max transforms graphics-intensive workflows, including up to 13x faster complex timeline rendering in Final Cut Pro compared to the previous-generation 13-inch MacBook Pro.
M1 Max also offers a higher-bandwidth on-chip fabric, and doubles the memory interface compared with M1 Pro for up to 400 GB/s, or nearly 6x the memory bandwidth of M1. This allows M1 Max to be configured with up to 64 GB of fast unified memory. With its unparalleled performance, M1 Max is the most powerful chip ever built for a pro notebook.
Fast, Efficient Media Engine, Now with ProRes
M1 Pro and M1 Max include an Apple-designed media engine that accelerates video processing while maximizing battery life. M1 Pro also includes dedicated acceleration for the ProRes professional video codec, allowing playback of multiple streams of high-quality 4K and 8K ProRes video while using very little power. M1 Max goes even further, delivering up to 2x faster video encoding than M1 Pro, and features two ProRes accelerators. With M1 Max, the new MacBook Pro can transcode ProRes video in Compressor up to a remarkable 10x faster compared with the previous-generation 16-inch MacBook Pro.
Advanced Technologies for a Complete Pro System
Both M1 Pro and M1 Max are loaded with advanced custom technologies that help push pro workflows to the next level:
- A 16-core Neural Engine for on-device machine learning acceleration and improved camera performance.
- A new display engine drives multiple external displays.
- Additional integrated Thunderbolt 4 controllers provide even more I/O bandwidth.
- Apple's custom image signal processor, along with the Neural Engine, uses computational video to enhance image quality for sharper video and more natural-looking skin tones on the built-in camera.
- Best-in-class security, including Apple's latest Secure Enclave, hardware-verified secure boot, and runtime anti-exploitation technologies.
A Huge Step in the Transition to Apple Silicon
The Mac is now one year into its two-year transition to Apple silicon, and M1 Pro and M1 Max represent another huge step forward. These are the most powerful and capable chips Apple has ever created, and together with M1, they form a family of chips that lead the industry in performance, custom technologies, and power efficiency.
macOS Monterey is engineered to unleash the power of M1 Pro and M1 Max, delivering breakthrough performance, phenomenal pro capabilities, and incredible battery life. By designing Monterey for Apple silicon, the Mac wakes instantly from sleep, and the entire system is fast and incredibly responsive. Developer technologies like Metal let apps take full advantage of the new chips, and optimizations in Core ML utilize the powerful Neural Engine so machine learning models can run even faster. Pro app workload data is used to help optimize how macOS assigns multi-threaded tasks to the CPU cores for maximum performance, and advanced power management features intelligently allocate tasks between the performance and efficiency cores for both incredible speed and battery life.
The combination of macOS with M1, M1 Pro, or M1 Max also delivers industry-leading security protections, including hardware-verified secure boot, runtime anti-exploitation technologies, and fast, in-line encryption for files. All of Apple's Mac apps are optimized for, and run natively on, Apple silicon, and there are over 10,000 Universal apps and plug-ins available. Existing Mac apps that have not yet been updated to Universal will run seamlessly with Apple's Rosetta 2 technology, and users can also run iPhone and iPad apps directly on the Mac, opening a huge new universe of possibilities.
Apple's Commitment to the Environment
Today, Apple is carbon neutral for global corporate operations, and by 2030, plans to have net-zero climate impact across the entire business, which includes manufacturing supply chains and all product life cycles. This also means that every chip Apple creates, from design to manufacturing, will be 100 percent carbon neutral.
156 Comments on Apple Introduces M1 Pro and M1 Max: the Most Powerful Chips Apple Has Ever Built
edit: and for max, you can just read the wikichip page of a given processor, and check how many instructions it can dispatch every cycle. Is that something that relates to actual application performance? No.
IPC is chip, no, core-specific; everything else in a system is changeable.
And I do get your point, and so do others; that's why reviews exist showing different application performance metrics.
In short, what it shows is that Apple is somehow managing L1 and L2 caches several times the size of the competition (6x the L1I size!) with lower latency - which is downright incredible, as conventional logic says that any cache size increase will increase latency too (which has borne out over several generations of Intel and AMD CPUs, for example) - while also having re-order buffers 2-3x the size of Intel and AMD, an 8-wide (compared to 4-wide for both Intel and AMD) decoder, and 2-3x the execution ports, etc. Managing to design a CPU core this wide without significant performance or power penalties and managing to keep it fed is very impressive - and likely highly dependent on tightly integrated RAM, as well as those massive caches, but that doesn't take away from the performance results. The main drawback of ultra-wide core designs is clock speeds, but Apple seems to be doing decently there as well with >3GHz sustained and even 3GHz on the mobile A14.
Is this "the best CPU out there"? Not necessarily. That depends on your use case and software needs. But is it the most advanced architecture out there? Without a doubt. Do AMD and Intel have their work cut out for them to keep up, let alone catch up? Absolutely.
Me? I really hope this leads AMD to bet on more integrated APUs, and unified memory. I would love a balls-to-the-wall APU with heaps of LPDDR5 for my next laptop. 20-30CUs at low clocks? That would be amazing. It wouldn't be cheap, but it would be fantastic, as long as they can get unified memory working in Windows.
PC parts need to be supported in a variety of systems, and they need to be upgradable (expanding memory, for example). This is an advantage over the M1, but the drawbacks are slower adoption of new standards and more latency, due to the fact that the memory isn't standard and is further away from the CPU. They also have less flexibility in the memory design, since adding channels requires a new socket.
The M1 parts, on the other hand, are designed for specific form factors. The memory isn't upgradable and is soldered on the motherboard close to the CPU. Their design allows them to scale the memory bus up and down and adopt new standards rapidly, since they don't have to deal with a standard form factor for upgrades. This also allows them to have the memory very close for better latency and better energy efficiency. But if you want more memory because you didn't buy enough, you have to buy a new device. This is good for Apple, because people will tend to buy more than they need to avoid a costly upgrade later.
Apple is just pushing their advantages, since no one seems to care about the drawbacks of their platform. But if AMD and Intel did something similar, many PC enthusiasts wouldn't like it.
It still makes a lot of sense on a laptop, since most of the time it will never be upgraded. Also, Apple owns their entire stack. If they want to add an accelerator, they can expose it through an API in the OS and make their compiler use it whenever it's needed.
In reality, I think they are where they are supposed to be regarding their own performance. The point isn't that they outperform now, it's that they sucked for two decades, being slowed down by Intel chips. They are just where a company that owns its full stack should be right now.
And the fun thing is you can buy one if you want, and you can buy a PC if you prefer. PC isn't dead.
In the purest form, this is IPC.
But IPC is problematic across different instruction sets.
Let's say, in theory, you have a CISC instruction that loads a number, increments it by 1, then saves it into memory. It can do that across 3 cycles. On the other side, you have a RISC CPU that needs a load instruction, an increment instruction, and a store instruction to do the same amount of work, but each takes 1 cycle to run. This means this CPU runs 3 times the number of instructions for the same amount of work. We could say it has 3x the IPC of the CISC CPU, but in the end nothing more was done.
This is why, in its purest form, IPC is only a good comparison within the same instruction set. And it's probably only really useful for comparing two CPUs from the same manufacturer once you factor in the frequency they run at.
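To put numbers on the load/increment/store example above, here is a minimal Python sketch. The cycle counts are the hypothetical ones from the example, not measurements of any real CPU, and the helper names `ipc` and `work_per_clock` are made up for illustration.

```python
from fractions import Fraction

def ipc(instructions: int, cycles: int) -> Fraction:
    """Instructions retired per clock cycle."""
    return Fraction(instructions, cycles)

def work_per_clock(work_units: int, cycles: int) -> Fraction:
    """Useful work (here: one load-increment-store update) per clock cycle."""
    return Fraction(work_units, cycles)

# Hypothetical CISC CPU: 1 instruction that takes 3 cycles.
cisc_ipc = ipc(1, 3)
# Hypothetical RISC CPU: 3 instructions (load, increment, store), 1 cycle each.
risc_ipc = ipc(3, 3)

# Both CPUs finish exactly one update in 3 cycles.
cisc_wpc = work_per_clock(1, 3)
risc_wpc = work_per_clock(1, 3)

ipc_ratio = risc_ipc / cisc_ipc
print(f"IPC: CISC {cisc_ipc}, RISC {risc_ipc}, ratio {ipc_ratio}x")
print(f"Work per clock: CISC {cisc_wpc}, RISC {risc_wpc}")
```

The IPC figures differ by 3x while the work per clock is identical, which is the whole point of the example.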
Also, the same processor can get a higher IPC at a lower frequency than at a higher one if it has to wait less for I/O or memory. Waiting 60 ns for data to arrive costs fewer cycles at 2 GHz than the same wait at 5 GHz. This is why it's hard to extract IPC in its purest form from benchmarks.
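A rough sketch of that 60 ns example in Python. The 60 ns wait and the 2 GHz and 5 GHz clocks come from the post; the workload (1000 instructions, 250 core cycles, one fully exposed memory wait) is an invented illustration, and the model assumes the core cannot hide any of the wait behind other work.

```python
def stall_cycles(latency_ns: float, freq_ghz: float) -> float:
    """Clock cycles that elapse during a wait: ns * cycles-per-ns."""
    return latency_ns * freq_ghz

def effective_ipc(instructions: int, core_cycles: float, misses: int,
                  latency_ns: float, freq_ghz: float) -> float:
    """IPC once fully exposed memory waits are added to the core's own cycles."""
    total_cycles = core_cycles + misses * stall_cycles(latency_ns, freq_ghz)
    return instructions / total_cycles

# The same 60 ns wait costs more cycles on the faster clock.
slow_clock_stall = stall_cycles(60, 2.0)
fast_clock_stall = stall_cycles(60, 5.0)

# Invented workload: 1000 instructions, 250 core cycles, one exposed wait.
ipc_at_2ghz = effective_ipc(1000, 250, 1, 60, 2.0)
ipc_at_5ghz = effective_ipc(1000, 250, 1, 60, 5.0)

print(f"Cycles lost per wait: {slow_clock_stall:.0f} at 2 GHz, {fast_clock_stall:.0f} at 5 GHz")
print(f"Effective IPC: {ipc_at_2ghz:.2f} at 2 GHz, {ipc_at_5ghz:.2f} at 5 GHz")
```

The 5 GHz run still finishes sooner in wall-clock time; it is only the per-clock figure that drops.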
What most people call IPC these days is mostly a somewhat standardized metric like the SPEC benchmarks. It's no longer the number of instructions per clock but the amount of work per clock. And in the end, that is what really matters.
But we should say WPC or something similar instead of IPC.
IPC is a bit silly, as for example AVX-512 lowers IPC but improves WPC.
So to make a long story short, how the different levels of the memory hierarchy are built out really influences how it benefits the SoC as a whole. A huge LLC won't do you a whole lot of good if your L2 is absolutely tiny. So it's a bit more complicated than just throwing more of x, y, or z at a problem. Wide memory interfaces for DRAM cost a lot of die space and power, and the traces for the memory chips make boards expensive to produce. It's not a good path forward for traditional DRAM. Now, I would agree with respect to HBM2 given its bandwidth and power characteristics, but it also comes with trade-offs in the sense that it's relatively expensive to produce. Apple is basically doing that with their DRAM, so they have the advantage of economy of scale.
Zen 3 with JEDEC 3200 has RAM latency of 80 ns or so. The worst possible IPC I can think of would require most program instructions to be fetched from RAM. That requires just a huge 3D LUT check and a goto based on that. So one RAM latency per two instructions, meaning an IPC of around 1/200. If the prediction logic can see through that, you'll need to add some stupid instruction to do an address conversion that cannot easily be predicted (some hash function maybe, that has a single instruction in some extension), and you end up with an IPC of around 1/133.
edit: a cleaner solution would be to just write a simple routine that reads a byte at addr, then writes some hash of the byte to addr (the processor must have some hash extension, so that it is simply one instruction) and loops. That would produce an IPC of 1/100 or so. I was talking about the min and max IPC. Trying to measure them is pointless.
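The worst-case figures above can be reproduced with a few lines of Python. Two things here are assumptions rather than anything stated in the posts: a 5 GHz clock (which is what makes 80 ns come out to 400 cycles, matching the 1/200 figure) and counting the read/hash/write/loop routine as four instructions per RAM access. The function name `worst_case_ipc` is made up for the sketch.

```python
from fractions import Fraction

def worst_case_ipc(instr_per_miss: int, latency_ns: int, freq_ghz: int) -> Fraction:
    """IPC when every group of instr_per_miss instructions waits one full RAM latency."""
    cycles_per_miss = latency_ns * freq_ghz  # ns * cycles-per-ns
    return Fraction(instr_per_miss, cycles_per_miss)

LATENCY_NS = 80  # RAM latency quoted in the post
FREQ_GHZ = 5     # assumed clock, not stated in the post

lut_goto = worst_case_ipc(2, LATENCY_NS, FREQ_GHZ)      # LUT check + goto
with_hash = worst_case_ipc(3, LATENCY_NS, FREQ_GHZ)     # plus one hash instruction
read_hash_write_loop = worst_case_ipc(4, LATENCY_NS, FREQ_GHZ)  # assumed 4 instructions

for name, value in [("LUT + goto", lut_goto),
                    ("LUT + hash + goto", with_hash),
                    ("read/hash/write/loop", read_hash_write_loop)]:
    print(f"{name}: IPC = {value} (about 1/{float(1 / value):.0f})")
```

With a lower assumed clock the same latency covers fewer cycles, so these fractions would come out somewhat higher.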
But given that Intel's latest L1 cache size increase (24 to 32K, IIRC) came with a 1-cycle latency penalty, I can't quite see how they (or AMD) would suddenly pull a 3-6x increase in cache sizes out of their sleeves without also dramatically increasing latencies, which raises the question of whether others would even be able to make a similarly huge, wide, and cache-rich core without it being hobbled by slow cache accesses and thus not being fed. That seems to be the case, as we would otherwise most likely see much wider designs for servers and other markets where costs don't matter.
No, that's why we have industry-standard benchmarks based on real-world workloads. It's obvious that no such thing will ever be perfect, but it is a reasonable approximation of performance across a wide range of real-world usage scenarios.
It's a core microarchitecture measurement, and is only indicative of performance, not demonstrative of all performance.
Just read from here: en.m.wikipedia.org/wiki/Instructions_per_cycle
"The number of instructions executed per clock is not a constant for a given processor; it depends on how the particular software being run interacts with the processor, and indeed the entire machine, particularly the memory hierarchy."
And that's the key here: the individual parts of what Apple is doing here might not be that impressive, but that they're managing to make all of this into a functional, well balanced and highly performant and efficient core? That is impressive. Very much so. In the end, what matters at the user end is performance and power consumption, which are always in tension, especially in mobile and SFF use cases. The M1 (and upcoming siblings) manages to shift to an entirely different level in this balance, most likely delivering 5800X-level performance (if not higher) at half the power or less (a 5800X is ~140W under full boost and an all-core load, after all, while these are 50-60W chips), while also containing either a mid-range or high-end dGPU-level iGPU. That is obviously impressive. Will it come with tradeoffs? Of course it will. Concurrent CPU and GPU loads will be power and/or thermally limited, as always, and they do spend an almost silly amount of silicon per chip. But does that matter when the laptop is comparably priced to competitors? No. And sure, you can no doubt find a comparable laptop for less. But a 5980HX+3080/Quadro RTX workstation isn't going to cost you any less than an M1 Max MBP, and both that and the cheaper consumer-focused version are going to be much bigger and heavier, and have terrible battery life. Making a product is, when it comes down to it, about the full package. These chips clearly have downsides, but they are downsides that are largely immaterial in the context of the overall package. And that's what makes them impressive.
A 400+ mm^2 SoC on the newest node with 400 GB/s bandwidth that's really fast? Wow... I guess.
Terrible IPC always requires terrible code to go with it.
My example got to around one instruction per 100 clock cycles, and is likely the worst you can get to without disabling processor features. What is the IPC in your misaligned AVX loads?
All I know is that people were very happy with it, especially with things like rendering and battery life, and having the same performance on battery as on mains power, something neither AMD nor Intel can ever do with x86. The video rendering discussion can be closed, as Intel/AMD won't be able to come even close to the Pro/Max. Also, OS X support for ARM is light-years ahead of Windows for ARM. ARM can be the future if Microsoft can make something as efficient as Rosetta. IMO 95% of people use only 10% of the instruction set, so why not reap all the benefits of a RISC chip for the majority of people, while the other 5% can always go with Intel/AMD? So it makes much more sense for ARM to be the mainstream option, not the other way around. The problem is that the only way we get a proper PC/Windows ARM platform is for AMD and Intel to enter that market. And it will all depend on what Apple does with it, but Apple being Apple, they create their own markets and sell expensive laptops to only a very small portion of the global laptop market, so AMD or Intel won't ever be in a position where they have no choice but to switch to ARM.
I have never stated that knowing it is of any importance.
IPC in itself (or rather WPC) is a great way to compare different systems analytically. I.e. to understand differences in generic application performance and where the differences might come from.