Apple's Graphics Performance Claims Proven Exaggerated by Mac Studio Reviews

studentrights

New Member
Joined
Mar 19, 2022
Messages
11 (0.01/day)
I'm the one that's saying it, and yes, it's true. It gets around 44k stock and 49k overclocked. It's on a whole different level.
I found the test which matches your claim. If the rumor is true that the Mac Pro's M1 chip will double the Ultra's performance, it should be able to compete with any PC CPU or GPU from Intel, AMD, or NVIDIA. We'll see this June. Of course, this will all be accomplished with a fraction of the power consumption. It's a pretty impressive feat for Apple to hit the top of the pack with their first series of Apple silicon chips, all on a single SoC. Can't wait for the M2.
 
Joined
Dec 1, 2020
Messages
468 (0.32/day)
Processor Ryzen 5 7600X
Motherboard ASRock B650M PG Riptide
Cooling Noctua NH-D15
Memory DDR5 6000Mhz CL28 32GB
Video Card(s) Nvidia Geforce RTX 3070 Palit GamingPro OC
Storage Corsair MP600 Force Series Gen.4 1TB
I found the test which matches your claim. If the rumor is true that the Mac Pro's M1 chip will double the Ultra's performance, it should be able to compete with any PC CPU or GPU from Intel, AMD, or NVIDIA. We'll see this June. Of course, this will all be accomplished with a fraction of the power consumption. It's a pretty impressive feat for Apple to hit the top of the pack with their first series of Apple silicon chips, all on a single SoC. Can't wait for the M2.
With any PC CPU or GPU? Are you aware that the 3970X is a 32-core Zen 2 chip, that there's the 64-core 3990X, and that a 64-core Zen 3 may arrive by June? Even if they double the GPU in June, next-gen AMD and NVIDIA parts will come a few months later and wipe the floor with whatever Apple brings in June.
 
Joined
Jun 14, 2020
Messages
3,474 (2.13/day)
System Name Mean machine
Processor 12900k
Motherboard MSI Unify X
Cooling Noctua U12A
Memory 7600c34
Video Card(s) 4090 Gamerock oc
Storage 980 pro 2tb
Display(s) Samsung crg90
Case Fractal Torent
Audio Device(s) Hifiman Arya / a30 - d30 pro stack
Power Supply Be quiet dark power pro 1200
Mouse Viper ultimate
Keyboard Blackwidow 65%
I found the test which matches your claim. If the rumor is true that the Mac Pro's M1 chip will double the Ultra's performance, it should be able to compete with any PC CPU or GPU from Intel, AMD, or NVIDIA. We'll see this June. Of course, this will all be accomplished with a fraction of the power consumption. It's a pretty impressive feat for Apple to hit the top of the pack with their first series of Apple silicon chips, all on a single SoC. Can't wait for the M2.
Uhm, nope? The 3970X is now a 3-year-old CPU that wasn't even the high end back when it was released. There are higher-end Threadrippers, and the new generation is supposedly coming out soon. For example, the 3990X is twice as fast as the 3970X.

In terms of CPU power, the M1 Ultra is still competing with - and losing to - mainstream x86 CPUs from last gen. Meteor Lake and Zen 4 are due to release soon and will be on a completely different level as well. Saying that Apple competes with the top of the stack is just a joke; it competes with the mainstream parts, and it doesn't even beat those.
 

studentrights

New Member
Joined
Mar 19, 2022
Messages
11 (0.01/day)
With any PC CPU or GPU? Are you aware that the 3970X is a 32-core Zen 2 chip, that there's the 64-core 3990X, and that a 64-core Zen 3 may arrive by June? Even if they double the GPU in June, next-gen AMD and NVIDIA parts will come a few months later and wipe the floor with whatever Apple brings in June.
But it's still a narrow advantage, since the chip cannot directly compete with the M1's overall capability. The reality is that outside of narrow use cases, the M1 is easily the best overall chip, since you'd need a separate CPU and GPU to match its overall capability. I guess we'll see in June. What's clear is that Apple has gone from iPhone chips to matching or beating high-end PC chips in their first generation of SoCs. More importantly, in mobile they are unmatched, whether that's smartphone performance or laptops, where power consumption and weight matter. The MacBook Pros can run unthrottled with exactly the same scores as the Mac Studio with the M1 Max chip. Intel and AMD have lost any real laptop advantage.
 
Joined
Jun 14, 2020
Messages
3,474 (2.13/day)
System Name Mean machine
Processor 12900k
Motherboard MSI Unify X
Cooling Noctua U12A
Memory 7600c34
Video Card(s) 4090 Gamerock oc
Storage 980 pro 2tb
Display(s) Samsung crg90
Case Fractal Torent
Audio Device(s) Hifiman Arya / a30 - d30 pro stack
Power Supply Be quiet dark power pro 1200
Mouse Viper ultimate
Keyboard Blackwidow 65%
But it's still a narrow advantage, since the chip cannot directly compete with the M1's overall capability. The reality is that outside of narrow use cases, the M1 is easily the best overall chip, since you'd need a separate CPU and GPU to match its overall capability. I guess we'll see in June. What's clear is that Apple has gone from iPhone chips to matching or beating high-end PC chips in their first generation of SoCs. More importantly, in mobile they are unmatched, whether that's smartphone performance or laptops, where power consumption and weight matter. The MacBook Pros can run unthrottled with exactly the same scores as the Mac Studio with the M1 Max chip. Intel and AMD have lost any real laptop advantage.
What do you mean you need two separate CPUs? The 3990X is a single CPU, and it's 4 times faster than the M1 Ultra.
 

studentrights

New Member
Joined
Mar 19, 2022
Messages
11 (0.01/day)
Uhm, nope? The 3970X is now a 3-year-old CPU that wasn't even the high end back when it was released. There are higher-end Threadrippers, and the new generation is supposedly coming out soon. For example, the 3990X is twice as fast as the 3970X.

In terms of CPU power, the M1 Ultra is still competing with - and losing to - mainstream x86 CPUs from last gen. Meteor Lake and Zen 4 are due to release soon and will be on a completely different level as well. Saying that Apple competes with the top of the stack is just a joke; it competes with the mainstream parts, and it doesn't even beat those.
The M1 architecture is already two years old. All they've really been doing is doubling and quadrupling the chips. The M2 is just around the corner.

What do you mean you need two separate CPUs? The 3990X is a single CPU, and it's 4 times faster than the M1 Ultra.
No, you missed the point.

What do you mean you need two separate CPUs? The 3990X is a single CPU, and it's 4 times faster than the M1 Ultra.
How does the 3990X compare against NVIDIA GPUs? As favorably as the M1 Ultra?
 
Joined
Jun 14, 2020
Messages
3,474 (2.13/day)
System Name Mean machine
Processor 12900k
Motherboard MSI Unify X
Cooling Noctua U12A
Memory 7600c34
Video Card(s) 4090 Gamerock oc
Storage 980 pro 2tb
Display(s) Samsung crg90
Case Fractal Torent
Audio Device(s) Hifiman Arya / a30 - d30 pro stack
Power Supply Be quiet dark power pro 1200
Mouse Viper ultimate
Keyboard Blackwidow 65%
The M1 architecture is already two years old. All they've really been doing is doubling and quadrupling the chips. The M2 is just around the corner.
So it's newer than the 3990X, which is 4 times faster?

How does the 3990X compare against NVIDIA GPUs? As favorably as the M1 Ultra?
Who cares? You said it's competing with the best, and that is not true. It gets beaten very, very handily in both the CPU and the GPU department, even by mainstream parts, let alone the high-end ones.
 
Joined
Dec 1, 2020
Messages
468 (0.32/day)
Processor Ryzen 5 7600X
Motherboard ASRock B650M PG Riptide
Cooling Noctua NH-D15
Memory DDR5 6000Mhz CL28 32GB
Video Card(s) Nvidia Geforce RTX 3070 Palit GamingPro OC
Storage Corsair MP600 Force Series Gen.4 1TB
But it's still a narrow advantage, since the chip cannot directly compete with the M1's overall capability. The reality is that outside of narrow use cases, the M1 is easily the best overall chip, since you'd need a separate CPU and GPU to match its overall capability. I guess we'll see in June. What's clear is that Apple has gone from iPhone chips to matching or beating high-end PC chips in their first generation of SoCs. More importantly, in mobile they are unmatched, whether that's smartphone performance or laptops, where power consumption and weight matter. The MacBook Pros can run unthrottled with exactly the same scores as the Mac Studio with the M1 Max chip. Intel and AMD have lost any real laptop advantage.
Can't you check the benchmarks? The 3970X is 2 times faster than the M1 Ultra, and the 3990X is 3 or even more times faster. These are not ARM toys; you don't need 2 CPUs, because you are already destroying whatever Apple will bring in the next 3-4 years. Also, you are really deluded if you think this is their first try and that they have no experience with SoCs. And don't talk nonsense about the mobile market: the 6900HS and 12900HK have already smashed the M1 Max. And I will remind you once again, don't even dare to speak of an advantage for these toys with their few use cases where they are actually better, because for each case where Apple is better, there are 10 where x86 is WAY better.
 
Joined
May 8, 2020
Messages
578 (0.35/day)
System Name Mini efficient rig.
Processor R9 3900, @4ghz -0.05v offset. 110W peak.
Motherboard Gigabyte B450M DS3H, bios f41 pcie 4.0 unlocked.
Cooling some server blower @1500rpm
Memory 2x16GB oem Samsung D-Die. 3200MHz
Video Card(s) RX 6600 Pulse w/conductonaut @65C hotspot
Storage 1x 128gb nvme Samsung 950 Pro - 4x 1tb sata Hitachi 2.5" hdds
Display(s) Samsung C24RG50FQI
Case Jonsbo C2 (almost itx sized)
Audio Device(s) integrated Realtek crap
Power Supply Seasonic SSR-750FX
Mouse Logitech G502
Keyboard Redragon K539 brown switches
Software Windows 7 Ultimate SP1 + Windows 10 21H2 LTSC (patched).
Benchmark Scores Cinebench: R15 3050 pts, R20 7000 pts, R23 17800 pts, r2024 1050 pts.
Well, the M1 Max sounds attractive, considering that where I live, $1,000 only gets you a 3060, lol. The M1 Ultra is excessive and will probably remain unoptimized (considering that the M1 Ultra is two M1 Max dies glued together).
 
Joined
Apr 1, 2009
Messages
60 (0.01/day)
Which is why we need to wait and see what Apple offers in the Mac Pro, which is likely to be twice as powerful and geared towards the upper end of the market. At this point, we're just comparing the M1 Ultra, which is clearly middle of the road, not their top-of-the-line system. It's not far away; June 2022 is coming quickly.

If it's twice as powerful, then what is it going to cost? $8k? $10k? I assume the I/O will be better: at least dual LAN ports, HDMI 2.1 (which the Mac Studio lacks), etc. At least you would hope so. But the more expensive the Mac Pro gets, the higher-end the hardware it has to compete with.
 

studentrights

New Member
Joined
Mar 19, 2022
Messages
11 (0.01/day)
So it's newer than the 3990X, which is 4 times faster?


Who cares? You said it's competing with the best, and that is not true. It gets beaten very, very handily in both the CPU and the GPU department, even by mainstream parts, let alone the high-end ones.
Exactly, you still need two chips. The M1 Ultra is all in one.
 
Joined
Mar 10, 2010
Messages
11,878 (2.21/day)
Location
Manchester uk
System Name RyzenGtEvo/ Asus strix scar II
Processor Amd R5 5900X/ Intel 8750H
Motherboard Crosshair hero8 impact/Asus
Cooling 360EK extreme rad+ 360$EK slim all push, cpu ek suprim Gpu full cover all EK
Memory Corsair Vengeance Rgb pro 3600cas14 16Gb in four sticks./16Gb/16GB
Video Card(s) Powercolour RX7900XT Reference/Rtx 2060
Storage Silicon power 2TB nvme/8Tb external/1Tb samsung Evo nvme 2Tb sata ssd/1Tb nvme
Display(s) Samsung UAE28"850R 4k freesync.dell shiter
Case Lianli 011 dynamic/strix scar2
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup, corsair void pro headset
Power Supply corsair 1200Hxi/Asus stock
Mouse Roccat Kova/ Logitech G wireless
Keyboard Roccat Aimo 120
VR HMD Oculus rift
Software Win 10 Pro
Benchmark Scores 8726 vega 3dmark timespy/ laptop Timespy 6506
Exactly, you still need two chips. The M1 Ultra is all in one.
It's made out of many chips, with two core complexes! And it costs an arm and a leg.

So what exactly are you on about? What point was made?
 
Joined
Jun 14, 2020
Messages
3,474 (2.13/day)
System Name Mean machine
Processor 12900k
Motherboard MSI Unify X
Cooling Noctua U12A
Memory 7600c34
Video Card(s) 4090 Gamerock oc
Storage 980 pro 2tb
Display(s) Samsung crg90
Case Fractal Torent
Audio Device(s) Hifiman Arya / a30 - d30 pro stack
Power Supply Be quiet dark power pro 1200
Mouse Viper ultimate
Keyboard Blackwidow 65%
Exactly, you still need two chips. The M1 Ultra is all in one.
Sure, but I still don't get why that matters to you. When you are paying upwards of $10k for a PC, it's the performance you care about. And when it comes to compute performance, the M1 is terrible. It's good in the 2-3 workloads it has specialised hardware for, but in everything else it's kind of a dog.

If you want to compare it to some proper equipment, compare it to an A6000 at FP64. How many times slower is it, 20? More than 20?
 
Joined
Dec 1, 2020
Messages
468 (0.32/day)
Processor Ryzen 5 7600X
Motherboard ASRock B650M PG Riptide
Cooling Noctua NH-D15
Memory DDR5 6000Mhz CL28 32GB
Video Card(s) Nvidia Geforce RTX 3070 Palit GamingPro OC
Storage Corsair MP600 Force Series Gen.4 1TB
Exactly, you still need two chips. The M1 Ultra is all in one.
Exactly, and this is the biggest disadvantage of the M1 Ultra: you are locked to specific hardware, while with x86 you can pair a fast GPU with a slow CPU that doesn't have that many cores, or the opposite, a fast CPU with a GTX 1050 or RX 560 just for display output, plus huge amounts of cheap RAM and storage and an upgrade path whenever and however you decide. You choose your hardware and software, and you are not limited.
 
Joined
May 2, 2017
Messages
7,762 (2.80/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150Mhz, CO -7,-7,-20(x6),
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
Going back to Anandtech's SPEC testing of the M1 Max, I think there's a clue there as to the wildly divergent performance results across different benchmarks: reliance on integer vs. floating point math, as well as memory bandwidth. In AT's testing, the M1 Max (8P+2E, 10t) outperformed the 8P, 16t Ryzen 7 5800X by 4.8% (53.38 vs. 50.98 points), but only delivered 64% of the 5950X's performance (83.13 points) in the SPECint suite. In SPECfp, on the other hand, the M1 Max outperforms the 5800X by 72.1% (81.07 vs. 47.1 points), and even trounces the 5950X by 25.9% (64.39 points). Apple's scores drop somewhat if the Icestorm efficiency cores are excluded, to 48.57 and 75.67 points, which puts it behind the 5800X in integer, but still ahead of the 5950X in floating point. AT notes that "The fp2017 suite has more workloads that are more memory-bound". In one SPECfp nT subtest, the M1 Max beats the Ryzen 9 5980HS (mobile, 35W, 8c16t) by a staggering 4.8 times. It's an outlier, but it illustrates what can happen in a memory-bound edge case.
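
For anyone who wants to sanity-check those percentages, here's a quick Python sketch that recomputes them from the raw SPEC estimates quoted above; the scores are AT's published suite estimates, and the helper function is just for illustration:

```python
# Recompute the relative-performance figures from the AnandTech SPEC
# estimates quoted above (suite estimates, not official SPEC submissions).

def relative(a: float, b: float) -> str:
    """Express score a as a percentage lead (or deficit) versus score b."""
    return f"{(a / b - 1) * 100:+.1f}%"

m1_max   = {"int": 53.38, "fp": 81.07}
r7_5800x = {"int": 50.98, "fp": 47.10}
r9_5950x = {"int": 83.13, "fp": 64.39}

print(relative(m1_max["int"], r7_5800x["int"]))  # ~+4.7%, the ~5% SPECint lead
print(f'{m1_max["int"] / r9_5950x["int"]:.0%}')  # 64% of the 5950X in SPECint
print(relative(m1_max["fp"], r7_5800x["fp"]))    # +72.1% over the 5800X, SPECfp
print(relative(m1_max["fp"], r9_5950x["fp"]))    # +25.9% over the 5950X, SPECfp
```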

Also worth noting: AT measures per-core memory access speeds for the M1 to be significantly lower than the total bandwidth of the memory interface. This is also true for other architectures (you need more than 1c to max out your bandwidth), but it makes direct bandwidth comparisons troublesome.

Still, this tells us several things:
- The Apple Firestorm cores (1t) are ever so slightly behind AMD's Zen 3 cores (2t) in integer workloads, but are almost matched core for core.
- The Apple Firestorm cores have a massive advantage over Zen3 in floating point workloads, at least those present in the SPEC suite, delivering more than 2x the performance per core. This has the caveat that these workloads are more memory bound.
- nT scaling is quite different across the architectures and workloads: the 5950X scales 10.87x from 1t to 32t in int and 5.3x in fp. The 5800X is sadly not in the 1t chart, but should be marginally slower than the 5950X (200-300MHz). If they were identical, its scaling would be 6.7x and 3.9x from 1t to 16t, so it's likely a tad higher than that (a worked version of this back-calculation is sketched below). Thread scaling comparisons are made difficult by Apple's big.LITTLE architecture and AMD's SMT, but it does seem that the higher memory bandwidth helps Apple scale better with additional cores to some extent (though this could also be affected by many other factors, including software). Or this could be formulated as AMD's MSDT platform being significantly held back by memory bandwidth.
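
To show where those 6.7x and 3.9x estimates come from, here's the same back-calculation as a quick Python sketch, using only the scores and scaling factors above; the "identical 1t score" assumption is the simplification already noted in the list:

```python
# Back-calculate hypothetical 5800X thread scaling, assuming its 1t score
# equals the 5950X's (it's really a touch lower, so true scaling is higher).

r9_5950x_nt      = {"int": 83.13, "fp": 64.39}  # 5950X nT scores
r9_5950x_scaling = {"int": 10.87, "fp": 5.3}    # measured 1t-to-32t scaling

# Implied 5950X 1t scores: nT score divided by its measured scaling factor.
r9_5950x_1t = {k: v / r9_5950x_scaling[k] for k, v in r9_5950x_nt.items()}

r7_5800x_nt = {"int": 50.98, "fp": 47.10}
for k, v in r7_5800x_nt.items():
    print(k, round(v / r9_5950x_1t[k], 1))      # int 6.7, fp 3.9
```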

How does this affect the discussions here? Well, it's well documented that Cinebench doesn't care much about memory bandwidth. It likes higher IF and memory clocks on AMD, but it scales poorly with more memory channels, meaning latency is more important than bandwidth. Other workloads are quite different - but they tend to be quite specialized. There are essentially no consumer workloads that are particularly bandwidth limited, with most caring more about compute power or latency.

Does this mean Cinebench is a more, or less, reliable benchmark? Depends what you're looking for. It's clear that Apple has designed the larger M1 chips for bandwidth-hungry applications (though the large GPU sharing it also needs that, of course). So, if your workloads are bound by memory bandwidth, the M1 Ultra and its siblings are likely to deliver staggeringly good performance and efficiency. If not? Then it's likely on par with competing CPUs with a similar number of cores (not threads), but YMMV depending on the workload.

Does this mean Apple lied? Again, no. There's no doubt that the M1 Ultra is the most powerful chip for PCs ever made. Not CPU, not GPU, but in sum.

Sure, but I still don't get why that matters to you. When you are paying upwards of $10k for a PC, it's the performance you care about. And when it comes to compute performance, the M1 is terrible. It's good in the 2-3 workloads it has specialised hardware for, but in everything else it's kind of a dog.

If you want to compare it to some proper equipment, compare it to an A6000 at FP64. How many times slower is it, 20? More than 20?
Exactly, and this is the biggest disadvantage of the M1 Ultra: you are locked to specific hardware, while with x86 you can pair a fast GPU with a slow CPU that doesn't have that many cores, or the opposite, a fast CPU with a GTX 1050 or RX 560 just for display output, plus huge amounts of cheap RAM and storage and an upgrade path whenever and however you decide. You choose your hardware and software, and you are not limited.
Being a single SoC has some great advantages, particularly in efficiency (near-zero interconnect power, far less embodied energy in duplicate componentry), but also to some extent in latency-sensitive performance scenarios. But as you say, it's also inflexible, and those advantages don't necessarily apply to all workloads. Both approaches have distinct pros and cons. You can't get past the fact that a tightly integrated package will always be more efficient than a collection of discrete components (as long as they are otherwise comparable in terms of architectural efficiency). There's a reason why a 5W smartphone delivers a lot more than 1/20th the performance of a 100W laptop. That doesn't invalidate the value or performance of the laptop - after all, it delivers a degree of performance impossible in a smartphone.

As for that fp64 comparison: according to this source the M1 GPU arch has 1/4 speed fp64, so it should deliver ~2.6TF/s of FP64 (assuming 10.4TF FP32 numbers online are accurate). For comparison, Ampere has 1/32 speed FP64, and delivers ~1.21TF/s of FP64. Not that this necessarily matters much: the reason Ampere performs poorly in FP64 is that double precision floating point math has become ever more of a niche application, and Nvidia thus doesn't prioritize it whatsoever outside of their datacenter GA100 chips, which have 1:2 FP64 (and the A100 80GB SXM4 delivers ~9.746TF/s of FP64). As such, it seems that Apple cares more about FP64 than Nvidia does outside of datacenters, likely indicating that pro applications common on MacOS tend to use that a bit more than most PC applications (though it might also just be a small but profitable subset of apps, wanting to cater to a niche that pays well).
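
To make the FP64 arithmetic explicit: peak FP64 throughput is simply peak FP32 throughput times the architecture's FP64:FP32 ratio. A minimal sketch using the figures above (the M1 ratio and the 10.4TF FP32 number are unconfirmed community estimates):

```python
# Peak FP64 = peak FP32 * (FP64:FP32 ratio), using the figures discussed
# above. The M1 values are unconfirmed community estimates.

def fp64_tflops(fp32_tflops: float, fp64_ratio: float) -> float:
    return fp32_tflops * fp64_ratio

print(fp64_tflops(10.40, 1 / 4))  # M1 Max: ~2.6 TF/s FP64
print(fp64_tflops(19.49, 1 / 2))  # A100 80GB SXM4: ~9.75 TF/s FP64
```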
 
Joined
Jan 8, 2017
Messages
9,438 (3.27/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
Apple based their GPU claims off of the Media Engine, which soundly destroys a 3090. People aren't buying a Mac Studio for gaming.
That's still an outright lie, because the Media Engine is not the GPU.
 
Joined
May 2, 2017
Messages
7,762 (2.80/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150Mhz, CO -7,-7,-20(x6),
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
That's still an outright lie, because the Media Engine is not the GPU.
I don't think they based the comparisons on that, though. They don't claim that the ME is part of the GPU; after all, they present it (and its specs) entirely separately, just as with the Neural Engine.

It's possible they used some form of benchmark that could partially make use of the ME - say, a video editing timeline performance benchmark with various filters applied, where the video playback itself is accelerated through the ME - but that's rather unlikely (and still places most of the workload on the GPU, not the ME).
 
Joined
Jun 14, 2020
Messages
3,474 (2.13/day)
System Name Mean machine
Processor 12900k
Motherboard MSI Unify X
Cooling Noctua U12A
Memory 7600c34
Video Card(s) 4090 Gamerock oc
Storage 980 pro 2tb
Display(s) Samsung crg90
Case Fractal Torent
Audio Device(s) Hifiman Arya / a30 - d30 pro stack
Power Supply Be quiet dark power pro 1200
Mouse Viper ultimate
Keyboard Blackwidow 65%
Going back to Anandtech's SPEC testing of the M1 Max, I think there's a clue there as to the wildly divergent performance results across different benchmarks: reliance on integer vs. floating point math, as well as memory bandwidth. In AT's testing, the M1 Max (8P+2E, 10t) outperformed the 8P, 16t Ryzen 7 5800X by 4.8% (53.38 vs. 50.98 points), but only delivered 64% of the 5950X's performance (83.13 points) in the SPECint suite. In SPECfp, on the other hand, the M1 Max outperforms the 5800X by 72.1% (81.07 vs. 47.1 points), and even trounces the 5950X by 25.9% (64.39 points). Apple's scores drop somewhat if the Icestorm efficiency cores are excluded, to 48.57 and 75.67 points, which puts it behind the 5800X in integer, but still ahead of the 5950X in floating point. AT notes that "The fp2017 suite has more workloads that are more memory-bound". In one SPECfp nT subtest, the M1 Max beats the Ryzen 9 5980HS (mobile, 35W, 8c16t) by a staggering 4.8 times. It's an outlier, but it illustrates what can happen in a memory-bound edge case.

Also worth noting: AT measures per-core memory access speeds for the M1 to be significantly lower than the total bandwidth of the memory interface. This is also true for other architectures (you need more than 1c to max out your bandwidth), but it makes direct bandwidth comparisons troublesome.

Still, this tells us several things:
- The Apple Firestorm cores (1t) are ever so slightly behind AMD's Zen 3 cores (2t) in integer workloads, but are almost matched core for core.
- The Apple Firestorm cores have a massive advantage over Zen3 in floating point workloads, at least those present in the SPEC suite, delivering more than 2x the performance per core. This has the caveat that these workloads are more memory bound.
- nT scaling is quite different across the architectures and workloads: the 5950X scales 10.87x from 1t to 32t in int and 5.3x in fp. The 5800X is sadly not in the 1t chart, but should be marginally slower than the 5950X (200-300MHz). If they were identical, its scaling would be 6.7x and 3.9x from 1t to 16t, so it's likely a tad higher than that. Thread scaling comparisons are made difficult by Apple's big.LITTLE architecture and AMD's SMT, but it does seem that the higher memory bandwidth helps Apple scale better with additional cores to some extent (though this could also be affected by many other factors, including software). Or this could be formulated as AMD's MSDT platform being significantly held back by memory bandwidth.

How does this affect the discussions here? Well, it's well documented that Cinebench doesn't care much about memory bandwidth. It likes higher IF and memory clocks on AMD, but it scales poorly with more memory channels, meaning latency is more important than bandwidth. Other workloads are quite different - but they tend to be quite specialized. There are essentially no consumer workloads that are particularly bandwidth limited, with most caring more about compute power or latency.

Does this mean Cinebench is a more, or less, reliable benchmark? Depends what you're looking for. It's clear that Apple has designed the larger M1 chips for bandwidth-hungry applications (though the large GPU sharing it also needs that, of course). So, if your workloads are bound by memory bandwidth, the M1 Ultra and its siblings are likely to deliver staggeringly good performance and efficiency. If not? Then it's likely on par with competing CPUs with a similar number of cores (not threads), but YMMV depending on the workload.

Does this mean Apple lied? Again, no. There's no doubt that the M1 Ultra is the most powerful chip for PCs ever made. Not CPU, not GPU, but in sum.



Being a single SoC has some great advantages, particularly in efficiency (near-zero interconnect power, far less embodied energy in duplicate componentry), but also to some extent in latency-sensitive performance scenarios. But as you say, it's also inflexible, and those advantages don't necessarily apply to all workloads. Both approaches have distinct pros and cons. You can't get past the fact that a tightly integrated package will always be more efficient than a collection of discrete components (as long as they are otherwise comparable in terms of architectural efficiency). There's a reason why a 5W smartphone delivers a lot more than 1/20th the performance of a 100W laptop. That doesn't invalidate the value or performance of the laptop - after all, it delivers a degree of performance impossible in a smartphone.

As for that fp64 comparison: according to this source the M1 GPU arch has 1/4 speed fp64, so it should deliver ~2.6TF/s of FP64 (assuming 10.4TF FP32 numbers online are accurate). For comparison, Ampere has 1/32 speed FP64, and delivers ~1.21TF/s of FP64. Not that this necessarily matters much: the reason Ampere performs poorly in FP64 is that double precision floating point math has become ever more of a niche application, and Nvidia thus doesn't prioritize it whatsoever outside of their datacenter GA100 chips, which have 1:2 FP64 (and the A100 80GB SXM4 delivers ~9.746TF/s of FP64). As such, it seems that Apple cares more about FP64 than Nvidia does outside of datacenters, likely indicating that pro applications common on MacOS tend to use that a bit more than most PC applications (though it might also just be a small but profitable subset of apps, wanting to cater to a niche that pays well).
What you are saying might be true, but the benefits of a single chip can actually be measured in benchmarks. Being a single chip is by itself irrelevant; what's relevant is the performance and the efficiency.
 
Joined
May 2, 2017
Messages
7,762 (2.80/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150Mhz, CO -7,-7,-20(x6),
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
What you are saying might be true, but the benefits of a single chip can actually be measured in benchmarks. Being a single chip is by itself irrelevant; what's relevant is the performance and the efficiency.
That's true - and that's why Apple is managing performance that, depending on the workload, rivals 150W (5950X), 250W (12900K), and 280W (3970X) CPUs and 250W+ (3070 or higher) GPUs in a single package that never exceeds 200W of power draw in any workload, and mostly stays well below that, with the CPU peaking at ~60W of power draw. Are those higher-power CPUs faster in some or even most workloads? Yep. Can you build a more powerful workstation with a high-end TR chip and one or more high-end GPUs? Again, yep. But when one takes into consideration that the CPUs consume 2-3-4-5x the power, and the GPUs also quite a bit more, that starts looking far less impressive. And that is of course not to say that AMD or Intel couldn't make a similarly efficient design if they really wanted to, but they haven't so far. (Which is understandable: they need to sell their chips to (very conservative) third-party OEMs and convince those the chips are worth implementing, so pricing and production costs matter a lot more. The failure of KBL-G illustrates how difficult it can be to get an innovative, good chip design adopted by OEMs.)
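
As a rough illustration of why those power numbers matter, here's a back-of-the-envelope perf/W comparison using the SPECfp nT estimates from earlier in the thread. The ~60W CPU figure is from above; the 5950X package power is my own round assumption, not a measurement:

```python
# Back-of-the-envelope perf/W. Scores are the SPECfp nT estimates quoted
# earlier; the power figures are rough assumptions for illustration only.

systems = {
    "M1 Max (CPU only)": (81.07, 60),   # ~60W CPU peak, per the post above
    "Ryzen 9 5950X":     (64.39, 142),  # assumed ~142W package power
}

for name, (score, watts) in systems.items():
    print(f"{name}: {score / watts:.2f} points/W")
# Roughly a 3x gap here; the exact ratio shifts with the workload, and
# integer-heavy loads would narrow it considerably.
```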

We still don't have sufficient in-depth benchmarks to really conclude anything - and there seems to be a lack of good cross-platform benchmarks in general, which makes comparisons difficult. But from what we've seen so far, Apple's framing seems like typical PR: grounded in some specific truth, but narrowly framed and borderline misleading unless you really pay attention. Their headline statements are still true - this is the most powerful chip ever in a PC - but only when you pay attention to what they're actually saying, rather than what they are (very strongly) insinuating. Which just tells us that Apple has a very good marketing team.
 
Joined
Jan 8, 2017
Messages
9,438 (3.27/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
I don't think they based the comparisons on that, though. They don't claim that the ME is part of the GPU; after all, they present it (and its specs) entirely separately, just as with the Neural Engine.

It's possible they used some form of benchmark that could partially make use of the ME - say, a video editing timeline performance benchmark with various filters applied, where the video playback itself is accelerated through the ME - but that's rather unlikely (and still places most of the workload on the GPU, not the ME).

Whatever they're using, the point they're trying to make with these annoyingly difficult-to-understand charts is that their GPUs are more power-efficient at the same performance level. Which I'm sure is true, but that's not exactly an amazing achievement, as we all know that desktop GPUs are clocked way out of their optimum power curve. I am convinced that an underclocked and undervolted 3090 would not only still outperform their GPUs in terms of raw performance, but also in terms of efficiency, because they're still not even comparable in terms of area and transistor count.
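
The "way out of their optimum power curve" point follows from basic dynamic-power scaling: P is roughly proportional to C·V²·f, so performance drops about linearly with clock while power drops super-linearly once you can lower the voltage too. A toy model, with illustrative numbers rather than actual 3090 measurements:

```python
# Toy dynamic-power model: P ~ C * V^2 * f. Underclock by 20% and
# undervolt by 10% (illustrative figures, not 3090 measurements).

clock_scale   = 0.80  # -20% clock -> roughly -20% performance
voltage_scale = 0.90  # -10% voltage

power_scale = voltage_scale ** 2 * clock_scale  # ~0.65x power draw
perf_per_watt_gain = clock_scale / power_scale  # ~1.23x perf/W

print(f"power: {power_scale:.0%} of stock, perf/W: {perf_per_watt_gain:.2f}x")
```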
 
Joined
May 2, 2017
Messages
7,762 (2.80/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150Mhz, CO -7,-7,-20(x6),
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
Whatever they're using, the point they're trying to make with these annoyingly difficult-to-understand charts is that their GPUs are more power-efficient at the same performance level. Which I'm sure is true, but that's not exactly an amazing achievement, as we all know that desktop GPUs are clocked way out of their optimum power curve. I am convinced that an underclocked and undervolted 3090 would not only still outperform their GPUs in terms of raw performance, but also in terms of efficiency, because they're still not even comparable in terms of area and transistor count.
You're not wrong there, but that doesn't make their advantage any more real. It just demonstrates the difference between operating in a wholly self-contained hardware and software ecosystem and operating in a (somewhat) competitive DIY space. Both Nvidia's and AMD's workstation and enterprise parts tend to be clocked a bit more reasonably, but even there, due to the competitive market, they still aim to max out however much power can reasonably be cooled in whatever form factor they're aiming for.

I'm sure a well tuned 3090 could bring these power comparisons at iso performance much closer, but I kind of doubt they'd be able to overtake Apple - though making these cross-architectural comparisons without anything resembling good benchmarking at these power levels is of course a major challenge, rendering this just loose speculation. Still, Nvidia is on a relatively inefficient node, while Apple uses the best available, and a low power version of it to boot. It's not quite the same as the 5nm node other mobile customers use, but they're clearly not using high performance libraries here - clocks are too low and density is too high for that. The question then becomes whether Apple's architecture is sufficiently behind Nvidia's to nullify the effects of their near 2-node advantage.

Of course that's also a part of why Nvidia is struggling with efficiency this generation - being stuck on an underperforming node isn't helping them compete with a resurgent AMD, who used to be miles behind in efficiency but is now beating them. AMD also has a node advantage of course, but the efficiency improvements of RDNA and RDNA2 seem to have caught Nvidia somewhat unaware, forcing them to push clocks higher than what their node can really comfortably handle.

As for area and transistor count, it's of course difficult to tell just how large the GPU on this chip is (together with the other hardware matching what's present on an Nvidia GPU, especially memory interfaces and caches), but it seems reasonable to estimate that it covers ~50% of the die area, or a bit less. Accounting for the density advantage of TSMC's custom Apple 5nm process vs. Samsung 8nm, that likely means the GPU is comparable to the 628mm² GA102 - though that doesn't really add up in terms of transistor counts, seeing how even the M1 Max has 2x the transistor count of GA102. Of course, transistor counts discussed on this level are kind of arbitrary, and mainly speak to architectural and node differences, but it still seems reasonable to assume that the M1 Ultra has a GPU that hardware-wise is in the same ballpark as the 3090 - but clocked much lower.

The M1 Max was rated at 10.4TF/s FP32, and assuming the same clock speeds on the Ultra, that's a 20.8TF/s GPU - that's half the FP32 performance estimated for the 3090 Ti, but that's also a bit of a skewed comparison given how consumer Ampere doubled its theoretical FP32 output through implementing dual-mode ALUs. Which also means that you don't always get that level of performance, but anything from half to full depending on the composition of your workload. Of course we don't really know anything about Apple's GPU architecture, but given that it originated from an Imagination design, and the fact that it has a relatively high FP64-to-FP32 ratio, it's pretty safe to assume it is overall more similar to pre-Ampere Nvidia architectures or the GA100. (For reference, the compute-focused GA100 doesn't have the dual-mode ALUs and thus seemingly has fewer shading units despite being a much larger GPU.)
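
For reference, the theoretical FP32 numbers being thrown around here all come from shaders × 2 FMA ops per clock × clock speed. A quick sketch; the M1 shader counts and clocks are community estimates, and the Ampere count already includes the dual-mode FP32/INT32 ALUs, which is why its peak is "optimistic":

```python
# Theoretical FP32 TFLOPS = shaders * 2 ops/clock (FMA) * clock (GHz) / 1000.
# M1 figures are community estimates; the 3090 count includes dual-mode ALUs.

def fp32_tflops(shaders: int, clock_ghz: float) -> float:
    return shaders * 2 * clock_ghz / 1000

print(fp32_tflops(4096, 1.27))   # M1 Max:   ~10.4 TF/s
print(fp32_tflops(8192, 1.27))   # M1 Ultra: ~20.8 TF/s
print(fp32_tflops(10496, 1.70))  # RTX 3090: ~35.7 TF/s at ~boost clock
```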

Given all of that, you're probably right that this is comparable to a low-clocked 3090 in many ways (though on a much more efficient node). Though given the quirks of architecture, OS, APIs and software, this comparison will likely skew wildly across various workloads.
 
Joined
Aug 14, 2017
Messages
359 (0.13/day)
Location
Edge of the Void
System Name Serious Series - Serious Server (99.99%)
Processor 4x Intel Xeon E7-8870's
Motherboard HP 512843-001/591196-001 (rev 0B) + 588137-B21/591205-001
Cooling HP ProLiant OEM cooling fans(s) + heatsinks
Memory 256GB (64x4GB) DDR3-1333 PC3-10600R ECC
Video Card(s) AMD FirePro S9300 X2 + nVIDIA GeForce GTX Titan Xp
Storage 1x HGST HUSMM8040ASS200 + 4x HP 507127-B21's + 1x WD Blue 3D NAND 500GB + 1x Intel SSDSA2CW600G3
Display(s) Samsung ViewFinity S70A UHD 32" (S32A700)
Case HP ProLiant DL580 G7 chassis
Audio Device(s) 1x Creative Sound Blaster Audigy Rx
Power Supply 4x HP 441830-001/438203-001's (1200W PSU's)
Mouse Dell MS819
Keyboard Logitech K845 (Cherry MX Blue)
VR HMD N/a
Software VMware ESXi 6.5u3 Enterprise Plus (VM: Windows 10 Enterprise LTSC)
Benchmark Scores 3DMark won't let me post my scores publicly at this time...
Something tells me that Apple only made the comparison to the RTX 3090 for clicks. They only beat/compete with that card in select workloads. Anyone with a brain can tell it was a bit of a reach on their part, seeing that the two components are in completely different weight classes (in terms of power draw, efficiency, and primary focus). They wanted to be noticed, and they may have just gotten it...
 
Joined
Aug 14, 2017
Messages
359 (0.13/day)
Location
Edge of the Void
System Name Serious Series - Serious Server (99.99%)
Processor 4x Intel Xeon E7-8870's
Motherboard HP 512843-001/591196-001 (rev 0B) + 588137-B21/591205-001
Cooling HP ProLiant OEM cooling fans(s) + heatsinks
Memory 256GB (64x4GB) DDR3-1333 PC3-10600R ECC
Video Card(s) AMD FirePro S9300 X2 + nVIDIA GeForce GTX Titan Xp
Storage 1x HGST HUSMM8040ASS200 + 4x HP 507127-B21's + 1x WD Blue 3D NAND 500GB + 1x Intel SSDSA2CW600G3
Display(s) Samsung ViewFinity S70A UHD 32" (S32A700)
Case HP ProLiant DL580 G7 chassis
Audio Device(s) 1x Creative Sound Blaster Audigy Rx
Power Supply 4x HP 441830-001/438203-001's (1200W PSU's)
Mouse Dell MS819
Keyboard Logitech K845 (Cherry MX Blue)
VR HMD N/a
Software VMware ESXi 6.5u3 Enterprise Plus (VM: Windows 10 Enterprise LTSC)
Benchmark Scores 3DMark won't let me post my scores publicly at this time...
Oh, they got noticed, but not in a good way. Many people were not impressed by the blatant BS they were shoveling.
And what of the laymen who don't know how to vet these claims? Perhaps Apple wasn't aiming directly at the tech-literate audience, but instead intended to use the PR to advertise the new M1 products to those who aren't so technically inclined.
 