Going back to AnandTech's SPEC testing of the M1 Max, I think there's a clue there as to the wildly divergent performance results across different benchmarks: reliance on integer vs. floating point math, as well as memory bandwidth. In AT's testing, the M1 Max (8P+2E, 10t) outperformed the 8c16t Ryzen 7 5800X by ~4.7% in the SPECint suite (53.38 vs. 50.98 points), but delivered only 64% of the 5950X's performance (83.13 points). In SPECfp, on the other hand, the M1 Max outperformed the 5800X by 72.1% (81.07 vs. 47.10 points) and even trounced the 5950X by 25.9% (64.39 points). Apple's scores drop somewhat if the Icestorm efficiency cores are excluded, to 48.57 and 75.67 points, which puts it behind the 5800X in integer but still ahead of the 5950X in floating point. AT notes that "The fp2017 suite has more workloads that are more memory-bound". In one SPECfp nT subtest, the M1 Max beats the Ryzen 9 5980HS (mobile, 35W, 8c16t) by a staggering 4.8 times. It's an outlier, but it illustrates what can happen in a memory-bound edge case.
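For clarity, here's a quick Python sketch reproducing those percentages from the quoted scores (the variable names are mine, the scores are AT's):

```python
# SPEC2017 estimated scores as quoted above (AnandTech's numbers)
m1max_int, r5800x_int, r5950x_int = 53.38, 50.98, 83.13
m1max_fp,  r5800x_fp,  r5950x_fp  = 81.07, 47.10, 64.39

print(f"int: M1 Max vs 5800X: {m1max_int / r5800x_int - 1:+.1%}")      # ~+4.7%
print(f"int: M1 Max vs 5950X: {m1max_int / r5950x_int:.0%} of 5950X")  # ~64%
print(f"fp:  M1 Max vs 5800X: {m1max_fp / r5800x_fp - 1:+.1%}")        # ~+72.1%
print(f"fp:  M1 Max vs 5950X: {m1max_fp / r5950x_fp - 1:+.1%}")        # ~+25.9%
```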
Also worth noting: AT measured per-core memory bandwidth on the M1 to be significantly lower than the total bandwidth of the memory interface. This is also true for other architectures (no single core can saturate the full interface), but it makes direct bandwidth comparisons troublesome.
Still, this tells us several things:
- The Apple Firestorm cores (1t each) are ever so slightly behind AMD's Zen 3 cores (2t each) in integer workloads; core for core, the two are nearly matched.
- The Apple Firestorm cores have a massive advantage over Zen 3 in floating point workloads, at least those present in the SPEC suite, delivering more than 2x the performance per core, with the caveat that these workloads are more memory-bound.
- nT scaling is quite different across the architectures and workloads: the 5950X scales 10.87x from 1t to 32t in int and 5.3x in fp. The 5800X is sadly not in the 1t chart, but should be marginally slower than the 5950X (200-300 MHz lower boost). If the two were identical at 1t, the 5800X's scaling would be 6.7x and 3.9x from 1t to 16t, so its real figures are likely a tad higher. The M1 Max scales by 7.1x and 6.3x from 1t to 10t (see the sketch after this list for the arithmetic). Thread scaling comparisons are made difficult by Apple's big.LITTLE architecture and AMD's SMT, but it does seem that the higher memory bandwidth helps Apple scale better with additional cores to some extent (though this could also be affected by many other factors, including software). Or, formulated the other way around: AMD's MSDT platform is significantly held back by memory bandwidth.
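To make those scaling factors somewhat comparable, here's a rough Python sketch normalizing them per hardware thread. It's a deliberately crude metric (it ignores boost-clock behavior, SMT yield, and the E-cores' lower throughput), but it shows the pattern:

```python
# Quoted nT/1T scaling factors, normalized per hardware thread.
# Crude on purpose: ignores boost clocks, SMT yield, and E-core throughput.
cases = [
    ("5950X int", 10.87, 32),
    ("5950X fp",   5.30, 32),
    ("M1 Max int", 7.10, 10),
    ("M1 Max fp",  6.30, 10),
]
for name, factor, threads in cases:
    print(f"{name:>10}: {factor / threads:.2f}x of 1t perf retained per thread")
```

By this blunt measure, the M1 Max retains roughly twice as much of its 1t performance per thread as the 5950X in int, and nearly four times as much in fp, which is at least consistent with the bandwidth story.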
How does this affect the discussions here? It's well documented that Cinebench doesn't care much about memory bandwidth. It likes higher IF and memory clocks on AMD, but it scales poorly with additional memory channels, which suggests latency matters more to it than bandwidth. Other workloads are quite different, but those tend to be quite specialized. Essentially no consumer workloads are particularly bandwidth-limited; most care more about compute power or latency. (A rough way to probe bandwidth sensitivity yourself is sketched below.)
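If you want a feel for raw memory bandwidth on your own machine, a STREAM-style triad is the classic probe. A minimal Python/NumPy sketch; the array size and byte accounting are rough assumptions, not a rigorous benchmark:

```python
import time
import numpy as np

# STREAM-style triad: c = a + 3.0 * b over arrays far larger than any cache,
# so runtime is dominated by DRAM traffic rather than compute.
N = 100_000_000          # ~0.8 GB per float64 array; shrink if RAM is tight
a = np.ones(N)
b = np.ones(N)
c = np.empty(N)

t0 = time.perf_counter()
np.multiply(b, 3.0, out=c)   # c = 3.0 * b (reads b, writes c)
c += a                       # c += a      (reads c and a, writes c)
elapsed = time.perf_counter() - t0

# Rough byte accounting: 2 arrays touched in the first pass, 3 in the second
bytes_moved = 5 * N * 8
print(f"~{bytes_moved / elapsed / 1e9:.1f} GB/s effective bandwidth")
```

Run it single-threaded, then as several parallel copies, and you'll see the per-core vs. whole-chip bandwidth gap AT describes.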
Does this mean Cinebench is a more, or less, reliable benchmark? Depends what you're looking for. It's clear that Apple has designed the larger M1 chips for bandwidth-hungry applications (though the large GPU sharing that interface needs the bandwidth too, of course). So if your workloads are bound by memory bandwidth, the M1 Ultra and its siblings are likely to deliver staggeringly good performance and efficiency. If not? Then it's likely merely competitive with rival CPUs with a similar number of cores (not threads), though YMMV depending on the workload.
Does this mean Apple lied? Again, no. There's no doubt that the M1 Ultra is the most powerful chip ever made for a PC. Not as a CPU, not as a GPU, but in sum.
Being a single SoC has some great advantages, particularly in efficiency (near-zero interconnect power, far less embodied energy in duplicated componentry), but also to some extent in latency-sensitive performance scenarios. But as you say, it's also inflexible, and those advantages don't necessarily apply to all workloads. Both approaches have distinct pros and cons. You can't get past the fact that a tightly integrated package will always be more efficient than a collection of discrete components (as long as they're otherwise comparable in terms of architectural efficiency). There's a reason a 5W smartphone delivers a lot more than 1/20th the performance of a 100W laptop. That doesn't invalidate the value or performance of the laptop; after all, it delivers a degree of performance impossible in a smartphone.
As for that FP64 comparison: according to this source, the M1 GPU architecture has 1/4-rate FP64, so it should deliver ~2.6 TF/s of FP64 (assuming the 10.4 TF FP32 numbers online are accurate). For comparison, consumer Ampere has 1/32-rate FP64 and delivers ~1.21 TF/s of FP64. Not that this necessarily matters much: the reason Ampere performs poorly in FP64 is that double precision floating point math has become ever more of a niche application, and Nvidia thus doesn't prioritize it at all outside of their datacenter GA100 chips, which have 1:2 FP64 (the A100 80GB SXM4 delivers ~9.746 TF/s of FP64). As such, it seems Apple cares more about FP64 than Nvidia does outside the datacenter, likely indicating that pro applications common on macOS lean on it a bit more than most PC applications do (though it might also just be a small but profitable subset of apps, with Apple catering to a niche that pays well).
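The arithmetic behind those figures is just peak FP32 throughput times the architecture's FP64 execution-rate ratio. A quick sketch; the FP32 peaks are the commonly cited numbers, taken here as assumptions:

```python
# FP64 peak = FP32 peak x the architecture's FP64 execution-rate ratio.
# FP32 peaks below are commonly cited figures, treated as assumptions.
def fp64_tflops(fp32_tflops: float, fp64_ratio: float) -> float:
    return fp32_tflops * fp64_ratio

print(f"M1 Max GPU:      {fp64_tflops(10.4, 1/4):.2f} TF/s")   # ~2.60
print(f"Consumer Ampere: {fp64_tflops(38.7, 1/32):.2f} TF/s")  # ~1.21 (3090-class FP32 peak)
print(f"A100 (GA100):    {fp64_tflops(19.5, 1/2):.2f} TF/s")   # ~9.75
```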