Wednesday, January 3rd 2024
Intel Meteor Lake P-cores Show IPC Regression Over Raptor Lake?
Intel's Core Ultra "Meteor Lake" mobile processors may be the company's most efficient, but they aren't a generation ahead of the 13th Gen Core "Raptor Lake" mobile processors in terms of performance. This isn't just because the lineup has a lower overall CPU core count in its H-segment SKUs, but also because its performance cores (P-cores) actually post a generational reduction in IPC, as David Huang found through a series of single-threaded benchmarks in a blog post testing contemporary mobile processors. Huang ran a SPECint 2017 performance comparison of Intel's Core Ultra 7 155H and Core i7-13700H "Raptor Lake" against the AMD Ryzen 7 7840HS and 7840U ("Phoenix," Zen 4) and Apple's M3 Pro and M2 Pro.
In his testing, the 155H, an H-segment processor, roughly matched the "Zen 4" based 7840U and 7840HS, while the Core i7-13700H was ahead of all three. Apple's M2 Pro and M3 Pro are a league ahead of all the other chips in terms of IPC. To determine IPC, Huang tested every processor on a single core at its default clock speeds, then divided the SPECint 2017 score by the average clock speed of the loaded core, as logged over the course of the benchmark. It's worth noting that the i7-13700H notebook was using dual-channel (4 sub-channel) DDR5 memory, while the Core Ultra 7 155H notebook was using LPDDR5; however, Huang remarks that this shouldn't affect his conclusion that there has been an IPC regression between "Raptor Lake" and "Meteor Lake."
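As a rough illustration of the method described above, the per-clock figure is simply the benchmark score divided by the mean of the logged core clocks. The sketch below is a minimal Python version; the function names are hypothetical, and any numbers passed in are placeholders rather than Huang's measurements.

```python
from statistics import mean


def per_clock_score(score, clock_samples_ghz):
    """Return benchmark score per GHz, used here as a proxy for IPC.

    score             -- single-threaded benchmark score (e.g. a SPECint 2017 result)
    clock_samples_ghz -- clock speeds of the loaded core logged during the run
    """
    avg_clock = mean(clock_samples_ghz)
    if avg_clock <= 0:
        raise ValueError("average clock must be positive")
    return score / avg_clock


def relative_change(new, old):
    """Fractional change of `new` versus `old` (negative means a regression)."""
    return (new - old) / old
```

Comparing two chips then comes down to `relative_change(per_clock_score(...), per_clock_score(...))`. A negative result would indicate lower per-clock performance for the newer part, subject to the memory-configuration caveat noted above.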
Sources:
David Huang's Blog, Tom's Hardware
85 Comments on Intel Meteor Lake P-cores Show IPC Regression Over Raptor Lake?
Might need some light modding though.. But I'll be fine after that...
Or maybe they are going one step back, so they can sell higher IPC with their next series.
Or both.
Meteor Lake is the first consumer chip to use them, and also the first consumer product utilizing the Intel 4 process.
IMO the new process has a higher chance of influencing IPC than the fact it uses tiles, but more research is required.
I did some benching and on some Linpack tests, I was able to get about 500 gigaflops on 24 threads on Sapphire Rapids, whereas I got 333 gigaflops on 8 threads on the 12700k in stock form (e-cores disabled in BIOS).
I plan to do more tests to profile the Sapphire Rapids soon.
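To put the two Linpack figures quoted above on a comparable footing, one rough option is to divide each by its thread count. This is only a sketch: it uses the commenter's numbers as stated, ignores clock speed and instruction-set differences, and the helper name is hypothetical.

```python
def gflops_per_thread(gflops, threads):
    """Throughput normalized by thread count (a crude comparison only)."""
    return gflops / threads


# Figures as quoted in the comment above.
sapphire_rapids = gflops_per_thread(500, 24)
alder_lake = gflops_per_thread(333, 8)

print(f"Sapphire Rapids: {sapphire_rapids:.1f} GFLOPS/thread")
print(f"12700K:          {alder_lake:.1f} GFLOPS/thread")
```

On these numbers the 12700K comes out at about 41.6 GFLOPS per thread against about 20.8 for the Sapphire Rapids run, roughly a 2x gap before accounting for clocks.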
Still glad I opted for the 7840HS.
That's hardly their worst investment.
Plus if Intel Fabs are good then they might be manufacturing for AMD and Apple in the future.
In any case, the Meteor Lake architecture seems pretty weak; hopefully the Arrow Lake core is much stronger.
Server CPUs are usually running at lower frequencies than desktop parts are capable of, especially for 1-core turbo. The highest 1T frequency for 24-core Xeons is 4.0GHz while 12700K goes to 5.0GHz and can be OCd further. Xeon-W can go up to 4.8GHz but their TDP is ~100W higher than equivalent Xeon. This behaviour is not unique to Intel.
While testing you should also consider that by default desktop Intel CPUs are no longer limited in PL2 turbo duration (since Alder Lake), while professional CPUs adhere to their limits. Some mainstream motherboards will also modify power limits by default.
Also for Alder Lake the OS puts threads on the P cores first, whereas with Meteor Lake the OS starts with the LPE cores. How do we know this benchmark isn't starting on an LPE core and getting moved too late?
And what I've heard is that Redwood Cove is little more than a die shrink of Raptor Cove. So I expect it to have identical IPC and I want stronger evidence than this to stop believing that.
Then again, the memory controller in Meteor Lake is on a different tile. Maybe that hurts performance more than we realized. It ought to hurt Ryzen's chiplets even more but Zen 2 introduced chiplets and I've never seen an IPC comparison between Zen 2 desktop (chiplets) and mobile (monolithic). (And Zen 2 had much higher IPC than Zen 1.)
As you said this increases latency, so Intel supports emulating NUMA on those CPUs so that you can split the memory and cores into 2/4 logical regions to avoid the memory latency issues. The OS can then optimize process scheduling to keep the cores and their memory together.
This isn't unique to SPR since previous Xeons, that are monolithic, also allowed this because even though all of the controllers were on the same die they were located at the edges and still had differing latency depending on core location. This issue is present in every multi-core CPU, but its impact is negligible unless you start scaling core counts up.
Every AMD EPYC generation also has a mechanism like that, and even one that goes further and divides the CPU along the physical L3 cache locations. Even EPYCs with IO dies, which contain a monolithic memory controller, can be set to "lock" memory channels to the Infinity Fabric links that serve particular CCDs.
Many workloads yield improvements by using those mechanisms, and both vendors publish extensive documentation on how to configure their professional CPUs.
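For anyone wanting to try keeping a process on the cores of one NUMA region, here is a minimal Linux-only Python sketch. It assumes the kernel exposes each node's CPUs in `/sys/devices/system/node/nodeN/cpulist`; the function names are hypothetical. Note that it sets CPU affinity only and does not itself bind memory, so it relies on the kernel placing allocations near the cores that touch them.

```python
import os


def parse_cpulist(text):
    """Parse a Linux cpulist string such as "0-5,12-17" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_node(node):
    """Restrict the current process to the CPUs of one NUMA node (Linux only)."""
    path = f"/sys/devices/system/node/node{node}/cpulist"
    with open(path) as f:
        cpus = parse_cpulist(f.read())
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return cpus
```

Calling `pin_to_node(0)` before starting a benchmark would keep it on node 0's cores. The same affinity call with a single CPU id could also be used to hold a single-threaded test on one chosen core.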