Thursday, September 15th 2022
IPC Comparisons Between Raptor Cove, Zen 4, and Golden Cove Spring Surprising Results
OneRaichu, who has access to engineering samples of both the AMD "Raphael" Ryzen 7000-series, and Intel 13th Gen Core "Raptor Lake," performed IPC comparisons between the two, by disabling E-cores on the "Raptor Lake," fixing the clock speeds of both chips to 3.60 GHz, and testing them across a variety of DDR5 memory configurations. The IPC testing was done with SPEC, a mostly enterprise-relevant benchmark, but one that could prove useful in tracing where the moderately-clocked enterprise processors such as EPYC "Genoa" and Xeon Scalable "Sapphire Rapids" land in the performance charts. OneRaichu also threw in scores obtained from a 12th Gen Core "Alder Lake" processor for this reason, as its "Golden Cove" P-core powers "Sapphire Rapids" (albeit with more L2 cache).
With DDR5-4800 memory, testing on SPEC CPU 2017 Rate 1 at 3.60 GHz, the AMD "Zen 4" core ends up with the highest score in SPECint, topping even the "Raptor Cove" P-core. It scores 6.66 in total, compared to 6.63 for "Raptor Cove" and 6.52 for "Golden Cove." In the SPECfp tests, however, the "Zen 4" core falls behind "Raptor Cove." Here, it scores 9.99 in total, compared to 9.91 for "Golden Cove" and 10.21 for "Raptor Cove." Things get interesting at DDR5-6000, a frequency AMD considers its "sweetspot." The 13th Gen "Raptor Cove" P-core tops SPECint at 6.81, compared to 6.77 for "Zen 4" and 6.71 for "Golden Cove." SPECfp sees "Zen 4" fall behind even "Golden Cove" at 10.04, compared to 10.20 for "Golden Cove" and 10.46 for "Raptor Cove."

The big surprise here is just how good the "Gracemont" E-cores are in SPECint. OneRaichu made a distinction between the "Gracemont" E-cores of "Alder Lake" (GLC-12) and those of "Raptor Lake" (GLC-13), as the latter have double the amount of shared L2 cache per E-core cluster. The E-core is fast approaching IPC levels comparable to that of "Skylake," which really is Intel's calculation in giving its processors a large number of E-cores next to a small number of P-cores. The idea is that the E-cores will soak up all the moderately-intensive compute workloads and background processes, keeping the P-cores free for gruelling compute-heavy tasks.
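To put the P-core gaps in perspective, the scores quoted above can be turned into percentage leads over "Golden Cove." The following is only a small sketch: the numbers are the ones from the article, while the dictionary layout and the `percent_vs` helper are names made up for this illustration.

```python
# Scores quoted in the article (SPEC CPU 2017 Rate 1, cores fixed at 3.60 GHz).
scores = {
    "DDR5-4800": {
        "SPECint": {"Zen 4": 6.66, "Raptor Cove": 6.63, "Golden Cove": 6.52},
        "SPECfp": {"Zen 4": 9.99, "Raptor Cove": 10.21, "Golden Cove": 9.91},
    },
    "DDR5-6000": {
        "SPECint": {"Zen 4": 6.77, "Raptor Cove": 6.81, "Golden Cove": 6.71},
        "SPECfp": {"Zen 4": 10.04, "Raptor Cove": 10.46, "Golden Cove": 10.20},
    },
}


def percent_vs(baseline, memory, suite):
    """Return each core's lead over `baseline` in percent (hypothetical helper)."""
    table = scores[memory][suite]
    base = table[baseline]
    return {core: round((s / base - 1) * 100, 2) for core, s in table.items()}


for memory in scores:
    for suite in scores[memory]:
        print(memory, suite, percent_vs("Golden Cove", memory, suite))
```

Every gap between the three P-cores works out to a low single-digit percentage, which is why small changes in memory speed are enough to reorder them.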
Source:
OneRaichu (Twitter)
34 Comments on IPC Comparisons Between Raptor Cove, Zen 4, and Golden Cove Spring Surprising Results
Zen 4 has an even smaller die size, but even higher power draw, which will make it way harder to cool than Zen 3. On the other hand, Raptor Lake will have a bigger die size than Alder Lake but similar power draw, which makes it easier. Assuming the Zen 4 rumors are true and the 7950X draws north of 200 W, it will be way harder to cool than the 13900K at 250 W. That's just physics.
We'll see once the chips are out, but something tells me the 13900K will be another miniature stove while the 7950X will be reasonable.
The reality is they are both very different, and it looks like both have good designs and AMD and Intel will pretty much be directly competing overall. It's just one test, but it is pretty insane just how closely these very different architectures perform when normalized at the same clock. I would not have expected that at all.
Real IPC is a constant and is given by the architectural design; it's the architecture's ability to process instructions across "any" workload, and is measured per clock. Real IPC isn't possible for us to measure, so we approximate it by locking clock speed far below any throttling point, choosing memory hopefully fast enough not to cause a bottleneck, and hopefully selecting a good number of workloads able to saturate a single core. What we get is a relative IPC, which is an approximation, and the quality of this approximation is dependent on the aforementioned factors, which will affect the benchmark scores.
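A rough sketch of that "relative IPC" idea: with both cores locked to the same clock, take the score ratio per workload and combine the ratios with a geometric mean (the same kind of averaging SPEC uses for its overall metrics). The workload names and numbers below are placeholders invented for illustration, not measurements, and `relative_ipc` is a hypothetical helper.

```python
import math


def relative_ipc(scores_a, scores_b):
    """Geometric mean of per-workload score ratios (A over B) at equal clocks."""
    ratios = [scores_a[name] / scores_b[name] for name in scores_a]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Placeholder numbers, invented purely for illustration (not measurements).
core_a = {"workload_1": 5.0, "workload_2": 8.0, "workload_3": 6.0}
core_b = {"workload_1": 4.0, "workload_2": 10.0, "workload_3": 6.0}

# All three workloads: the cores come out even.
print(relative_ipc(core_a, core_b))

# Only workload_1: core A suddenly looks 25% ahead.
print(relative_ipc({"workload_1": core_a["workload_1"]}, core_b))
```

The same two cores look tied or 25% apart depending on which workloads are included, which is the sense in which the result is only an approximation.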
How do you account for the fact that, as an example, a Skylake core can do four non-vector additions at the same time (they probably execute in one cycle but I haven't checked) but only one division (which, again, takes many cycles to execute)?
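A toy throughput model makes the question concrete. It assumes, as stated above, four scalar additions can issue per cycle, and it assumes a divider that accepts one division every `cycles_per_div` cycles; the value 20 is a placeholder, not a measured Skylake figure. It also assumes adds and divides overlap perfectly, which real code rarely achieves.

```python
def toy_ipc(n_add, n_div, adds_per_cycle=4, cycles_per_div=20):
    """Throughput-bound IPC for a mix of adds and divides (toy model).

    cycles_per_div=20 is a hypothetical placeholder, not a hardware spec.
    """
    if n_add + n_div == 0:
        raise ValueError("need at least one instruction")
    add_cycles = n_add / adds_per_cycle   # time if only the adders limit us
    div_cycles = n_div * cycles_per_div   # time if only the divider limits us
    cycles = max(add_cycles, div_cycles)  # the slower resource sets the pace
    return (n_add + n_div) / cycles


print(toy_ipc(1000, 0))    # adds only
print(toy_ipc(1000, 100))  # roughly 9% divides
print(toy_ipc(0, 100))     # divides only
```

Under these assumptions the same core swings from an IPC of 4 down to well below 1 purely because of the instruction mix, so a single "IPC" number only makes sense for a stated set of workloads.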
I seriously don't know how that is so hard to understand by so many.
Architecture A may be great at X software, while architecture Y may excel with Z software, and it's a balancing act to make one great at everything, which is also why an architecture that is great at in-order execution has a long/deep pipeline, but an out-of-order architecture must have either a shallow pipeline and/or a great branch prediction unit and lots of cache.
Why are Arm CPUs so good on phones and closed environments? They have a closed environment and can be optimized for typical handheld devices. The same program can run significantly faster on a desktop CPU through an emulator though, so which architecture is superior? Which has higher IPC?
x86 is a CISC architecture, which means it has a wider set of instructions, some of which are very complex and take a lot of hardware and power to implement.
The advantage of RISC is efficiency for small tasks, the advantage of CISC is performance on highly complex tasks; neither is superior in absolute terms.
In other words, the x86 CPU can do the same thing with fewer instructions, so this doesn't really reflect IPC.
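The point about instruction counts follows from the usual relation: run time = instructions / (IPC × clock). A minimal sketch, with two hypothetical instruction sets and made-up numbers chosen only to show the effect:

```python
def run_time_seconds(instructions, ipc, clock_hz):
    """Run time from instruction count, IPC and clock frequency."""
    return instructions / (ipc * clock_hz)


CLOCK_HZ = 3.6e9  # same fixed clock as in the article's test

# Hypothetical figures for the same program compiled for two ISAs.
time_a = run_time_seconds(instructions=1.0e9, ipc=2.0, clock_hz=CLOCK_HZ)
time_b = run_time_seconds(instructions=1.3e9, ipc=2.4, clock_hz=CLOCK_HZ)

print(f"ISA A: {time_a:.4f} s, ISA B: {time_b:.4f} s")
```

In this made-up case ISA B has the higher IPC but still finishes later, because it needs 30% more instructions for the same work. That is why comparing raw IPC across different instruction sets says little on its own.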
stackoverflow.com/questions/37041009/what-is-the-maximum-possible-ipc-can-be-achieved-by-intel-nehalem-microarchitect
This was single-threaded benchmarking. While it does reveal a lot, it would have been great if it was also done with two threads and four threads.
2 threads on a single P core vs. 2 threads on the same E core cluster: each thread's performance on P should drop sharply (by 35% or so) but what about E?
4 threads on two P cores vs. 4 threads on the same E core cluster: similar but the E cores would be even more constrained because they share L2 and access to L3 and bus.
There may be optimisations (or regressions, for that matter) in how a P core handles SMT, and such benchmarking would have exposed that.
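As a back-of-the-envelope check on the SMT case, the "35% or so" per-thread drop guessed above translates into combined throughput like this. The helper name is made up, and 35% is the commenter's estimate, not a measurement.

```python
def smt_throughput(threads, per_thread_drop):
    """Combined throughput relative to one thread running alone on the core."""
    return threads * (1.0 - per_thread_drop)


# Two threads on one P-core, each about 35% slower than when running alone.
print(smt_throughput(2, 0.35))

# Break-even: at a 50% per-thread drop, SMT gains nothing.
print(smt_throughput(2, 0.50))
```

With a 35% per-thread drop the core still delivers about 1.3 times its single-thread throughput, so any change in how a P-core handles SMT would show up as a shift in that per-thread figure.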