IPC Comparisons Between Raptor Cove, Zen 4, and Golden Cove Spring Surprising Results

progste · Sep 16, 2022

fevgatos said:
Don't think that's true. The 5800X has one CCD measuring 85 mm². The 5950X has two CCDs, measuring double that.

When it comes to cooling, what I mean by easy or hard to cool is at iso-wattage with the same cooler. With a TDP of 170 W, the 7950X will probably go above 200 W even at stock.

The core count per CCD is the same (8). If anything, the 5950X should put out more heat since it has two of them, but the opposite is true, as AMD refined their manufacturing process.
By "hard to cool", everyone compares similar coolers in similar use conditions, not watt for watt.
Also, if the TDP is 170 W, it will do at most that at stock.

JustBenching · Sep 16, 2022

progste said:
The core count per CCD is the same (8). If anything, the 5950X should put out more heat since it has two of them, but the opposite is true, as AMD refined their manufacturing process.
By "hard to cool", everyone compares similar coolers in similar use conditions, not watt for watt.
Also, if the TDP is 170 W, it will do at most that at stock.

You don't understand the fundamentals of thermodynamics. The 5950X is power limited to the same wattage as the 5800X, but that wattage is spread out over double the die area. That's why it's easier to cool.

Zen 4 has an even smaller die size but even higher power draw, which will make it way harder to cool than Zen 3. On the other hand, Raptor Lake will have a bigger die than Alder Lake but similar power draw, which makes it easier. Assuming the Zen 4 rumors are true and the 7950X draws north of 200 W, it will be way harder to cool than the 13900K at 250 W. That's just physics.

progste · Sep 16, 2022

fevgatos said:
You don't understand the fundamentals of thermodynamics. The 5950X is power limited to the same wattage as the 5800X, but that wattage is spread out over double the die area. That's why it's easier to cool.

Zen 4 has an even smaller die size but even higher power draw, which will make it way harder to cool than Zen 3. On the other hand, Raptor Lake will have a bigger die than Alder Lake but similar power draw, which makes it easier. Assuming the Zen 4 rumors are true and the 7950X draws north of 200 W, it will be way harder to cool than the 13900K at 250 W. That's just physics.

The die size is the same, the 5950x uses two 8 core chiplets while the 5800x uses one of them.
We'll see once the chips are out, but something tells me the 13900k will be another miniature stove while the 7950x will be reasonable.

JustBenching · Sep 16, 2022

progste said:
The die size is the same, the 5950x uses two 8 core chiplets while the 5800x uses one of them.
We'll see once the chips are out, but something tells me the 13900k will be another miniature stove while the 7950x will be reasonable.

You are confusing the IHS with the die. The IHS is the same, yes; the die isn't. The 5950X has two CCDs of 85 mm² each. The 5800X has one.
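
The power-density argument in this exchange can be put into numbers with a rough sketch. It uses the 85 mm² per-CCD figure quoted above; the 140 W package power is an illustrative assumption, and the model assumes all power is dissipated evenly across the CCDs (it ignores the I/O die, hotspots, and the heat spreader).

```python
def power_density(package_watts: float, die_area_mm2: float) -> float:
    """Average heat flux in W/mm^2, assuming power is spread evenly over the die area."""
    if die_area_mm2 <= 0:
        raise ValueError("die_area_mm2 must be positive")
    return package_watts / die_area_mm2


CCD_AREA_MM2 = 85.0    # per-CCD area as quoted in the thread
PACKAGE_WATTS = 140.0  # ASSUMPTION: same illustrative power limit for both chips

one_ccd = power_density(PACKAGE_WATTS, 1 * CCD_AREA_MM2)  # one-CCD part
two_ccd = power_density(PACKAGE_WATTS, 2 * CCD_AREA_MM2)  # two-CCD part

print(f"1 CCD : {one_ccd:.2f} W/mm^2")
print(f"2 CCDs: {two_ccd:.2f} W/mm^2")
print(f"ratio : {one_ccd / two_ccd:.1f}x")
```

Under these assumptions the single-CCD part has twice the average heat flux at the same package power, which is the effect being argued about. Real chips concentrate heat unevenly, so this is only a first-order comparison.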

Operandi · Sep 16, 2022

Richards said:
Raptor Cove is still superior on an older node. Intel's architecture is more advanced.

But AMD is matching Intel's performance using significantly fewer transistors so clearly AMD is still superior.

The reality is that they are very different, and it looks like both have good designs, so AMD and Intel will pretty much be competing directly overall.

AlwaysHope said:
Those scores are pretty close, if not within the margin of error. It's like splitting hairs here... I also think BIOS immaturity with RPL could be a handicap.

It's just one test, but it is pretty insane just how closely these very different architectures perform when normalized to the same clock. I would not have expected that at all.

efikkan · Sep 16, 2022

agent_x007 said:
IPC is a constant (and depends on the task), and it is independent of core frequency (which is why you multiply both together to approximate performance, FYI).

The higher the core frequency, the more it will be skewed by bus/IMC/DRAM performance, and the higher the chance of throttling based on cooling/power requirements …

This is a typical misconception.
Real IPC is a constant given by the architectural design; it's the architecture's ability to process instructions across "any" workload, and is measured in instructions per clock. Real IPC isn't possible for us to measure, so we approximate it by locking the clock speed far below any throttling point, choosing memory that is hopefully fast enough not to cause a bottleneck, and hopefully selecting a good number of workloads able to saturate a single core. What we get is a relative IPC, which is an approximation, and the quality of this approximation depends on the aforementioned factors, which will affect the benchmark scores.
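
A minimal sketch of the "relative IPC" approximation described here: divide a benchmark score by the locked clock and compare against a baseline. The scores, core names, and 3.6 GHz clock below are hypothetical placeholders, not results from the article.

```python
def relative_ipc(score: float, clock_ghz: float) -> float:
    """Benchmark score per GHz: a proxy for IPC, comparable only within one workload."""
    if clock_ghz <= 0:
        raise ValueError("clock_ghz must be positive")
    return score / clock_ghz


# HYPOTHETICAL scores for one workload, both cores locked to the same clock
runs = {
    "core_a": (7.2, 3.6),  # (score, locked clock in GHz)
    "core_b": (6.9, 3.6),
}

baseline = relative_ipc(*runs["core_b"])
for name, (score, clock) in runs.items():
    ratio = relative_ipc(score, clock) / baseline
    print(f"{name}: {ratio:.3f}x relative IPC vs core_b")
```

The ratio only means something when the workload, memory configuration, and clock lock are the same across runs, which is exactly the caveat raised in the post.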

Wirko · Sep 17, 2022

efikkan said:
This is a typical misconception.
Real IPC is a constant given by the architectural design; it's the architecture's ability to process instructions across "any" workload, and is measured in instructions per clock. Real IPC isn't possible for us to measure, so we approximate it by locking the clock speed far below any throttling point, choosing memory that is hopefully fast enough not to cause a bottleneck, and hopefully selecting a good number of workloads able to saturate a single core. What we get is a relative IPC, which is an approximation, and the quality of this approximation depends on the aforementioned factors, which will affect the benchmark scores.

How do you account for the fact that different instructions take different numbers of cycles to execute, from zero (sometimes, if the front end manages to fuse two instructions into one micro-op) to several tens (division, whose execution time also depends on the actual data being divided)?
How do you account for the fact that, as an example, a Skylake core can do four non-vector additions at the same time (they probably execute in one cycle, but I haven't checked) but only one division (which, again, takes many cycles to execute)?
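
To illustrate why the instruction mix matters, here is a toy throughput model. It takes the "four additions per cycle, one divider" figures from the post; the 20-cycle division cost is an assumed placeholder, and the model treats all instructions as independent, which real code rarely is.

```python
def ipc_bounds(n_add: int, n_div: int,
               adds_per_cycle: int = 4, div_cycles: int = 20) -> tuple[float, float]:
    """Return (no_overlap_ipc, full_overlap_ipc) for a mix of independent adds and divides.

    adds_per_cycle follows the figure in the post; div_cycles is an ASSUMED placeholder.
    """
    total = n_add + n_div
    if total == 0:
        raise ValueError("need at least one instruction")
    add_time = n_add / adds_per_cycle  # cycles spent if only the adders are busy
    div_time = n_div * div_cycles      # one divider, divisions run back to back
    no_overlap = total / (add_time + div_time)        # adds wait for every divide
    full_overlap = total / max(add_time, div_time)    # adds hide behind divides
    return no_overlap, full_overlap


print(ipc_bounds(400, 0))  # pure additions
print(ipc_bounds(400, 5))  # a few divisions mixed in
print(ipc_bounds(0, 10))   # pure divisions
```

Even in this simplified model the same core produces very different instructions-per-cycle figures depending on the mix, which is the point of the question.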

Steevo · Sep 18, 2022

Wirko said:
How do you account for the fact that different instructions take different numbers of cycles to execute, from zero (sometimes, if the front end manages to fuse two instructions into one micro-op) to several tens (division, whose execution time also depends on the actual data being divided)?
How do you account for the fact that, as an example, a Skylake core can do four non-vector additions at the same time (they probably execute in one cycle, but I haven't checked) but only one division (which, again, takes many cycles to execute)?

Because that is the actual real-world effect of architecture on IPC in real-world software at a set frequency, so we can determine the efficiency of an architecture at a given task.

I seriously don't know how that is so hard to understand by so many.

Architecture A may be great at software X, while architecture Y may excel with software Z, and it's a balancing act to make one great at everything. That is also why a great in-order architecture has a long/deep pipeline, while an out-of-order architecture must have either a shallow pipeline and/or a great branch prediction unit and lots of cache.

Why are Arm CPUs so good on phones and in closed environments? They have a closed environment and can be optimized for typical handheld devices. The same program can run significantly faster on a desktop CPU through an emulator, though, so which architecture is superior? Which has higher IPC?

progste · Sep 18, 2022

Arm is built on a RISC architecture, which means it has a simpler and smaller instruction set, which means less die space and lower power.
x86 is a CISC architecture, which means it has a wider set of instructions, some of which are very complex and take a lot of hardware and power to implement.

The advantage of RISC is efficiency for small tasks; the advantage of CISC is performance on highly complex tasks. Neither is superior in absolute terms.
In other words, the x86 CPU can do the same thing with fewer instructions, so this doesn't really reflect IPC.

Wirko · Sep 18, 2022

efikkan said:
Real IPC is a constant and is given by the architectural design, it's the architecture's ability to process instructions across "any" workload

So the real IPC of the Haswell or Skylake architecture is 6, is that what you mean? It's been calculated by people who seem to know the architecture well enough.

What is the maximum possible IPC can be achieved by Intel Nehalem Microarchitecture?

Is there an estimation for the maximum Instructions Per Cycle achievable by the Intel Nehalem Architecture? Also, what is the bottleneck that effects the maximum Instructions Per Cycle?

stackoverflow.com

btarunr said:
The big surprise here is just how good the "Gracemont" E-cores are in SPECint. OneRaichu made a distinction between the "Gracemont" E-cores of "Alder Lake" (GLC-12) and those of "Raptor Lake" (GLC-13,) as the latter have double the amount of shared L2 cache per E-core cluster. The E-core is fast approaching IPC levels comparable to that of "Skylake," which really is Intel's calculation in giving its processors a large number of E-cores next to a small number of P-cores. The idea is that the E-cores will soak up all the moderately-intensive compute workloads and background processes, keeping the P-cores free for gruelling compute-heavy tasks.

This was single-threaded benchmarking. While it does reveal a lot, it would have been great if it was also done with two threads and four threads.

2 threads on a single P core vs. 2 threads on the same E core cluster: each thread's performance on P should drop sharply (by 35% or so) but what about E?

4 threads on two P cores vs. 4 threads on the same E core cluster: similar but the E cores would be even more constrained because they share L2 and access to L3 and bus.

There may be optimisations (or regressions, for that matter) in how a P core handles SMT, and such benchmarking would have exposed that.
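
As a back-of-the-envelope check on the two-thread case, the per-thread drop can be turned into a combined throughput figure. The 35% drop is the rough estimate from the post, not a measurement, and the function name is made up for this sketch.

```python
def smt_throughput(per_thread_drop: float, threads: int = 2) -> float:
    """Combined throughput relative to one thread running alone on the core."""
    if not 0.0 <= per_thread_drop <= 1.0:
        raise ValueError("per_thread_drop must be between 0 and 1")
    return threads * (1.0 - per_thread_drop)


# 35% per-thread drop is the rough estimate from the post above
print(f"{smt_throughput(0.35):.2f}x")  # two threads sharing one P-core
print(f"{smt_throughput(0.50):.2f}x")  # break-even: SMT adds nothing
```

With a 35% per-thread drop, two threads together would still deliver about 1.3 times the throughput of a single thread; the same arithmetic could be applied to measured E-core cluster results if such benchmarks were run.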

System Name	Mean machine
Processor	12900k
Motherboard	MSI Unify X
Cooling	Noctua U12A
Memory	7600c34
Video Card(s)	4090 Gamerock oc
Storage	980 pro 2tb
Display(s)	Samsung crg90
Case	Fractal Torent
Audio Device(s)	Hifiman Arya / a30 - d30 pro stack
Power Supply	Be quiet dark power pro 1200
Mouse	Viper ultimate
Keyboard	Blackwidow 65%

Processor	AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard	ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling	Noctua NH-U14S ||| Be Quiet Pure Rock
Memory	Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s)	MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage	Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s)	Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case	Fractal Design Define 7 XL x 2
Audio Device(s)	Cambridge Audio DacMagic Plus
Power Supply	Seasonic Focus PX-850 x 2
Mouse	Razer Abyssus
Keyboard	CM Storm QuickFire XT
Software	Ubuntu

Processor	i5-6600K
Motherboard	Asus Z170A
Cooling	some cheap Cooler Master Hyper 103 or similar
Memory	16GB DDR4-2400
Video Card(s)	IGP
Storage	Samsung 850 EVO 250GB
Display(s)	2x Oldell 24" 1920x1200
Case	Bitfenix Nova white windowless non-mesh
Audio Device(s)	E-mu 1212m PCI
Power Supply	Seasonic G-360
Mouse	Logitech Marble trackball, never had a mouse
Keyboard	Key Tronic KT2000, no Win key because 1994
Software	Oldwin

System Name	Compy 386
Processor	7800X3D
Motherboard	Asus
Cooling	Air for now.....
Memory	64 GB DDR5 6400Mhz
Video Card(s)	7900XTX 310 Merc
Storage	Samsung 990 2TB, 2 SP 2TB SSDs, 24TB Enterprise drives
Display(s)	55" Samsung 4K HDR
Audio Device(s)	ATI HDMI
Mouse	Logitech MX518
Keyboard	Razer
Software	A lot.
Benchmark Scores	Its fast. Enough.