So he thinks AMD was looking at CPU-Z, or anyone was? That's absurd.
He doesn't just "think" so; they were testing it. AMD literally had CPU-Z in the official IPC slide, in case you want to have a look, and his quote came immediately after the slide.
Note CPU-Z on the far left with no change.
That might be because Zen 3 and Zen 4 are not much different at the core level. They mostly changed the IO / memory / cache.
And yet the quote itself is this: "Zen 4 received improvements like a larger micro-op cache, better branch prediction, and doubled L2 cache capacity. Those would help a lot of applications, but not CPU-Z. Thus, CPU-Z’s benchmark ends up being useless to both CPU designers and end users"
So yes, there were core changes. Those improvements certainly contributed to the 13% IPC gain, and yet there was no gain in the CPU-Z benchmark, because a beefier front end doesn't affect the result. So let me understand this correctly: is a larger micro-op cache, which affects a lot of real-world workloads, part of your 'pure IPC' calculation? Or better branch prediction? How about larger L2 caches, which will certainly affect a lot of workloads but not CPU-Z? Or is your definition of a pure IPC benchmark one that only really tests the backend of a CPU and fits into L1 cache?
Of course it does. If it does not fit in L1, what exactly are you testing?
L1<->L2 latency and bandwidth? L2<->L3 latency and bandwidth? L3<->main memory latency and bandwidth? Yes on all counts.
I'm sorry, but just fitting into L1 cache doesn't make it a great test, or a 'pure IPC' test. Higher L1 or L1-to-L2 bandwidth will contribute to the score, but that doesn't automatically make it a 'pure IPC' test. To put it simply, a larger L2 cache in a cache-starved design will net huge IPC gains in most applications but absolutely zero gain in the CPU-Z single-core bench. Same with a much larger micro-op cache. So essentially the CPU-Z test is mostly testing the backend of a CPU, and hits the FP registers particularly hard.
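To put the cache argument in concrete terms, here is a rough sketch of why doubling L2 does nothing for a test that already lives in L1. The cache sizes are the commonly published per-core figures for Zen 3 and Zen 4 (L3 is per CCD); the working-set sizes and the `serving_level` helper are made up for illustration, and the model deliberately ignores associativity, instruction footprint and prefetching.

```python
# Data cache capacities in KiB (L1D and L2 per core, L3 per CCD).
# Commonly published figures; treat them as assumptions for this sketch.
ZEN3 = {"L1D": 32, "L2": 512, "L3": 32 * 1024}
ZEN4 = {"L1D": 32, "L2": 1024, "L3": 32 * 1024}


def serving_level(working_set_kib, caches):
    """Return the smallest cache level that can hold the whole working set.

    Simplified capacity-only model: no associativity, no sharing effects.
    """
    for level, size_kib in caches.items():  # dicts keep insertion order
        if working_set_kib <= size_kib:
            return level
    return "DRAM"


# Hypothetical working-set sizes in KiB, for illustration only.
for ws in (24, 768, 100 * 1024):
    print(f"{ws:>7} KiB  Zen 3: {serving_level(ws, ZEN3):<5} "
          f"Zen 4: {serving_level(ws, ZEN4)}")
```

In this toy model only the middle case moves to a faster level on Zen 4 (from L3 to L2). The L1-resident case is served from the same place on both chips, so a benchmark shaped like that cannot see the L2 change at all.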
Work on that reading comprehension thing. It does test branch predictors, just not to the degree that "chipsandcheese" wants.
I'm going to ignore that unnecessary insult of sorts and get to it. Improved branch predictors make next to no difference in the CPU-Z benchmark, and certainly no difference in the single-core bench. To quote, this is how basic its test of branch prediction is: "Even Goldmont Plus has no problem tracking CPU-Z’s branches". Hell, even Bulldozer, with its terrible branch predictor, has very little trouble, and that's saying something. It's not what "chipsandcheese" wants, it's what they are seeing in the results. My reading comprehension was fine; I just didn't want to add a ton of explanation, since I assumed you also understood that branch predictors make next to no difference in that bench. Well, now you know.
You seem to have the idea that this CPU-Z test, which fits into the L1 cache, is a pure IPC test. The Chips and Cheese article refutes that. The test not really exercising a CPU's front end refutes that, as real-world benches benefit a lot from front-end improvements. Branch predictors not affecting the single-core result refutes that. A larger micro-op cache not affecting the result refutes that. I could go on, but I'll stop here. If you still think this CPU-Z bench is a pure IPC bench, so be it. It would simply mean your definition of IPC is different from mine. For me, IPC gains are calculated using an average of test results which individually test different parts of the CPU, or a more rounded single bench that is at least affected by changes in the front end, execution and backend. A single test that doesn't change with a bigger micro-op cache, a larger L2 and a generally beefier front end doesn't really fit the definition of a 'pure IPC' bench.
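Here's a minimal sketch of what I mean by averaging across tests. The per-workload numbers are hypothetical, not AMD's data, and using a geometric mean is my assumption about how such a suite would typically be combined.

```python
import math


def geomean(ratios):
    """Geometric mean of a collection of speedup ratios (1.00 = no change)."""
    ratios = list(ratios)
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical iso-clock speedups of a new core over the old one.
# These are illustrative values, NOT measured results.
uplift = {
    "code compilation": 1.15,
    "video encoding": 1.10,
    "gaming": 1.20,
    "L1-resident loop": 1.00,  # a CPU-Z-like test that sees no change
}

suite_gain = geomean(uplift.values())
single_test_gain = uplift["L1-resident loop"]

print(f"suite average gain: {(suite_gain - 1) * 100:.1f}%")
print(f"L1-resident test:   {(single_test_gain - 1) * 100:.1f}%")
```

The suite average reflects the front-end and cache changes because some of its members respond to them. The flat test on its own would report no generational gain at all.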
I'm going to stop arguing with you over this and leave with a quote from them:
"'What limits computer performance today is predictability, and the two big ones are instruction/branch predictability, and data locality' - Jim Keller, during an interview with Dr. Ian Cutress.
That’s not just Jim Keller’s opinion. I’ve watched CPU performance counters across my day-to-day workloads. Across code compilation, image editing, video encoding, and gaming, I can’t think of anything that fits within the L1 cache and barely challenges the branch predictor. CPU-Z’s benchmark is an exception. The factors that limit performance in CPU-Z are very different from those in typical real-life workloads."