Monday, November 2nd 2020
Intel Rocket Lake-S CPU Benchmarked: Up to 22% Faster Compared to the Previous Generation
Just a few days ago, Intel has decided to surprise us and give out information about its upcoming Rocket Lake-S platform designed for desktop users. Arriving early next year (Q1) the Rocket Lake-S platform is yet another iteration of the company's 14 nm node. However, this time we are getting some real system changes with a new architecture design. Backporting its Golden Cove core to 14 nm, Intel has named this new core type Cypress Cove. What used to be the heart of Ice Lake CPUs, is now powering the Rocket Lake-S platform. Besides the new core, there are other features of the platform like PCIe 4.0, new Xe graphics, and updated media codecs. You can check that out here.
Today, we have gotten the first benchmarks of the Intel Rocket Lake-S system. In the Userbenchmark bench, an unknown eight-core Rocket Lake CPU has been compared to Intel's 10th generation Comet Lake-S processors. The Rocket Lake engineering sample ran at 4.2 GHz while scoring a single-core score of 179. Compared to the Core i9-10900K that runs at 5.3 GHz, which scored 152 points, the Cypress Cove design is 18% faster. And if the new design is compared to the equivalent 8C/16T Compet Lake CPU like Core i7-10700K clocked at 5.1 GHz and scoring 148 points, the new CPU uarch is up to 22% faster. This represents massive single-threaded performance increases, however, please take the information with a grain of salt, as we wait for the official reviews.
Source:
WCCFTech
Today, we have gotten the first benchmarks of the Intel Rocket Lake-S system. In the Userbenchmark bench, an unknown eight-core Rocket Lake CPU has been compared to Intel's 10th generation Comet Lake-S processors. The Rocket Lake engineering sample ran at 4.2 GHz while scoring a single-core score of 179. Compared to the Core i9-10900K that runs at 5.3 GHz, which scored 152 points, the Cypress Cove design is 18% faster. And if the new design is compared to the equivalent 8C/16T Compet Lake CPU like Core i7-10700K clocked at 5.1 GHz and scoring 148 points, the new CPU uarch is up to 22% faster. This represents massive single-threaded performance increases, however, please take the information with a grain of salt, as we wait for the official reviews.
75 Comments on Intel Rocket Lake-S CPU Benchmarked: Up to 22% Faster Compared to the Previous Generation
This would be interesting news.. if the CPU's weren't about half a year off, we didn't have new CPU's coming out in a few days and that these Rocket Lake-S CPU's supposedly only top out at 8 cores.
- They have a benchmark app that runs actual benchmarks on computers. It returns certain set of numbers, bunch of x-thread CPU performance test results for example. This has not changed and results of these benchmarks have not changed either.
- Then they have a ranking or whatever on their page where they put some single number as CPU performance, calculating this from existing actual benchmark results. Their calculation formula for that number is what changed. These CPU benchmarks do not depend on memory speed.
With that said, if I was going to buy a CPU today, I'd probably buy a 12c AMD chip. So while I might be moderately impressed by this rumor, my opinion does not change until I see tangible evidence of what Intel claims. If it pans out, then good for Intel. If it doesn't, then I'd still buy an AMD chip.
Im just playing a few MMORPG on PC, 99% of them cant use more than 4 Threads im still happy with my Haswell i5 underclocked @ 2,2 GHz:p
My next Upgrade will be a i3 Rocket Lake :laugh:
18% over Comet Lake
6 % Comet Lake over Haswell
24% IPC+ = lower Clockrates = lower Powerusage for me
Architecture is much more important than nodes, even though the 14nm node will certainly impose some restrictions.
Many are forgetting that Intel's 14nm++ is much closer to TSMC's 7nm in performance than GloFo's and TSMC's 16/14/12 nm class nodes. Intel's limitations really start to kick in around ~8 cores. Coffee Lake and Comet Lake encounters issues with energy density, so a well crafted implementation of Sunny Cove might actually manage to reduce the energy density and keep similar clock speeds.
I don't expect the 14nm node to be "competitive" (in synthetics or specific workloads) against 12c/16c designs. But I really don't think a mainstream platform have to do that, as users with workloads which actually scales with 12 cores or more often "needs" HEDT features like more memory channels and PCIe lanes. Interestingly enough, the Skylake architecture performs very well in productive workloads like Photoshop and Premiere. It will be very interesting to see how this evolves with Zen 3 and Rocket Lake, perhaps an 8-core Rocket Lake becomes highly relevant for some power users despite having only 8 cores.
And the CPU isnt faster than a 6 Year old i7 4770:laugh:
IPC
Zen = under Haswell
Zen + = similar to Haswell
Zen 2 = 15% over Skylake
BTW
Intel is in our Country cheaper to a similar AMD System:
10400F 140$, H410 Board 50$, RAM 45$ = 235$
3600 190$, A320 Board 60$, RAM 45$ = 295$ (with a A520 Board 80$ we are by 315$)
All in all, I'm hoping to build an AMD machine in a couple months if everything pans out right. I have an itch just to build a mini itx machine. That way I can actually compare the two and have a machine that's a little more capable for gaming should I want the performance of a desktop.
what kind of achievement do you wanna show? Leading and cheating the desktop/pc/mobile CPU for decades with 4 cores CPU?
show me pls LOL, what a pity.
ahemm...sorry
22% increase in performance. In this particular bench yeah. I'd rather see the entire benchmark suite to judge any CPU. Claims is one thing and most of the time they are about one benchmark that is supposedly be the one to tell what the performance is? I will wait for reviews and more information about this new Intel CPU. March isn't so far away so Intel better buckle up and bring something to the game cause for now it's just embarrassing.
Such workloads vary between users, but generally high core speed is important if you're doing a lot of small recompilations, or if you're using an IDE. While large core counts mostly benefits large build jobs. You should look for the one that's the right balance for you. I just threw out my mini ITX machine, so much hassle.
I got a Fractal Design 7 XL instead, so easy to work with, and I'll probably buy some more if there are some good deals soon. :) Die space is not a problem, total power consumption and power density is.
The output score of a benchmark is not normally referred to as an "interpretation". It's normally referred to as the result. What is done with the final score is the interpretation part as this is what the professionals have been doing for a long time.
What all the links and quotes are about are not the benchmark results. These are about the interpretation of the results Userbenchmark did for their CPU rating and named it effective speed.
Look at the screenshot in the news bit: 1-core 179, 2-core 368, 4-core 682, 8-core 1115 and 64-core 1623 are results.
Percentages are comparison with base results (I believe all of these are compared to a 9900K).
Effective speed is up on the CPU page, I actually missed it on the first try. Both problematic things - Real World Speed and Effective Speed - seem to be primarily on the compare page.
And now I have spent more time and effort on looking at strange numbers from Userbenchmark than I really cared to.
Skylake L1-D 32KB, L2 256KB and L3 2MB (Inclusive)
SunnyCove L1-D 48KB, L2 512KB and L3 2MB (Inclusive)
CypressCove L1-D 48KB, L2 512KB and L3 2MB (Inclusive)
Skylake-X L1-D 32KB, L2 1MB and L3 1.375MB (Non-Inclusive)
WillowCove L1D 48KB, L2 1.25MB and L3 3MB (Non-Inclusive)
Inclusive cache means that copy of L1 is in L2 and copy of L2 is in L3. So the x86 core does not need to download from RAM which is much slower if it needs the same data again, only from L2 or L3.
The same is true if core b) is to run on the same data set as core a) and instead of asking core a) for L1 or L2, it copies the currently processed data by core a) from L3. A lot of software, including games, is sensitive to fast communication between the cores, so the Inclusive cache is a faster solution in this case.
The Non-Inclusive cache works differently because L1 has no copy in L2 and L2 has no copy in L3. Zuski are there where soft is sensitive to the L2 cache capacity and gains soft strongly multi-threaded with independent threads.
Inclusive is better for multi-threaded software whose threads are dependent on each other running on separate cores.
Non-Inclusive better for multi-threaded software, heavily independent threads running on separate cores.
CypressCove is exactly SunnyCove, possibly also in microcode with minor corrections. SunnyCove gains an average of 18% higher IPCs than Skylake. Which means that in one application it can have 10% higher IPC and in another 25-30%.
I think you're splitting a hair by calling a benchmark result (i.e. the final number) "interpretation".
While you do not seem to think that and the way the final number is tallied is an interpretation of the individual tests.
My point is that no one else calls the output, or combined number, the interpretation of the individual results. For example, no one does that with any of the 3DMark benchmark tools.
I understand your point, but disagree with your argument.
Thanks for the chat.
Also intel fanboys need to chill. Competition is good for EVERYONE.
Cypress Cove was designed after Willow Cove, so theoretically there is a possibility for some additional tweaks, but I haven't found any evidence of this yet. There is probably a good reason why both Intel and AMD have moved to non-inclusive caches for their most recent designs. I'm not sure there is a real "advantage" of inclusive caches at all in practice, especially with high core count and maintaining cache integrity. I think it's mostly a legacy thing, they just started to design L3 cache that way. But you're welcome to prove me wrong.
Multithreaded scaling is very hard to get right. Having multiple threads depend on each other is a recipe for poor scaling. Data hazards is commonly a killer of multithreaded performance. The best way to do it is to have the threads work as independent as possible, and sync up as little as possible. There is probably some edge case out there, but you hopefully get my point.
2. It's true and I can't deny it :) The trend is towards Epyc / Xeon and there is important scaling and the highest possible performance of a single core thanks to the large L2. The relationship between the threads goes to the background, which is confirmed by leaks about the Alderlake system, i.e. GoldenCove. GoldenCove has the same cache type and capacity as WillowCove, ie L2 1.25MB and L3 3MB non-inclusive. You can see that Intel will no longer develop two separate microarchitecture in the era of more and more cores.
When a cache line is prefetched into L2, an inclusive cache would immediately duplicate this into L3. Then when the cache line is evicted from L2, it would normally be copied to L3, but instead it's already there. The only advantage of having a copy in L3 is in case another core needs it within that very short window before it's moved there anyway, I believe this window isn't many thousands of clock cycles, way too short for any software to be "optimized" for it. The disadvantages are numerous; more wasted die space, increased complexity to maintain integrity with higher core counts, L3 waste increases if L2 increases in size, etc.
I would argue there are other ways to make the cache more efficient, if the concern is sharing cache between cores. As you know, L1 is split into two separate caches; instruction and data. If L2 were split similarly, the design could take advantage of the different usage patterns. Data cache lines have often higher throughput and is rarely shared between threads, while instruction cache lines are often used repeatedly and by many cores within a short time. If L3 were split, L3I could be tightly interconnected, and L3D could be larger and more off-die. That's just an idea, but there might be design considerations I'm not aware of.