Monday, November 2nd 2020

Intel Rocket Lake-S CPU Benchmarked: Up to 22% Faster Compared to the Previous Generation

A few days ago, Intel surprised us by releasing information about its upcoming Rocket Lake-S platform for desktop users. Arriving early next year (Q1), Rocket Lake-S is yet another iteration on the company's 14 nm node. This time, however, we are getting real changes in the form of a new architecture. Intel has backported its Sunny Cove core to 14 nm and named the result Cypress Cove. What used to be the heart of Ice Lake CPUs now powers the Rocket Lake-S platform. Besides the new core, the platform brings other features such as PCIe 4.0, new Xe graphics, and updated media codecs.

Today, we have the first benchmarks of an Intel Rocket Lake-S system. In UserBenchmark, an unknown eight-core Rocket Lake CPU has been compared to Intel's 10th generation Comet Lake-S processors. The Rocket Lake engineering sample ran at 4.2 GHz and posted a single-core score of 179. Compared to the Core i9-10900K, which runs at up to 5.3 GHz and scored 152 points, the Cypress Cove design is 18% faster. Compared to the equivalent 8C/16T Comet Lake CPU, the Core i7-10700K, which is clocked at 5.1 GHz and scored 148 points, the new CPU uarch is up to 22% faster. This represents a massive single-threaded performance increase; however, please take the information with a grain of salt as we wait for the official reviews.
Source: WCCFTech
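
For readers who want to check the arithmetic, here is a minimal sketch that recomputes the uplift from the scores and clocks quoted above. The per-GHz column is only a rough per-clock indicator and assumes each chip actually held the quoted clock during the run, which the leak does not confirm. From the quoted scores alone, the gap works out to roughly 18% over the Core i9-10900K and roughly 21% over the Core i7-10700K.

```python
# Scores and clocks as quoted in the article (single-core points, GHz).
chips = {
    "Rocket Lake ES": (179, 4.2),
    "Core i9-10900K": (152, 5.3),
    "Core i7-10700K": (148, 5.1),
}


def uplift_pct(new, old):
    """Percentage by which `new` exceeds `old`."""
    return (new / old - 1.0) * 100.0


rkl_score, rkl_clock = chips["Rocket Lake ES"]
for name, (score, clock) in chips.items():
    if name == "Rocket Lake ES":
        continue
    raw = uplift_pct(rkl_score, score)
    # Assumption: the quoted clock is the clock sustained during the test.
    per_clock = uplift_pct(rkl_score / rkl_clock, score / clock)
    print(f"vs {name}: {raw:.1f}% higher score, {per_clock:.1f}% higher score per GHz")
```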

75 Comments on Intel Rocket Lake-S CPU Benchmarked: Up to 22% Faster Compared to the Previous Generation

#51
Enterprise24
Pretty impressive score with single channel 2666 MHz memory.

Posted on Reply
#52
Unregistered
Enterprise24 said: Pretty impressive score with single channel 2666 MHz memory.

Hey it's Intel, AMD fans are blind to any kind of Intel achievement
#53
EatingDirt
Intel: "Please don't pay attention to the new Ryzen 5xxx CPUs coming out in a few days and keep talking about us and hold out for almost half a year for our new CPUs."

This would be interesting news if the CPUs weren't about half a year off, if we didn't have new CPUs coming out in a few days, and if these Rocket Lake-S CPUs didn't supposedly top out at 8 cores.
Posted on Reply
#54
londiste
mastrdrver said: But they have, by the very quote I used in my last post.

Why are you trying to make a distinction where there is not one?
There is absolutely a distinction.
- They have a benchmark app that runs actual benchmarks on computers. It returns a certain set of numbers, a bunch of x-thread CPU performance test results for example. This has not changed and the results of these benchmarks have not changed either.
- Then they have a ranking or whatever on their page where they put some single number as CPU performance, calculated from the existing actual benchmark results. Their calculation formula for that number is what changed.
Enterprise24 said: Pretty impressive score with single channel 2666 MHz memory.
These CPU benchmarks do not depend on memory speed.
Posted on Reply
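
The distinction drawn in the post above, raw results versus a single number calculated from them, can be illustrated with a small sketch. Everything here is hypothetical: the two CPUs, their scores, and both sets of weights are made up for illustration, and neither weighting is UserBenchmark's actual formula.

```python
# Hypothetical raw results for two made-up CPUs (higher is better).
raw_results = {
    "cpu_a": {"1-core": 150, "4-core": 560, "multi-core": 1100},  # fewer, faster cores
    "cpu_b": {"1-core": 140, "4-core": 540, "multi-core": 1800},  # more cores
}

# Two hypothetical weightings (assumptions, not any site's real formula).
weights_old = {"1-core": 0.30, "4-core": 0.60, "multi-core": 0.10}
weights_new = {"1-core": 0.40, "4-core": 0.58, "multi-core": 0.02}


def composite(results, weights):
    """Collapse several raw results into one number using the given weights."""
    return sum(results[test] * weight for test, weight in weights.items())


def ranking(weights):
    """CPU names ordered from highest to lowest composite score."""
    return sorted(
        raw_results,
        key=lambda cpu: composite(raw_results[cpu], weights),
        reverse=True,
    )


print("old weights:", ranking(weights_old))
print("new weights:", ranking(weights_new))
```

The raw numbers are identical in both runs; only the weighting differs, and that alone is enough to swap the order of the two CPUs.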
#55
Aquinus
Resident Wat-man
Am I the only person who is impressed by a potential uarch improvement that could possibly be competitive in the short term, even on Intel's aging 14nm node? This is the kind of progress you can bring with you when you eventually do a die shrink. So assuming that this is true, this could also come with a TDP reduction for the same performance as earlier gens on the same process. Clearly it remains to be seen if that's the case, but that's certainly not a bad thing. If Intel manages to keep 14nm competitive, that should speak volumes to the quality of the uarch.

With that said, if I was going to buy a CPU today, I'd probably buy a 12c AMD chip. So while I might be moderately impressed by this rumor, my opinion does not change until I see tangible evidence of what Intel claims. If it pans out, then good for Intel. If it doesn't, then I'd still buy an AMD chip.
Posted on Reply
#56
seth1911
Cores here, Cores there

Im just playing a few MMORPG on PC, 99% of them cant use more than 4 Threads im still happy with my Haswell i5 underclocked @ 2,2 GHz:p


My next Upgrade will be a i3 Rocket Lake :laugh:
18% over Comet Lake
6 % Comet Lake over Haswell

24% IPC+ = lower Clockrates = lower Powerusage for me
Posted on Reply
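
A side note on the arithmetic in the post above: successive generational gains multiply rather than add. The sketch below takes the poster's 18% and 6% figures at face value (they are the poster's estimates, not measurements) and shows the compounded gain, plus the clock at which the newer core would match the older one if performance scaled linearly with frequency, which is itself an assumption.

```python
def compound(*gains_pct):
    """Combine successive percentage gains multiplicatively."""
    total = 1.0
    for gain in gains_pct:
        total *= 1.0 + gain / 100.0
    return (total - 1.0) * 100.0


# Poster's figures: Rocket Lake over Comet Lake, Comet Lake over Haswell.
combined = compound(18, 6)

# Fraction of the old clock needed for equal per-thread performance,
# assuming performance scales linearly with clock speed.
clock_fraction = 1.0 / (1.0 + combined / 100.0)

print(f"combined gain: {combined:.1f}%")
print(f"clock needed for parity: {clock_fraction:.0%} of the Haswell clock")
```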
#57
efikkan
Aquinus said: Am I the only person who is impressed by a potential uarch improvement that could possibly be competitive in the short term, even on Intel's aging 14nm node? This is the kind of progress you can bring with you when you eventually do a die shrink. So assuming that this is true, this could also come with a TDP reduction for the same performance as earlier gens on the same process. Clearly it remains to be seen if that's the case, but that's certainly not a bad thing. If Intel manages to keep 14nm competitive, that should speak volumes to the quality of the uarch.
No you're not the only one.
Architecture is much more important than nodes, even though the 14nm node will certainly impose some restrictions.
Many are forgetting that Intel's 14nm++ is much closer to TSMC's 7nm in performance than to GloFo's and TSMC's 16/14/12 nm class nodes. Intel's limitations really start to kick in around ~8 cores. Coffee Lake and Comet Lake encounter issues with energy density, so a well crafted implementation of Sunny Cove might actually manage to reduce the energy density and keep similar clock speeds.

I don't expect the 14nm node to be "competitive" (in synthetics or specific workloads) against 12c/16c designs. But I really don't think a mainstream platform has to do that, as users with workloads which actually scale with 12 cores or more often "need" HEDT features like more memory channels and PCIe lanes. Interestingly enough, the Skylake architecture performs very well in productive workloads like Photoshop and Premiere. It will be very interesting to see how this evolves with Zen 3 and Rocket Lake; perhaps an 8-core Rocket Lake becomes highly relevant for some power users despite having only 8 cores.
Posted on Reply
#58
Patriot
seth1911 said: Cores here, Cores there

Im just playing a few MMORPG on PC, 99% of them cant use more than 4 Threads im still happy with my Haswell i5 underclocked @ 2,2 GHz:p


My next Upgrade will be a i3 Rocket Lake :laugh:
18% over Comet Lake
6 % Comet Lake over Haswell

24% IPC+ = lower Clockrates = lower Powerusage for me
We get it, you chose Intel because you don't value performance. Blue forever, no matter the cost. So long as you use a kernel that came out after Zen, Zen is happy. It really isn't hard. The 2400G gave me some grief on 18.04 and even 19.04, but 19.10 or newer has a new enough kernel to work. As long as you aren't using an ancient Debian release you should be fine as well. Even Linus himself built himself an AMD box. www.phoronix.com/scan.php?page=news_item&px=Torvalds-Threadripper
Posted on Reply
#59
seth1911
Yeah, and you paid 140 bucks for a 2400G and its IGP is still slower in games than a GT 1030 (GDDR5) for about 70 bucks :shadedshu:
And the CPU isn't faster than a 6 year old i7 4770 :laugh:

IPC
Zen = under Haswell
Zen + = similar to Haswell
Zen 2 = 15% over Skylake


BTW
In our country, Intel is cheaper than a similar AMD system:
10400F 140$, H410 board 50$, RAM 45$ = 235$
3600 190$, A320 board 60$, RAM 45$ = 295$ (with an A520 board at 80$ we are at 315$)
Posted on Reply
#60
Aquinus
Resident Wat-man
efikkan said: No you're not the only one.
Architecture is much more important than nodes, even though the 14nm node will certainly impose some restrictions.
Many are forgetting that Intel's 14nm++ is much closer to TSMC's 7nm in performance than to GloFo's and TSMC's 16/14/12 nm class nodes. Intel's limitations really start to kick in around ~8 cores. Coffee Lake and Comet Lake encounter issues with energy density, so a well crafted implementation of Sunny Cove might actually manage to reduce the energy density and keep similar clock speeds.

I don't expect the 14nm node to be "competitive" (in synthetics or specific workloads) against 12c/16c designs. But I really don't think a mainstream platform has to do that, as users with workloads which actually scale with 12 cores or more often "need" HEDT features like more memory channels and PCIe lanes. Interestingly enough, the Skylake architecture performs very well in productive workloads like Photoshop and Premiere. It will be very interesting to see how this evolves with Zen 3 and Rocket Lake; perhaps an 8-core Rocket Lake becomes highly relevant for some power users despite having only 8 cores.
I use the i9 9880H in my MacBook Pro for dev and I can't say I have any complaints with it. The only time I fully load the chip is when I'm bringing the entire system I work on up at the same time. The chip gets toasty under load though, but what can you expect from a mobile 8c chip that has a short duration power limit of 95w and a long duration limit of 65w in a fairly thin laptop.

All in all, I'm hoping to build an AMD machine in a couple months if everything pans out right. I have an itch just to build a mini itx machine. That way I can actually compare the two and have a machine that's a little more capable for gaming should I want the performance of a desktop.
Posted on Reply
#61
SIGSEGV
tigger said: Hey it's Intel, AMD fans are blind to any kind of Intel achievement
LMAO.
what kind of achievement do you wanna show? Leading and cheating the desktop/pc/mobile CPU for decades with 4 cores CPU?
show me pls
seth1911 said: Cores here, Cores there

Im just playing a few MMORPG on PC, 99% of them cant use more than 4 Threads im still happy with my Haswell i5 underclocked @ 2,2 GHz:p


My next Upgrade will be a i3 Rocket Lake :laugh:
18% over Comet Lake
6 % Comet Lake over Haswell

24% IPC+ = lower Clockrates = lower Powerusage for me
LOL, what a pity.
ahemm...sorry
Posted on Reply
#62
DemonicRyzen666
OK, no one is going to talk about the latency there, 81.2 ns? And it still has a 22% increase in IPC? I'm very skeptical of that, since we've seen ES samples doing 5.5 GHz.
Posted on Reply
#63
watzupken
Aquinus said: Am I the only person who is impressed by a potential uarch improvement that could possibly be competitive in the short term, even on Intel's aging 14nm node? This is the kind of progress you can bring with you when you eventually do a die shrink. So assuming that this is true, this could also come with a TDP reduction for the same performance as earlier gens on the same process. Clearly it remains to be seen if that's the case, but that's certainly not a bad thing. If Intel manages to keep 14nm competitive, that should speak volumes to the quality of the uarch.

With that said, if I was going to buy a CPU today, I'd probably buy a 12c AMD chip. So while I might be moderately impressed by this rumor, my opinion does not change until I see tangible evidence of what Intel claims. If it pans out, then good for Intel. If it doesn't, then I'd still buy an AMD chip.
IPC improvement is always welcome. While it is possible to back port this to 14nm, you can certainly tell it’s becoming a serious limitation. They don’t have enough die space for anything more than 8 cores. And to squeeze out performance from it, they had to forego power efficiency as well.
Posted on Reply
#64
ratirt
DemonicRyzen666 said: OK, no one is going to talk about the latency there, 81.2 ns? And it still has a 22% increase in IPC? I'm very skeptical of that, since we've seen ES samples doing 5.5 GHz.
Not all applications are latency sensitive.
A 22% increase in performance? In this particular bench, yeah. I'd rather see the entire benchmark suite to judge any CPU. Claims are one thing, and most of the time they rest on a single benchmark that is supposedly the one to tell what the performance is. I will wait for reviews and more information about this new Intel CPU. March isn't so far away, so Intel better buckle up and bring something to the game, because for now it's just embarrassing.
Posted on Reply
#65
owen10578
londiste said: That is a very short-sighted statement. Their benchmark results have been rather solid on almost all fronts. All the controversy is about interpreting and presenting them.
There isn't a single shred of trust I have left in userbenchmark. For all I know they could be fudging up the numbers here to make intel look good in rumors after the smooth brain moves they constantly did.
Posted on Reply
#66
Aquinus
Resident Wat-man
watzupken said: IPC improvement is always welcome. While it is possible to back port this to 14nm, you can certainly tell it’s becoming a serious limitation. They don’t have enough die space for anything more than 8 cores. And to squeeze out performance from it, they had to forego power efficiency as well.
AMD doesn't have die space for more than 8 cores either, that's why they adopted the MCM/Chiplet design. Honestly, I can't say I'm disappointed with the 9880H in my laptop though. Sure it runs hot, but it's a 8c chip in a laptop that boosts higher than the OC I had on my 3930k. That isn't to say AMD doesn't make a good chip, but AMD making better chips doesn't suddenly make Intel's chips bad.
Posted on Reply
#67
efikkan
Aquinus said: I use the i9 9880H in my MacBook Pro for dev and I can't say I have any complaints with it. The only time I fully load the chip is when I'm bringing the entire system I work on up at the same time. The chip gets toasty under load though, but what can you expect from a mobile 8c chip that has a short duration power limit of 95w and a long duration limit of 65w in a fairly thin laptop.
I'm pretty sure even the current desktop offerings from AMD and Intel would be a nice speedup for development, but I expect the upcoming ones to be worth the wait.
Such workloads vary between users, but generally high core speed is important if you're doing a lot of small recompilations or using an IDE, while large core counts mostly benefit large build jobs. You should look for the one that's the right balance for you.
Aquinus said: All in all, I'm hoping to build an AMD machine in a couple months if everything pans out right. I have an itch just to build a mini itx machine. That way I can actually compare the two and have a machine that's a little more capable for gaming should I want the performance of a desktop.
I just threw out my mini ITX machine, so much hassle.
I got a Fractal Design 7 XL instead, so easy to work with, and I'll probably buy some more if there are some good deals soon. :)
watzupken said: IPC improvement is always welcome. While it is possible to back port this to 14nm, you can certainly tell it’s becoming a serious limitation. They don’t have enough die space for anything more than 8 cores. And to squeeze out performance from it, they had to forego power efficiency as well.
Die space is not a problem; total power consumption and power density are.


Posted on Reply
#68
mastrdrver
londiste said: There is absolutely a distinction.
- They have a benchmark app that runs actual benchmarks on computers. It returns a certain set of numbers, a bunch of x-thread CPU performance test results for example. This has not changed and the results of these benchmarks have not changed either.
- Then they have a ranking or whatever on their page where they put some single number as CPU performance, calculated from the existing actual benchmark results. Their calculation formula for that number is what changed.
Yet you said (originally) that they have not changed the results, which they have and which is my point.

The output score of a benchmark is not normally referred to as an "interpretation". It's normally referred to as the result. What is done with the final score is the interpretation part as this is what the professionals have been doing for a long time.
Posted on Reply
#69
londiste
mastrdrver said: The output score of a benchmark is not normally referred to as an "interpretation". It's normally referred to as the result. What is done with the final score is the interpretation part as this is what the professionals have been doing for a long time.
OK, and how does this differ from what I said? Benchmark results were and are the same, these have not changed.
What all the links and quotes are about are not the benchmark results. These are about the interpretation of the results Userbenchmark did for their CPU rating and named it effective speed.

Look at the screenshot in the news bit:
www.techpowerup.com/img/50rvZk29ud256ELe.jpg
1-core 179, 2-core 368, 4-core 682, 8-core 1115 and 64-core 1623 are results.
Percentages are comparison with base results (I believe all of these are compared to a 9900K).
Effective speed is up on the CPU page, I actually missed it on the first try. Both problematic things - Real World Speed and Effective Speed - seem to be primarily on the compare page.
And now I have spent more time and effort on looking at strange numbers from Userbenchmark than I really cared to.
Posted on Reply
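
As a rough illustration of what the raw results listed above say on their own, the sketch below computes how well the score scales with thread count. It assumes the sample has 8 physical cores (as the article describes) and that the "64-core" test simply uses every hardware thread available, so values above 1.0 in that row would reflect SMT rather than extra cores. Both points are assumptions, not something the screenshot states.

```python
# Raw results from the screenshot, keyed by the test's nominal thread count.
results = {1: 179, 2: 368, 4: 682, 8: 1115, 64: 1623}

PHYSICAL_CORES = 8  # assumption based on the article's "eight-core" description


def efficiency(threads):
    """Speedup over the 1-core result divided by the cores actually usable."""
    speedup = results[threads] / results[1]
    usable_cores = min(threads, PHYSICAL_CORES)
    return speedup / usable_cores


for threads in results:
    speedup = results[threads] / results[1]
    print(f"{threads:>2}-core test: {speedup:5.2f}x speedup, "
          f"{efficiency(threads):.2f} per physical core")
```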
#70
AMDK11
londiste said: Skylake and derivatives > Sunny Cove (in Ice Lake) > Willow Cove (in Tiger Lake) > Golden Cove (in Alder Lake)
Rocket Lake's Cypress Cove is between Sunny Cove and Willow Cove, a modified/improved Sunny Cove backported to 14nm.
The logic of the x86 SunnyCove and WillowCove cores is the same. The differences are mainly in the cache subsystem.

Skylake L1-D 32KB, L2 256KB and L3 2MB (Inclusive)

SunnyCove L1-D 48KB, L2 512KB and L3 2MB (Inclusive)

CypressCove L1-D 48KB, L2 512KB and L3 2MB (Inclusive)

Skylake-X L1-D 32KB, L2 1MB and L3 1.375MB (Non-Inclusive)

WillowCove L1-D 48KB, L2 1.25MB and L3 3MB (Non-Inclusive)

An inclusive cache means that a copy of L1 is kept in L2 and a copy of L2 is kept in L3. So if the x86 core needs the same data again, it does not have to fetch it from RAM, which is much slower, only from L2 or L3.
The same is true if core b) is to work on the same data set as core a): instead of asking core a) for its L1 or L2 contents, it copies the data currently being processed by core a) from L3. A lot of software, including games, is sensitive to fast communication between the cores, so an inclusive cache is the faster solution in this case.

The non-inclusive cache works differently because L1 has no copy in L2 and L2 has no copy in L3. The gains are there where software is sensitive to L2 cache capacity, and heavily multi-threaded software with independent threads benefits.

Inclusive is better for multi-threaded software whose threads depend on each other while running on separate cores.

Non-inclusive is better for multi-threaded software with largely independent threads running on separate cores.

CypressCove is exactly SunnyCove, possibly with minor corrections in microcode. SunnyCove averages 18% higher IPC than Skylake, which means that in one application it can have 10% higher IPC and in another 25-30%.
Posted on Reply
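
To put numbers on the capacity side of the inclusive versus non-inclusive trade-off described above, here is a small sketch using the per-core L2 and L3 figures from that post. It makes one loud simplification: a non-inclusive L3 is treated as fully exclusive, which gives an upper bound on unique data rather than what real hardware guarantees.

```python
# (L2 KB per core, L3 KB per core, L3 policy), as listed in the post above.
caches = {
    "Skylake":     (256, 2048, "inclusive"),
    "SunnyCove":   (512, 2048, "inclusive"),
    "CypressCove": (512, 2048, "inclusive"),
    "Skylake-X":   (1024, 1408, "non-inclusive"),  # 1.375 MB = 1408 KB
    "WillowCove":  (1280, 3072, "non-inclusive"),
}


def unique_capacity_kb(l2_kb, l3_kb, policy):
    """Upper bound on distinct data held across L2 and L3 for one core."""
    if policy == "inclusive":
        # Every L2 line is also present in L3, so L3 alone bounds unique data.
        return l3_kb
    # Assumption: non-inclusive behaves as fully exclusive (best case).
    return l2_kb + l3_kb


for name, (l2_kb, l3_kb, policy) in caches.items():
    unique = unique_capacity_kb(l2_kb, l3_kb, policy)
    print(f"{name:<12} {policy:<14} unique L2+L3 per core: up to {unique} KB")
```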
#71
mastrdrver
londiste said: OK, and how does this differ from what I said? Benchmark results were and are the same, these have not changed.
What all the links and quotes are about are not the benchmark results. These are about the interpretation of the results Userbenchmark did for their CPU rating and named it effective speed.

Look at the screenshot in the news bit:

1-core 179, 2-core 368, 4-core 682, 8-core 1115 and 64-core 1623 are results.
Percentages are comparison with base results (I believe all of these are compared to a 9900K).
Effective speed is up on the CPU page, I actually missed it on the first try. Both problematic things - Real World Speed and Effective Speed - seem to be primarily on the compare page.
And now I have spent more time and effort on looking at strange numbers from Userbenchmark than I really cared to.
My point, and yours, is on what we're referring to as the results.

I think you're splitting a hair by calling a benchmark result (i.e. the final number) "interpretation".

You do not seem to think so, and see the way the final number is tallied as an interpretation of the individual tests.

My point is that no one else calls the output, or combined number, the interpretation of the individual results. For example, no one does that with any of the 3DMark benchmark tools.

I understand your point, but disagree with your argument.

Thanks for the chat.
Posted on Reply
#72
Arc1t3ct
Intel needs a change in management ASAP. It's crystal clear now.

Also, Intel fanboys need to chill. Competition is good for EVERYONE.
Posted on Reply
#73
efikkan
AMDK11 said: The logic of the x86 SunnyCove and WillowCove cores is the same. The differences are mainly in the cache subsystem.
<snip>
CypressCove is exactly SunnyCove, possibly with minor corrections in microcode. SunnyCove averages 18% higher IPC than Skylake, which means that in one application it can have 10% higher IPC and in another 25-30%.
That's my understanding too, but I haven't found any confirmation.
Cypress Cove was designed after Willow Cove, so theoretically there is a possibility for some additional tweaks, but I haven't found any evidence of this yet.
AMDK11 said: Inclusive is better for multi-threaded software whose threads are dependent on each other running on separate cores.

Non-Inclusive better for multi-threaded software, heavily independent threads running on separate cores.
There is probably a good reason why both Intel and AMD have moved to non-inclusive caches for their most recent designs. I'm not sure there is a real "advantage" of inclusive caches at all in practice, especially with high core counts and the need to maintain cache integrity. I think it's mostly a legacy thing, they just started to design L3 cache that way. But you're welcome to prove me wrong.

Multithreaded scaling is very hard to get right. Having multiple threads depend on each other is a recipe for poor scaling. Data hazards are commonly a killer of multithreaded performance. The best way to do it is to have the threads work as independently as possible, and sync up as little as possible. There is probably some edge case out there, but you hopefully get my point.
Posted on Reply
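
The advice above, keep threads independent and sync up as little as possible, can be sketched as follows. Each worker touches only its own slice and a local accumulator, and the only synchronization point is the final merge. The function names are made up for illustration, and note that in CPython the global interpreter lock means this shows the structure of the approach rather than a real CPU-bound speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Work on a private slice with a local accumulator; no shared state."""
    total = 0
    for value in chunk:
        total += value * value
    return total


def sum_of_squares(data, workers=4):
    """Split the data into independent chunks and merge results once at the end."""
    size = max(1, (len(data) + workers - 1) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The single merge step below is the only place results are combined.
        return sum(pool.map(partial_sum, chunks))


print(sum_of_squares(list(range(1000))))
```

The alternative, every thread adding into one shared counter behind a lock, would force the threads to wait on each other for each element, which is the kind of dependency the post warns about.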
#74
AMDK11
efikkan said: That's my understanding too, but I haven't found any confirmation.
Cypress Cove was designed after Willow Cove, so theoretically there is a possibility for some additional tweaks, but I haven't found any evidence of this yet.


There is probably a good reason why both Intel and AMD have moved to non-inclusive caches for their most recent designs. I'm not sure there is a real "advantage" of inclusive caches at all in practice, especially with high core counts and the need to maintain cache integrity. I think it's mostly a legacy thing, they just started to design L3 cache that way. But you're welcome to prove me wrong.

Multithreaded scaling is very hard to get right. Having multiple threads depend on each other is a recipe for poor scaling. Data hazards are commonly a killer of multithreaded performance. The best way to do it is to have the threads work as independently as possible, and sync up as little as possible. There is probably some edge case out there, but you hopefully get my point.
1. Intel has officially confirmed that CypressCove is the same core as in Ice Lake, that is SunnyCove, and that the IGP comes from Tiger Lake. The only differences between SunnyCove and WillowCove are the L2 cache going from 512KB to 1.25MB, the L3 going from 2MB to 3MB, and the change from inclusive to non-inclusive. CypressCove has 512KB of L2 and 2MB of inclusive L3, i.e. it is SunnyCove at 14nm.

2. It's true and I can't deny it :) The trend is towards Epyc / Xeon, where scaling and the highest possible single-core performance thanks to a large L2 are what matter. The relationship between threads fades into the background, which is confirmed by leaks about Alder Lake, i.e. GoldenCove. GoldenCove has the same cache type and capacity as WillowCove, i.e. 1.25MB of L2 and 3MB of non-inclusive L3. You can see that Intel will no longer develop two separate microarchitectures in the era of more and more cores.
Posted on Reply
#75
efikkan
AMDK11 said: 2. It's true and I can't deny it :) The trend is towards Epyc / Xeon, where scaling and the highest possible single-core performance thanks to a large L2 are what matter. The relationship between threads fades into the background, which is confirmed by leaks about Alder Lake, i.e. GoldenCove. GoldenCove has the same cache type and capacity as WillowCove, i.e. 1.25MB of L2 and 3MB of non-inclusive L3. You can see that Intel will no longer develop two separate microarchitectures in the era of more and more cores.
Sure, but let me clarify why I believe an inclusive L3 cache isn't beneficial even to some real world multithreaded workloads:
When a cache line is prefetched into L2, an inclusive cache would immediately duplicate this into L3. Then when the cache line is evicted from L2, it would normally be copied to L3, but instead it's already there. The only advantage of having a copy in L3 is in case another core needs it within that very short window before it's moved there anyway. I believe this window isn't many thousands of clock cycles, way too short for any software to be "optimized" for it. The disadvantages are numerous: more wasted die space, increased complexity to maintain integrity with higher core counts, L3 waste that increases if L2 increases in size, etc.

I would argue there are other ways to make the cache more efficient, if the concern is sharing cache between cores. As you know, L1 is split into two separate caches: instruction and data. If L2 were split similarly, the design could take advantage of the different usage patterns. Data cache lines often have higher throughput and are rarely shared between threads, while instruction cache lines are often used repeatedly and by many cores within a short time. If L3 were split, L3I could be tightly interconnected, and L3D could be larger and more off-die. That's just an idea, but there might be design considerations I'm not aware of.
Posted on Reply
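
The "window" argument in the post above can be made concrete with a toy model. This is a deliberately simplified sketch, not how any real CPU is implemented: one core's L2 sits in front of a shared L3, L3 capacity and coherence traffic are ignored, and the class and method names are invented for illustration. With an inclusive policy a line becomes visible in L3 as soon as it enters L2; with the exclusive-style policy modeled here it only reaches L3 when L2 evicts it.

```python
from collections import OrderedDict


class TwoLevel:
    """Toy model of one core's L2 in front of a shared L3 (hypothetical)."""

    def __init__(self, l2_lines, inclusive):
        self.l2 = OrderedDict()  # LRU order, oldest entry first
        self.l3 = set()          # L3 capacity is ignored in this sketch
        self.l2_lines = l2_lines
        self.inclusive = inclusive

    def fetch(self, line):
        """The owning core brings `line` into its L2 (load or prefetch)."""
        if line in self.l2:
            self.l2.move_to_end(line)
            return
        self.l2[line] = True
        if self.inclusive:
            self.l3.add(line)  # duplicated into L3 right away
        if len(self.l2) > self.l2_lines:
            victim, _ = self.l2.popitem(last=False)
            self.l3.add(victim)  # already present if inclusive, new otherwise

    def other_core_hits_l3(self, line):
        """Would another core find `line` in the shared L3 right now?"""
        return line in self.l3


for inclusive in (True, False):
    cache = TwoLevel(l2_lines=2, inclusive=inclusive)
    cache.fetch("x")
    before = cache.other_core_hits_l3("x")
    cache.fetch("y")
    cache.fetch("z")  # pushes "x" out of the two-line L2
    after = cache.other_core_hits_l3("x")
    print(f"inclusive={inclusive}: L3 hit before eviction={before}, after={after}")
```

In this model the two policies differ only in the period between a line entering L2 and leaving it, which is the window the post argues is too short to matter in practice.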