Sunday, January 16th 2022

Intel "Raptor Lake" Rumored to Feature Massive Cache Size Increases

Large on-die caches are expected to be a major contributor to IPC and gaming performance. The upcoming AMD Ryzen 7 5800X3D processor triples its on-die last-level cache using 3D Vertical Cache technology to catch up to Intel's "Alder Lake-S" processors in gaming, while reusing the existing "Zen 3" IP. Intel realizes this, and is planning a massive increase in on-die cache sizes of its own, albeit spread across the cache hierarchy. The next-generation "Raptor Lake-S" desktop processor the company plans to launch in the second half of 2022 is rumored to feature 68 MB of "total cache" (AMD lingo for L2 and L3 caches combined), according to a highly plausible theory by PC enthusiast OneRaichu on Twitter, illustrated by Olrak29_.

The "Raptor Lake-S" silicon is expected to feature eight "Raptor Cove" P-cores, and four "Gracemont" E-core clusters (each cluster amounts to four cores). The "Raptor Cove" core is expected to feature 2 MB of dedicated L2 cache, an increase over the 1.25 MB L2 cache per "Golden Cove" P-core of "Alder Lake-S." In a "Gracemont" E-core cluster, four CPU cores share an L2 cache. Intel is looking to double this E-core cluster L2 cache size from 2 MB per cluster on "Alder Lake," to 4 MB per cluster. The shared L3 cache increases from 30 MB on "Alder Lake-S" (C0 silicon), to 36 MB on "Raptor Lake-S." The L2 + L3 caches hence add up to 68 MB. All eyes are now on "Zen 4," and whether AMD gives the L2 caches an increase from the 512 KB per-core size that it's consistently maintained since the first "Zen."
Sources: OneRaichu (Twitter), Olrack (Twitter), HotHardware

66 Comments on Intel "Raptor Lake" Rumored to Feature Massive Cache Size Increases

#51
efikkan
Chrispy_The best thing about Raptor Lake isn't the cache, it's the additional 8 E-cores.

8 P-cores is enough for the moment, and based on historic trends, enough for a decade or more.
You are partly right.
8 P-cores is plenty for most workloads for the foreseeable future, possibly even longer than historic trends would suggest, as scaling with more threads ultimately has diminishing returns. The workloads which can scale to 8+ threads are non-interactive batch loads (non-realtime rendering, video encoding, server batch jobs, etc.), where the work chunks are large enough that a big pool of threads can be saturated and the relative overhead becomes negligible.
Interactive applications need workloads to finish within ~5 ms to be responsive, which means that the scaling limit of such workloads depends on their nature, i.e. how much of the work can be done independently. As most workloads are essentially pipelines (chains of operations) with dependencies on other data, there will always be diminishing returns. Not everything can be divided into 1000 chunks and fed to a thread pool when there is a need for synchronization; such dependencies often mean all threads have to wait for the slowest thread before any of them can proceed to the next step. If just one of them is slightly delayed (e.g. by the OS scheduler), we could be talking about milliseconds of delay, which is too severe when each chunk needs to sync up in a matter of microseconds to keep things flowing. So, in conclusion, we will "never" see interactive applications in general scale well beyond 8 threads, not until a major hardware change, new paradigm, etc. This is not just about software.
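
A toy model makes the point; this is a sketch, and the numbers (chunk size, hiccup length, hiccup probability) are illustrative assumptions, not measurements:

import random

THREADS = 16
STEPS = 100
CHUNK_US = 50        # ideal per-chunk work time, microseconds
HICCUP_US = 1000     # a ~1 ms scheduler delay
HICCUP_PROB = 0.01   # chance any one thread gets delayed in a given step

total_us = 0.0
for _ in range(STEPS):
    # barrier semantics: every step ends when the slowest thread finishes
    slowest = max(
        CHUNK_US + (HICCUP_US if random.random() < HICCUP_PROB else 0.0)
        for _ in range(THREADS)
    )
    total_us += slowest

print(f"ideal: {STEPS * CHUNK_US / 1000:.1f} ms, "
      f"with barriers and hiccups: {total_us / 1000:.1f} ms")
# With 16 threads, roughly one step in seven hits a hiccup somewhere, so the
# run routinely blows a ~5 ms interactive budget even though each chunk is
# only 50 µs of actual work.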

Where you don't hit the mark is on the extra E-cores. These are added because they're a cheap way for PC makers to sell you a "better" product. We will probably see a "small core race" now that the "big core race" and the "clock speed race" have hit a wall. So this is all about marketing. :)
Posted on Reply
#52
Chrispy_
lexluthermiesterThat depends greatly on the quad-core being discussed. The original Core 2 Quads are not good for much on a modern level, and they were struggling even in 2018. The second-gen C2Q line held up much better because of the higher FSB and performance; there are users here in the forums who are still running them. The early AMD quads didn't fare so well, and were irrelevant by 2015.

Bring that forward, and the 6+4-core and above Alder Lake models will likely last 8 to 10 years, barring a massive breakthrough in IC substrate materials.
The chip that sticks in everyone's mind is Sandy Bridge. That was a bit of a blip in terms of historic IPC progression, but a 2600K is still doing a passable job today in most software. Yes, much faster CPUs exist, but its eight threads are enough to perform adequately in modern multithreaded software and game engines that didn't even exist until 7-8 years after Sandy Bridge was replaced by the next generation.
efikkan8 P-cores is plenty for most workloads for the foreseeable future, possibly even longer than historic trends would suggest, as scaling with more threads ultimately has diminishing returns.
Yeah, "a decade or more" was a very quick off-the cuff remark based on the nearest round number. It may only be 7 years, it may be 17 years; None of us have a working crystal ball so we can only go on current and historic data and make predictions on that. Either way, I am guessing that the need for increasingly larmore P-core threads will slow down for mainly the reasons you say - it's going to require a completely different method of software development to replace the diminishing returns we're seeing with the current x86 architecture and current OS schedulers.
efikkanWhere you don't hit the mark is on the extra E-cores. These are added because they're a cheap way for PC makers to sell you a "better" product. We will probably see a "small core race" now that the "big core race" and the "clock speed race" have hit a wall. So this is all about marketing. :)
I dunno. I have extensive experience with 100% E-core-only Avoton and Denverton Atom platforms for massively scalable workloads (routers, firewalls, etc.). They absolutely rule when it comes to performance/Watt and performance/socket. Eight Atom cores on smaller silicon at 30 W give ~2x the performance of a 2C/4T Xeon of the same generation, and that Xeon is larger, more expensive silicon that consumes 45 W despite being utterly outclassed in performance.
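
The arithmetic implied there, using my rough numbers above (illustrative, not a benchmark):

# perf/W implied by the figures above (Xeon throughput normalized to 1.0;
# these are the post's rough numbers, not benchmark results)
atom_perf, atom_watts = 2.0, 30.0   # 8 Atom cores, ~2x the Xeon's throughput
xeon_perf, xeon_watts = 1.0, 45.0   # 2C/4T Xeon of the same generation

advantage = (atom_perf / atom_watts) / (xeon_perf / xeon_watts)
print(f"Atom perf/W advantage: {advantage:.1f}x")   # -> 3.0x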

Once you have enough P-cores to satisfy the "interactive applications need workloads to finish within ~5 ms to be responsive" requirement, all spare silicon area should be spent on E-cores. They're simply better at scalable, non-interactive workloads in terms of efficiency and density. The concessions that make them unsuitable for interactive workloads are the exact same trade-offs that give them such a huge advantage in die area and power consumption.

We're still in the early days of hybrid architectures on Windows, but once the teething troubles are ironed out, I firmly believe the E-core count race is on. It will be especially important in the ULP laptop segment, where efficiency and power consumption matter far more than on desktop. I'm actually super excited for the Ultra-Mobile silicon Intel showed during their Architecture Day: a big iGPU and a massive bias towards E-cores, with 2P + 8E.

Posted on Reply
#53
Max(IT)
ncrsAlder Lake has already increased cache latency in comparison to Rocket Lake. If they go even further, we might arrive at a situation where Zen 3 has almost half the cache latency of Raptor Lake. But in the end we'll have to wait for benchmarks, and even then it is going to be workload-dependent.

There's nothing wrong with having some fun.

It's supposed to go over the existing cache to prevent thermal issues of the cores.
Well… we don't know about 5800X3D cache latency yet.
bugI think I've said this before, but caches today are bigger than my first HDD (42MB).
I think my first hard disk was like 20 MB… Intel 8086 era.
PunkenjoyI have a few remarks on this:

- It would be great if the lower SKUs still kept the full 36 MB L3 cache, but that does not seem to be the case.
That is not going to happen with Intel and their artificial segmentation for marketing.
We are speaking about the manufacturer who played around with HT and core counts for years just for segmentation.
Posted on Reply
#54
Unregistered
Chrispy_The chip that sticks in everyone's mind is Sandy Bridge. That was a bit of a blip in terms of historic IPC progression, but a 2600K is still doing a passable job today in most software. Yes, much faster CPUs exist, but its eight threads are enough to perform adequately in modern multithreaded software and game engines that didn't even exist until 7-8 years after Sandy Bridge was replaced by the next generation.

Yeah, "a decade or more" was a very quick off-the-cuff remark based on the nearest round number. It may only be 7 years, it may be 17 years; none of us have a working crystal ball, so we can only go on current and historic data and make predictions from that. Either way, I'm guessing that the need for ever more P-core threads will slow down for mainly the reasons you say: it's going to require a completely different method of software development to overcome the diminishing returns we're seeing with the current x86 architecture and current OS schedulers.

I dunno. I have extensive experience with 100% E-core-only Avoton and Denverton Atom platforms for massively scalable workloads (routers, firewalls, etc.). They absolutely rule when it comes to performance/Watt and performance/socket. Eight Atom cores on smaller silicon at 30 W give ~2x the performance of a 2C/4T Xeon of the same generation, and that Xeon is larger, more expensive silicon that consumes 45 W despite being utterly outclassed in performance.

Once you have enough P-cores to satisfy the "interactive applications need workloads to finish within ~5 ms to be responsive" requirement, all spare silicon area should be spent on E-cores. They're simply better at scalable, non-interactive workloads in terms of efficiency and density. The concessions that make them unsuitable for interactive workloads are the exact same trade-offs that give them such a huge advantage in die area and power consumption.

We're still in the early days of hybrid architectures on Windows, but once the teething troubles are ironed out, I firmly believe the E-core count race is on. It will be especially important in the ULP laptop segment, where efficiency and power consumption matter far more than on desktop. I'm actually super excited for the Ultra-Mobile silicon Intel showed during their Architecture Day: a big iGPU and a massive bias towards E-cores, with 2P + 8E.

So all the joking about the E-cores being useless was crap. I think it was a great idea from Intel to give it a go. It is the first desktop implementation of the concept (I believe), so getting the bugs ironed out will make it even better. ADL will get better over the coming months, IMO. And I really believe that once AMD sees the benefits, they will have a big.LITTLE CPU themselves before long.
#55
bug
Max(IT)I think my first hard disk was like 20 MB… Intel 8086 era.
I went straight for the modern 80286 ;)
But I do know a guy who had an 8086. No HDD at all, he'd load games off floppy disks. He even learned how to shuffle files around so that the "please insert disk into drive a:" message would pop up as few times as possible.
Posted on Reply
#56
efikkan
Chrispy_None of us have a working crystal ball, so we can only go on current and historic data and make predictions from that. Either way, I'm guessing that the need for ever more P-core threads will slow down for mainly the reasons you say: it's going to require a completely different method of software development to overcome the diminishing returns we're seeing with the current x86 architecture and current OS schedulers.
Yes, it's very hard to predict ~20-50 years ahead. But it's fairly obvious that new hardware paradigms are likely needed to see different scaling in multithreading. Far too many think we will soon be running thousands of RISC-V cores or something, but that simply doesn't scale for these kinds of workloads. OS scheduling overhead increases with thread and core count, and the latency of modern kernels is in the 0.1-1 ms range for desktops, up to 20 ms for Windows in power-saving mode on laptops. Imagine how quickly this becomes a problem if you have a workload of thousands of small chunks: even if the scheduler is just sub-optimal at times, the overhead quickly slows everything down. A different kind of scheduling akin to a real-time kernel can reduce such problems, but you still have the hardware latencies involved.
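
A crude way to see that overhead on any desktop machine; this is a sketch, absolute timings will vary wildly by OS and kernel, and Python's GIL means it measures dispatch cost rather than parallel speedup, which is exactly the overhead in question:

import time
from concurrent.futures import ThreadPoolExecutor

def spin(n):
    # a small busy-loop standing in for one chunk of real work
    acc = 0
    for i in range(n):
        acc += i
    return acc

def run(chunks, work_per_chunk):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(spin, [work_per_chunk] * chunks))
    return time.perf_counter() - start

# same total work, different granularity; the fine-grained run pays
# per-task dispatch and wakeup overhead thousands of times over
coarse = run(8, 1_000_000)
fine = run(8_000, 1_000)
print(f"8 big chunks: {coarse:.3f} s, 8000 tiny chunks: {fine:.3f} s")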

One good contender for the direction we may be heading is Intel's research into "threadlets". Something in this direction is what I've been expecting, based on seeing the challenges of low-level optimization. The idea is basically to have smaller "threads" of instructions that exist only inside the CPU core for a few clock cycles, transparent to the OS. But Intel has tried and failed to tackle these problems before (with Itanium), and there is no guarantee they will be first to reach a new tier of performance. If we find a sensible, efficient and scalable way to express parallelism and constraints at a low level, then it is certainly possible to scale performance to new levels without more high-level threads. There is a very good chance that new paradigms will play a key role too, paradigms that I'm not aware of yet.
Posted on Reply
#57
Chrispy_
TiggerSo all the joking about the E-cores being useless was crap. I think it was a great idea from Intel to give it a go. It is the first desktop implementation of the concept (I believe), so getting the bugs ironed out will make it even better. ADL will get better over the coming months, IMO. And I really believe that once AMD sees the benefits, they will have a big.LITTLE CPU themselves before long.
Joking? Perhaps from people who don't have any experience with them. Yes, in terms of outright performance the E-cores are barely half as fast as the Golden Cove P-cores; if you don't understand that their design criteria are die area and power efficiency, they may well look like a joke.

E-cores are far from useless, though. In addition to firewall and router appliances with Avoton/Denverton Atoms, I've set up plenty of NUCs with Pentium Silver or Celeron J CPUs comprised entirely of what is now called an E-core, and whilst they're no speed demons, they're relatively competent machines that can be passively cooled at 10 W, something none of the *-Lake architectures have managed for a very long time.

All the problems with them have genuinely been scheduler teething troubles, where threads requiring high performance and maximum responsiveness were incorrectly scheduled onto the E-cores and effectively given a very low priority by mistake.

As the scheduler bugs get ironed out and the Intel Thread Director gets tweaked, the blend of P- and E-cores working together, each doing what it's best at, will show that the hybrid architecture is the way forward.

I mean, I could be wrong, but big.LITTLE has been dominantly superior in smartphones; no flagship phone CPU is made any other way now.
efikkanOne good contender for the direction we may be heading is Intel's research into "threadlets". Something in this direction is what I've been expecting, based on seeing the challenges of low-level optimization. The idea is basically to have smaller "threads" of instructions that exist only inside the CPU core for a few clock cycles, transparent to the OS. But Intel has tried and failed to tackle these problems before (with Itanium), and there is no guarantee they will be first to reach a new tier of performance. If we find a sensible, efficient and scalable way to express parallelism and constraints at a low level, then it is certainly possible to scale performance to new levels without more high-level threads. There is a very good chance that new paradigms will play a key role too, paradigms that I'm not aware of yet.
The bottleneck here is Microsoft, who still haven't finished making Windows 8, whilst rebranding it 8.1, 10, 11, etc. 11 isn't new; it's just more lipstick on the unfinished pig that is halfway towards the full ground-up OS redesign they promised with Project Longhorn (which turned into Vista!). Vista SP2 finally gave Microsoft a revamp of the underlying backend; Windows 8 was the start of the frontend revamp. Once the Windows NT management consoles and legacy Control Panel settings are fully migrated to the native modern UI, Microsoft will finally have achieved what they promised with Longhorn, a project they started in May 2001, over 20 years ago.

Expecting a high-quality, futuristic scheduler from those useless clowns is pointless. In over 20 years they've basically failed to finish their original project; it might be done within 25. Expecting them to competently put together a scheduler that does things the way we all hope is crazy. We might as well just howl at the moon until we're all old and grey for all the good that wishful thinking will do ;)
Posted on Reply
#58
lexluthermiester
Chrispy_but a 2600K is still doing a passable job today in most software.
I'll agree with that.
Chrispy_Yes, much faster CPUs exist, but its eight threads are enough to perform adequately in modern multithreaded software and game engines that didn't even exist until 7-8 years after Sandy Bridge was replaced by the next generation.
Even the i5-2500K is still reasonable for most tasks at 1080p, and for gaming at 720p.
Posted on Reply
#59
efikkan
Chrispy_The bottleneck here is Microsoft, who still haven't finished making Windows 8, whilst rebranding it 8.1, 10, 11, etc. 11 isn't new; it's just more lipstick on the unfinished pig that is halfway towards the full ground-up OS redesign they promised with Project Longhorn (which turned into Vista!). Vista SP2 finally gave Microsoft a revamp of the underlying backend; Windows 8 was the start of the frontend revamp. Once the Windows NT management consoles and legacy Control Panel settings are fully migrated to the native modern UI, Microsoft will finally have achieved what they promised with Longhorn, a project they started in May 2001, over 20 years ago.

Expecting a high-quality, futuristic scheduler from those useless clowns is pointless. In over 20 years they've basically failed to finish their original project; it might be done within 25. Expecting them to competently put together a scheduler that does things the way we all hope is crazy. We might as well just howl at the moon until we're all old and grey for all the good that wishful thinking will do ;)
Well, the largest improvement intended for Longhorn was the kernel overhaul/rewrite, which has been postponed time after time. Instead they delivered small incremental changes, patchwork on their old kernel, along with major UI overhauls. After all this time, I don't know if they are even actively pursuing that overhaul any more. But the recent years of >6-core mainstream CPUs, and even more so the hybrid designs, have revealed to many the shortcomings of their scheduler. Some of us have known about the obsolete kernel for two decades, and know that overhauling it would solve most of Windows' stability and security problems. I honestly believe they should stop focusing on fancy features and gimmicks, and just make a stable, configurable system.
It's no accident that more and more developers are moving to Linux. We need a stable and reliable platform to get work done.
Posted on Reply
#60
Adam Krazispeed
AMD's 5800X3D is gonna have 96 MB of just L3. I think AMD WINS HERE, AND AM5 IS GONNA BE NUTZ! THE AM5 IHS HAS TINY SPRINGS BETWEEN THE IHS AND THE SUBSTRATE :cool: :cool: :peace: :cool: :peace: :cool: :peace: :cool: :cool: :peace: :cool: :cool: :cool: :cool:
Posted on Reply
#61
Cutechri
Adam KrazispeedAMD's 5800X3D is gonna have 96 MB of just L3. I think AMD WINS HERE, AND AM5 IS GONNA BE NUTZ! THE AM5 IHS HAS TINY SPRINGS BETWEEN THE IHS AND THE SUBSTRATE :cool: :cool: :peace: :cool: :peace: :cool: :peace: :cool: :cool: :peace: :cool: :cool: :cool: :cool:
You better hope neither company wins. You want the battle to go on for as long as possible.

Anyway, this is good. 8 P-cores is definitely enough for gaming for years to come; what needs improvement now is cache size. Intel has seen this in AMD's CPUs, and I'm glad they're following suit.
Chrispy_The bottleneck here is Microsoft, who still haven't finished making Windows 8, whilst rebranding it 8.1, 10, 11, etc. 11 isn't new; it's just more lipstick on the unfinished pig that is halfway towards the full ground-up OS redesign they promised with Project Longhorn (which turned into Vista!). Vista SP2 finally gave Microsoft a revamp of the underlying backend; Windows 8 was the start of the frontend revamp. Once the Windows NT management consoles and legacy Control Panel settings are fully migrated to the native modern UI, Microsoft will finally have achieved what they promised with Longhorn, a project they started in May 2001, over 20 years ago.

Expecting a high-quality, futuristic scheduler from those useless clowns is pointless. In over 20 years they've basically failed to finish their original project; it might be done within 25. Expecting them to competently put together a scheduler that does things the way we all hope is crazy. We might as well just howl at the moon until we're all old and grey for all the good that wishful thinking will do ;)
After experiencing Loonix and its cultist userbase for months, I ended up back on 11 for a good reason. The "clowns" at Microsoft are still the only ones who managed to give me an OS that does everything I want it to do and more, including running my expensive VR headset and my competent video editing software. It also handles my CPU better; I never saw 5.2 GHz and proper scheduling out of my 5900X on Linux.

If Linux ever wants to overtake Windows and finally have that mythical Year of the Linux Desktop (which was supposed to come ages ago, then now, then at some point in the future), they'd need a culling of their crappy user base (which will never happen) and to speed up on compatibility (which will be as slow as ReactOS). Because so far, Windows is still the best distro and compatibility layer I've used.

You have Linux users desperately trying to get people to ditch Windows, and when people finally do and choose a beginner-friendly distro like Ubuntu, they scream and howl at them for not choosing this or that elitist distro that needs tons of work to get running properly. They should be glad they managed to get people to ditch Windows in the first place.
Posted on Reply
#62
lexluthermiester
CutechriYou better hope neither company wins. You want the battle to go on for as long as possible.
This. And the competition from ARM is doing wonders for things too.
Posted on Reply
#63
Max(IT)
Adam KrazispeedAMD's 5800X3D is gonna have 96 MB of just L3. I think AMD WINS HERE, AND AM5 IS GONNA BE NUTZ! THE AM5 IHS HAS TINY SPRINGS BETWEEN THE IHS AND THE SUBSTRATE :cool: :cool: :peace: :cool: :peace: :cool: :peace: :cool: :cool: :peace: :cool: :cool: :cool: :cool:
Are you an AMD shareholder, or just a fan/supporter?
I frankly don't mind who is "winning", and I don't think a 1-year-old CPU (actually closer to a year and a half old…) with more cache is something to be happy about. And I'm sure the 5800X3D's price will be hilarious. I'm an AMD customer, and I'm not happy with what AMD has been doing commercially lately: Zen 3 pricing and some technical choices on RDNA2 products (that embarrassing joke the 6500 XT is, for instance) make me think AMD needs competition, because they are becoming more and more greedy over time.
Posted on Reply
#64
Nater
Are we going to need a new mainboard for these coming from Alder Lake?
Posted on Reply
#65
TheoneandonlyMrK
NaterAre we going to need a new mainboard for these coming from Alder Lake?
No, just a BIOS update, probably.
Posted on Reply
#66
Max(IT)
TheoneandonlyMrKNo just a bios update probably.
I would say: hopefully not :rolleyes:

With Intel, you never know…
Posted on Reply