Monday, December 6th 2021

Intel Prepares Raptor Lake Designs With 24 Cores and 32 Threads, More E-Cores This Time

With the launch of its Alder Lake processors, Intel has switched from a homogeneous to a heterogeneous processor design, where smaller, high-efficiency cores are mixed with high-performance cores to create an efficient, high-performance processor for all kinds of workloads. And it seems Intel is not done adding E-cores to its future products, as the latest leaks suggest. According to the BAPCo CrossMark benchmark database, Intel's upcoming Raptor Lake processors will feature more E-cores than high-performance P-cores in the SoC design. Why Intel made this design choice is not yet clear, and we don't have a definitive answer.

E-cores are suitable for background tasks, and adding more of them could potentially free up the P-cores for heavier workloads. In the benchmark submission, which is now offline, the sample used had a configuration of eight P-cores and sixteen E-cores. Since only the big cores are hyper-threaded, this makes for a total of 24 cores and 32 threads. The platform "RPL-S ADP-S DDR5 UDIMM OC CRB" was used with DDR5-4800 memory, indicating an early engineering sample with a probably unfinished memory controller. The Raptor Lake generation will also use the LGA 1700 socket and DDR5 memory, and it will be present in both the desktop and mobile sectors once it launches in Q4 2022. It will also be built on the Intel 7 manufacturing process, like Alder Lake. Apart from the higher E-core count, the main difference in the next-generation design is the updated Raptor Cove core, which brings a significant IPC uplift.
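The 24-core/32-thread figure follows from simple arithmetic: each hyper-threaded P-core contributes two threads and each E-core contributes one. The short Python sketch below makes that explicit. The `HybridConfig` class is a hypothetical illustration, not an Intel tool, and it assumes two threads per P-core and one per E-core, as on Alder Lake.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridConfig:
    """Hypothetical model of a hybrid CPU's core and thread counts."""
    p_cores: int
    e_cores: int
    p_threads_per_core: int = 2  # assumption: P-cores are hyper-threaded
    e_threads_per_core: int = 1  # assumption: E-cores run one thread each

    @property
    def cores(self) -> int:
        return self.p_cores + self.e_cores

    @property
    def threads(self) -> int:
        return (self.p_cores * self.p_threads_per_core
                + self.e_cores * self.e_threads_per_core)


raptor_lake_leak = HybridConfig(p_cores=8, e_cores=16)   # leaked 8P+16E sample
alder_lake_12900k = HybridConfig(p_cores=8, e_cores=8)   # Core i9-12900K, 8P+8E
print(raptor_lake_leak.cores, raptor_lake_leak.threads)    # 24 32
print(alder_lake_12900k.cores, alder_lake_12900k.threads)  # 16 24
```

The same arithmetic shows that the leaked part adds eight E-cores, and therefore eight threads, over the 12900K's 8P+8E layout, while the P-core count stays the same.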
Sources: Tom's Hardware, KOMACHI_ENSAKA (Twitter), via VideoCardz

81 Comments on Intel Prepares Raptor Lake Designs With 24 Cores and 32 Threads, More E-Cores This Time

#51
Mussels
Freshwater Moderator
Intel have to do the E-cores because their main cores are too power hungry, and they want to win multi threaded benchmarks


I'd rather have 4-8 E-cores dedicated to the OS, and the big boys for programs and games
#52
InVasMani
phanbuey wrote: I like the E-cores after having used them. Maybe in the perfect bench setups reviewers use, with no crapware or background apps, they don't make much sense, but for my use case they seem to work great.

I've tested my max overclock on the 12600K (5.4 GHz, ring at 47x, E-cores off) against my 24/7 profile (5.3 GHz, ring at 43x, E-cores on and overclocked to 4.3 GHz), and having the E-cores on very noticeably eliminates intermittent stutters in Cyberpunk and Far Cry 6. Could be just my setup, but they seem to really work (especially because I'm too lazy to shut down all my background stuff).

Also, I get 90% of the 12700K's multithreaded performance at ~186 W, which is pretty nice and not something 8 P-cores by themselves can do AFAIK. A 12900K with 10 P-cores would probably get lower multithreaded performance than the current 8P/8E 12900K for 100 W more draw, with zero benefit in virtually any current real-world application.



This is probably true. Stacked cache looks insane.
Finally, info from someone who has intentionally overclocked the E-cores. Have you tried aggressively reducing the P-core multipliers to see if the E-cores can clock higher once heat output from the P-cores is no longer the primary limitation!!? The E-cores appear to be more efficient than the P-cores relative to the die area they occupy. To me, clocking them higher is a no-brainer if you want higher overall performance. How did you go about overclocking them? Have you tried BCLK!!? A BCLK overclock has some advantages in that it also raises the memory speed through its ratio, much like Infinity Fabric. You can probably get better memory results as well, as with Infinity Fabric overclocking.

The situation you describe is exactly where the strength of E-cores lies: background CPU contention that bogs down P-core performance. Because the E-cores, which occupy less die space, absorb that background load, the P-cores get bogged down less, and you get higher overall performance than you otherwise would under certain general-use circumstances. There are certainly design trade-offs between the core types, but I like the balance myself. Your results look encouraging. I've wanted to see more of this kind of overclocking on Alder Lake and how it impacts results. That is actually 3 W less than the stock 12600K multithreaded result TPU measured. Is that undervolted!!? It seems wild given you've got both the P-cores and E-cores overclocked over stock, though maybe it wasn't measured under the same stress-test workload as Cinebench.
Mussels wrote: Intel have to do the E-cores because their main cores are too power hungry, and they want to win multi threaded benchmarks


I'd rather have 4-8 E-cores dedicated to the OS, and the big boys for programs and games
This. Intel can't push the P-core frequency curve much higher at this point because the voltage and heat output required to do so are completely asinine. Even with carbon nanotubes and a move away from silicon, the power draw would still be crazy as a loon for a rather tiny increase in frequency. E-cores are the right choice, and more of them. A better middle ground between E-cores and P-cores, with another core design, would further improve things, but that won't happen overnight. I think we'll eventually see a stacked pyramid and inverted pyramid design of sorts with TSV shingling.
#53
mouacyk
Ravenmaster wrote: Would have preferred it if they'd added 2 more P-cores and had fewer E-core clusters
See, the thing is they don't want melting stock VRMs as a defect.
#54
GoldenX
Intel Raptor Lake
or
How to get away with 300W TDPs.

On sale now.
#55
Dr. Dro
I think E-cores are the way.

With Alder and Raptor Lake, Intel's laying a foundation for high-performance manycore processors in the future. I believe the company will focus on increasing E-cores' performance, while retaining the density advantage. With Foveros 3D packaging + densely packed E-core clusters, they may very well achieve GPU-like core counts per socket without giving up on IPC, and that's where the master stroke is.

I would not be surprised to see HEDT processors with wild configs like 16 P-cores + 128 E-cores in the future.
#57
phanbuey
InVasMani wrote: Finally, info from someone who has intentionally overclocked the E-cores. Have you tried aggressively reducing the P-core multipliers to see if the E-cores can clock higher once heat output from the P-cores is no longer the primary limitation!!?
I have not. They seem to be limited by the P-core voltage (my board uses the same voltage domain for both), so I'm sure that if I push above 1.32 V I'd be able to push them harder. The E-cores themselves never get that hot at the sensor (68 °C during CB), so I don't think heat is their main limitation. Also, when they crash, they crash instantly (4.5 GHz won't even boot into Windows), so stability is pretty binary. Below are some shots during/after Cinebench R23 at 4.3 GHz.

InVasMani wrote: The E-cores appear to be more efficient than the P-cores relative to the die area they occupy. To me, clocking them higher is a no-brainer if you want higher overall performance. How did you go about overclocking them? Have you tried BCLK!!? A BCLK overclock has some advantages in that it also raises the memory speed through its ratio, much like Infinity Fabric. You can probably get better memory results as well, as with Infinity Fabric overclocking.
I have, actually. My issue with BCLK OC on this board is that if I touch it at all, one of my SATA drives disappears in Windows and my USB ports randomly shut off, so I just leave it at 100. It does help to dial in max ring/E-core clocks, but I don't have separate clock domains.
InVasMani wrote: The situation you describe is exactly where the strength of E-cores lies: background CPU contention that bogs down P-core performance. Because the E-cores, which occupy less die space, absorb that background load, the P-cores get bogged down less, and you get higher overall performance than you otherwise would under certain general-use circumstances. There are certainly design trade-offs between the core types, but I like the balance myself. Your results look encouraging. I've wanted to see more of this kind of overclocking on Alder Lake and how it impacts results.
I want to take some time to see if I can get frame-pacing software set up to show the difference between E-cores on and off with all the garbage I run and YouTube running in the background. This is what my Task Manager usually looks like when I fire up a game:
InVasMani wrote: That is actually 3 W less than the stock 12600K multithreaded result TPU measured. Is that undervolted!!? It seems wild given you've got both the P-cores and E-cores overclocked over stock, though maybe it wasn't measured under the same stress-test workload as Cinebench.
I measure using HWiNFO; I'm not sure whether TPU uses a different methodology. Here is a shot during CB R23:


^ I actually draw around 189-192 W in R23 (not 187, so I was a tiny bit off). Let me know if you want me to run any before/after benches on the E-core OC. I'm sure that if I go full FPU load with another stress-test tool I can push that past 200 W (still not terrible).


CB R23 full run with e cores @ 4.3
#58
Minus Infinity
Raptor Lake should be a large improvement over Alder Lake, especially in power efficiency, and with 2x the E-cores as well as IPC uplifts it should be a really good product. But don't count AMD out. They are releasing two Zen 4 CPU classes, and for those who need massive multi-threaded performance there will be Zen 4c with up to 32 cores. Each 4c core will be 10-20% weaker than a Zen 4 core, so it will obliterate Gracemont E-cores in performance. A 32C/64T Zen 4c part would destroy a 13900K Raptor Lake with 8 P-cores and 16 E-cores at multithreading.

Late next year will be very exciting, and IMO you really can't go wrong with either camp. I'm torn between updating my 2016 Zen 1700X system with Zen 4/RL or waiting for Zen 5/Meteor Lake and pushing the update back to 2024. Zen 5 introduces big.LITTLE, and Meteor Lake brings large architectural changes and probably the end of the ring bus topology. Zen 5's little cores will be 4c cores from Zen 4.
#59
Crackong
Harakhti wrote: This seems to be the general consensus, but for home server builders? Hell yeah, bring me all-E-core clusters, don't care how slow they are (Xeon Phi style). For power users? Go full ham and make something with all P-cores just for the lulz; ridiculous cooling requirements are already a problem these days, so nothing changes.
Alternatively, small E-core only (relatively performant) office systems would be welcome to help out on the efficiency side. I think that's what Zhaoxin was trying to do.
I have a few home server builds, but I don't really want a hybrid CPU in my system.
Mainly, I don't trust the scheduler to handle things right.
And the unknown performance drops/crashes when the scheduler decides to move my tasks from P-cores to E-cores are concerning.

On the other hand, I agree a "pure E-core" CPU is interesting.
A 12900K-sized 40-core CPU would be extremely handy.
#60
phanbuey
Crackong wrote: I have a few home server builds, but I don't really want a hybrid CPU in my system.
Mainly, I don't trust the scheduler to handle things right.
And the unknown performance drops/crashes when the scheduler decides to move my tasks from P-cores to E-cores are concerning.

On the other hand, I agree a "pure E-core" CPU is interesting.
A 12900K-sized 40-core CPU would be extremely handy.
I think that's what Zen 4c (Bergamo) is: basically scaled-down Zen 4 cores optimized for density.

It seems both camps are looking at density for MT applications. The thing is, I don't think AMD plans to launch those to consumers, so a pure E-core CPU, if Intel decided to launch one, would be super interesting for people who need tons of MT.
#61
GoldenX
Why_Me wrote: Obviously AMD hates Argentina.

www.fullh4rd.com.ar/prod/12425/micro-amd-ryzen-5-3600
AMD RYZEN 5 3600 $35.637,00

www.fullh4rd.com.ar/prod/17680/micro-amd-ryzen-5-5600x
AMD RYZEN 5 5600X $42.290,00

www.fullh4rd.com.ar/prod/18814/micro-intel-core-i5-11400f-sin-video
INTEL CORE I5 11400F $34.890,00
Yeah, you have to balance motherboard cost, CPU cost, and power cost. Power is VERY expensive in some places in Patagonia, making an Intel CPU under intensive use a bad deal.
For gaming, the cheapest i5 or i3 are the only valid options.
#62
Richards
Dr. Dro wrote: I think E-cores are the way.

With Alder and Raptor Lake, Intel's laying a foundation for high-performance manycore processors in the future. I believe the company will focus on increasing E-cores' performance, while retaining the density advantage. With Foveros 3D packaging + densely packed E-core clusters, they may very well achieve GPU-like core counts per socket without giving up on IPC, and that's where the master stroke is.

I would not be surprised to see HEDT processors with wild configs like 16 P-cores + 128 E-cores in the future.
That's Intel's ultimate goal. Imagine the multi-core performance of an 8+100-core CPU, insane.
#63
Mussels
Freshwater Moderator
Richards wrote: That's Intel's ultimate goal. Imagine the multi-core performance of an 8+100-core CPU, insane.
1 core to win ST performance benchmarks
E-cores to win MT benchmarks
And low CPU prices, to in the darkness bind them
#64
stimpy88
Richards wrote: That's Intel's ultimate goal. Imagine the multi-core performance of an 8+100-core CPU, insane.
Not sure what all this enthusiasm for hundreds of E-cores is all about. You do know what these E-cores are, don't you? You do realize that these E-cores are so efficient because they lack many of the latest CPU core features and clock like a 10-year-old CPU? Once you add the features and clock speed back, and allow for IPC increases (because AMD don't stand still), they will just end up being P-cores anyway.

Intel seem to be in trouble with these P-cores; they need to shrink them down dramatically to get thermals and power under control, and Intel suck at new process nodes.

Another thing: AMD don't seem to have a problem with 128 full-performance cores in the server line next year. Yes, they will be low-clocked, but they have all the features and IPC, unlike Intel's E-cores.
#65
londiste
stimpy88 wrote: Not sure what all this enthusiasm for hundreds of E-cores is all about. You do know what these E-cores are, don't you? You do realize that these E-cores are so efficient because they lack many of the latest CPU core features and clock like a 10-year-old CPU? Once you add the features and clock speed back, and allow for IPC increases (because AMD don't stand still), they will just end up being P-cores anyway.

Another thing: AMD don't seem to have a problem with 128 full-performance cores in the server line next year. Yes, they will be low-clocked, but they have all the features and IPC, unlike Intel's E-cores.
There have been many complaints about the applicability of AVX-512, and that is largely the main missing feature.
SMT is the other, but given the size and possible density of E-cores, that can be mitigated by adding more cores.
64-core EPYCs run at a 2 GHz base clock (the highest SKU was 2.25 GHz IIRC). The 40-core Ice Lake Xeon runs at 2.3 GHz. That is quite a bit less than what we see E-cores in Alder Lake running at.
E-core IPC today is in the same range as Skylake or Zen+ which is not bad at all.

By the way, AMD's 128-core part is Zen 4c, whatever that ends up being exactly. Space-optimized (i.e., smaller), they said, but it looks like it is power-optimized as well.
#66
InVasMani
phanbuey wrote: I have not. They seem to be limited by the P-core voltage (my board uses the same voltage domain for both), so I'm sure that if I push above 1.32 V I'd be able to push them harder. The E-cores themselves never get that hot at the sensor (68 °C during CB), so I don't think heat is their main limitation. Also, when they crash, they crash instantly (4.5 GHz won't even boot into Windows), so stability is pretty binary. Below are some shots during/after Cinebench R23 at 4.3 GHz.





I have, actually. My issue with BCLK OC on this board is that if I touch it at all, one of my SATA drives disappears in Windows and my USB ports randomly shut off, so I just leave it at 100. It does help to dial in max ring/E-core clocks, but I don't have separate clock domains.



I want to take some time to see if I can get frame-pacing software set up to show the difference between E-cores on and off with all the garbage I run and YouTube running in the background. This is what my Task Manager usually looks like when I fire up a game:



I measure using HWiNFO; I'm not sure whether TPU uses a different methodology. Here is a shot during CB R23:


^ I actually draw around 189-192 W in R23 (not 187, so I was a tiny bit off). Let me know if you want me to run any before/after benches on the E-core OC. I'm sure that if I go full FPU load with another stress-test tool I can push that past 200 W (still not terrible).


CB R23 full run with e cores @ 4.3
This is good info on a lot of different details. Do you think your BCLK issue is mostly a board issue or a general problem with Alder Lake? I thought Alder Lake could clock individual domains separately, but maybe that's basically a board-specific situation.

What you describe sounds like PCIe getting overclocked along with BCLK, which is what causes issues with the SATA/USB ports tied to PCIe. It makes me think of all the classic VIA chipsets that had the same basic overclocking issues. It's been a vicious cycle of fixed, then broken, in that regard.
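Under the simplified model discussed in this exchange, every clock domain runs at BCLK multiplied by a ratio, so nudging BCLK moves everything tied to it at once. The Python sketch below illustrates that. The ratios are phanbuey's 24/7 settings from this thread; treating the 100 MHz PCIe reference clock as BCLK-coupled is an assumption that may not hold on every board, and the dictionary keys and function names are hypothetical.

```python
# Simplified clock model (assumption): each domain runs at BCLK x a fixed ratio.
# Ratios are phanbuey's 24/7 settings from this thread; "pcie_ref" is a
# hypothetical entry modelling a 100 MHz PCIe reference clock tied to BCLK.
RATIOS = {
    "p_core": 53,   # 5.3 GHz at a 100 MHz BCLK
    "e_core": 43,   # 4.3 GHz
    "ring": 43,     # 4.3 GHz
    "pcie_ref": 1,  # 100 MHz
}


def domain_clocks_mhz(bclk_mhz: float, ratios: dict) -> dict:
    """Return each domain's effective clock in MHz for a given BCLK."""
    return {name: bclk_mhz * ratio for name, ratio in ratios.items()}


def pcie_deviation_pct(bclk_mhz: float, nominal_mhz: float = 100.0) -> float:
    """Percent deviation of a BCLK-coupled reference clock from its nominal value."""
    return (bclk_mhz - nominal_mhz) * 100.0 / nominal_mhz


for bclk in (100.0, 102.0):
    clocks = domain_clocks_mhz(bclk, RATIOS)
    print(f"BCLK {bclk:.0f} MHz -> {clocks}, PCIe ref {pcie_deviation_pct(bclk):+.1f}%")
```

Under these assumptions, a 2% BCLK bump lifts the P-cores from 5300 to 5406 MHz but also pushes the PCIe reference clock 2% off nominal, which is the kind of side effect the posters suspect behind the disappearing SATA drive and USB dropouts.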

The consistency of the temps on the E-cores is kind of surprising. It looks to me like P-core temps could get in the way more readily than E-core temps. The E-cores don't look overly hot, but the P-cores certainly heat up a bit more, and combined they're probably the bigger heat concern, or so it seems.
zlobby wrote: Intel won't rest until they put a phone SoC in your PC, but tax it as an HEDT one...


Or maybe Intel shouldn't use toothpaste as TIM?


Preeeach!
Perhaps, or maybe they want to put a PC in a phone and tax it like Apple.
#67
mouacyk
Crackong wrote: I have a few home server builds, but I don't really want a hybrid CPU in my system.
Mainly, I don't trust the scheduler to handle things right.
And the unknown performance drops/crashes when the scheduler decides to move my tasks from P-cores to E-cores are concerning.

On the other hand, I agree a "pure E-core" CPU is interesting.
A 12900K-sized 40-core CPU would be extremely handy.
I, too, run a home server and compile desktop programs to be distributed to other clients. It makes me wonder how much performance is lost because Windows 11 has to ensure binaries run on both P- and E-cores. Granted, it's not as bad as i686 being the lowest common denominator, but surely cache sizes and lines are different enough between the P- and E-cores to cause pipeline stalls.
#68
Richards
Mussels wrote: 1 core to win ST performance benchmarks
E-cores to win MT benchmarks
And low CPU prices, to in the darkness bind them
The CPU to rule them all lol.. Lord of the Rings
#69
efikkan
TheGuruStud wrote: Zen 4 is going to have up to a 50% perf increase with the stacked cache models. Intel isn't even on the radar.
Even if that claim is remotely true, the key words here are "up to".
Most new architectures are up to ~40-50% faster than their predecessors. We should expect this much.

We have to wait and see how much a massive L3 cache matters for various real world use cases.
Crackong wrote: The primary reason they went for a P/E-core config is that the current Intel ring bus architecture maxes out at 12 slots for CPU cores per ring
As demonstrated in the Xeon E5 v4 series.
And that's exactly the reason why they went for a mesh architecture.
The ring bus vs. mesh design has to do with core layout. We've had this discussion since the quad-core days, yet the ring bus is keeping up just fine. I see no reason why the ring bus would be a problem for mainstream use, even at 16 cores.
Mussels wrote: Intel have to do the E-cores because their main cores are too power hungry, and they want to win multi threaded benchmarks
Sure, synthetic benchmarks matter a lot to the enthusiast market, but you're missing the bigger picture. The main reason for the big-little design in desktops is that they have hit the clock speed "wall" and the (big) core count "wall", and the big PC makers like Dell, HP, Lenovo, etc. mostly sell upgrades based on "specs".
londiste wrote: E-core IPC today is in the same range as Skylake or Zen+ which is not bad at all.
With a shared L2 the real world performance would be quite different with load on multiple small cores. This is one of the reasons why it's important to distinguish performance and IPC.
#70
dragontamer5788
efikkan wrote: With a shared L2 the real world performance would be quite different with load on multiple small cores. This is one of the reasons why it's important to distinguish performance and IPC.
I'm unaware of any CPU made, be it AMD, Intel, IBM POWER, or ARM, that did data-sharing in "close" caches (L1). All data-sharing is in the LLC (last-level cache).
#71
Crackong
mouacyk wrote: I, too, run a home server and compile desktop programs to be distributed to other clients. It makes me wonder how much performance is lost because Windows 11 has to ensure binaries run on both P- and E-cores. Granted, it's not as bad as i686 being the lowest common denominator, but surely cache sizes and lines are different enough between the P- and E-cores to cause pipeline stalls.
Well, we won't know how it's gonna behave unless someone tries to cross the minefield.
I have no time or resources to do that, so I would avoid these products for some types of use cases for now.
efikkan wrote: The ring bus vs. mesh design has to do with core layout. We've had this discussion since the quad-core days, yet the ring bus is keeping up just fine. I see no reason why the ring bus would be a problem for mainstream use, even at 16 cores.
The largest ring Intel has ever created was in the Xeon E5 v4,
with a total of 17 ring stops in the largest ring.
An Intel mainstream CPU needs 1 ring stop for each of the following: IMC, PCIe controller, QPI link, iGPU.
That leaves 13 ring stops.
Since Intel does not do odd core counts anymore, that's 12 cores max.

Maybe, just maybe, someday they will come up with a 16-core single ring bus.
But that means a 20-stop ring bus.
Will core-to-core latency become a huge concern?
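Crackong's ring-stop arithmetic can be checked in a few lines of Python. This is a sketch of the commenter's own model (one stop per core, four uncore stops, and 17 stops as the largest ring), not a description of how Intel actually lays out its rings; the function and constant names are hypothetical.

```python
# Commenter's model (assumption): one ring stop per core, plus a fixed number
# of stops for uncore agents (IMC, PCIe controller, QPI link, iGPU).
UNCORE_STOPS = 4
LARGEST_RING_STOPS = 17  # largest single ring cited in the thread (Xeon E5 v4)


def stops_needed(cores: int, uncore_stops: int = UNCORE_STOPS) -> int:
    """Ring stops required for a given core count under this model."""
    return cores + uncore_stops


def max_even_cores(total_stops: int, uncore_stops: int = UNCORE_STOPS) -> int:
    """Largest even core count that fits in total_stops under this model."""
    free = total_stops - uncore_stops
    return free - (free % 2)  # drop one stop if the remainder is odd


print(max_even_cores(LARGEST_RING_STOPS))  # 12
print(stops_needed(16))                    # 20
```

Under these assumptions, 17 - 4 leaves 13 stops, rounding down to an even count gives 12 cores, and a 16-core ring would need 20 stops, three more than the largest ring cited.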
#72
InVasMani
Ring stops do be causing Sonic a lot of speed run latency issues. Is it possible to 3D-stack a ring bus and have the stacked rings run in reverse order to reduce latency!? The least latency-sensitive stuff could be sandwiched in the middle. Perhaps something a bit like Apple's hybrid memory subsystem, but for the ring bus over the substrate?
#74
Aquinus
Resident Wat-man
ShurikN wrote: E-cores have no place in any desktop whatsoever. Gaming or not.
Laptops, sure.
Did you not see the E-core review that W1zz did? At 4K, the E-cores perform almost as well as the P-cores. You only see a difference at lower resolutions because each frame takes less GPU power to render and completes faster, so the CPU becomes the bottleneck. All in all, that's pretty darn good. Are they perfect? No, but their power consumption, and how many of these cores you can fit into the same area as a single P-core, make them a nice option for a lot of different workloads. Also, what good are more P-cores if you're already hitting a thermal limit? Not every machine is going to have a huge honking cooler, and not everyone wants a CPU with a power limit north of 200 watts.
#75
Dr. Dro
Why_Me wrote: Obviously AMD hates Argentina.

www.fullh4rd.com.ar/prod/12425/micro-amd-ryzen-5-3600
AMD RYZEN 5 3600 $35.637,00

www.fullh4rd.com.ar/prod/17680/micro-amd-ryzen-5-5600x
AMD RYZEN 5 5600X $42.290,00

www.fullh4rd.com.ar/prod/18814/micro-intel-core-i5-11400f-sin-video
INTEL CORE I5 11400F $34.890,00
Those prices are only moderately higher than what is currently charged by Brazil's largest hardware store and AMD's largest licensed retailer (KaBuM!), where I bought my R9 5950X and my 3090. That store is asking roughly the equivalent of 2200 BRL for the 5600X, while this processor can currently be obtained for 1750 BRL; for a smaller hardware store, that is not so bad. The price when paying through the Mercado Pago payment processor is probably due to the commission Mercado Libre charges: whenever I sell stuff there, they take 17% on the premium plan (which gives the buyer the ability to finance in 12 installments, plus free shipping, and features my listing in ads).

The numbers seem quite stratospheric, but in reality it's more a case of Americans often not being aware of how good they've got it. The prices are mostly the result of a combination of it being a small store, taxes, and the Argentinian economy's very poor performance. In general, though, the COVID-19 pandemic's economic recoil has been felt across South America (including Brazil) recently because of our devaluing currencies. 1 USD is currently trading for 5.60 BRL and just north of 101 ARS.
Richards wrote: That's Intel's ultimate goal. Imagine the multi-core performance of an 8+100-core CPU, insane.
As long as the prices are kept in check, I welcome this development with open arms.