
Intel "Emerald Rapids" Die Configuration Leaks, More Details Appear

AleksandarK

News Editor
Staff member
Joined
Aug 19, 2017
Messages
2,582 (0.97/day)
Thanks to the leaked slides obtained by @InstLatX64, we have more details and some performance estimates about Intel's upcoming 5th Generation Xeon "Emerald Rapids" CPUs, which promise a significant performance leap over their predecessors. Leading the Emerald Rapids family is the top-end SKU, the Xeon 8592+, which features 64 cores and 128 threads, backed by a massive 480 MB L3 cache pool. The upcoming lineup shifts from a 4-tile to a 2-tile design to minimize latency and improve performance. The design uses P-cores based on the Raptor Cove microarchitecture and promises up to 40% faster performance than the current 4th Generation "Sapphire Rapids" CPUs in AI applications that use Intel's AMX engine. Each chiplet has 35 cores, three of which are disabled, and each tile has two DDR5-5600 memory controllers that operate two memory channels each, which adds up to an eight-channel design. There are three PCIe controllers per die, making six in total.

Newer protocols and AI accelerators also back the upcoming lineup. Now, the Emerald Rapids family supports the Compute Express Link (CXL) Types 1/2/3 in addition to up to 80 PCIe Gen 5 lanes and enhanced Intel Ultra Path Interconnect (UPI). There are four UPI controllers spread over two dies. Moreover, features like the four on-die Intel Accelerator Engines, optimized power mode, and up to 17% improvement in general-purpose workloads make it seem like a big step up from the current generation. Much of this technology is found on the existing Sapphire Rapids SKUs, with the new generation enhancing the AI processing capability further. You can see the die configuration below. The 5th Generation Emerald Rapids designs are supposed to be official on December 14th, just a few days away.



 
Joined
Sep 1, 2020
Messages
2,348 (1.52/day)
Location
Bulgaria
Hmm, 4 channels of RAM, and only up to DDR5-5600. How do you feed 64 cores with this?
 
Joined
Sep 1, 2020
Messages
2,348 (1.52/day)
Location
Bulgaria
I believe that is a 256-bit bus, equal to the AMD Threadripper X 7000 series. Yes, theoretically it is 8 channels because each DDR5 module has a 2×32-bit internal bus... but... it will compete with AMD Epyc(?) How? Epyc already has 12 channels.
 
Joined
Dec 12, 2016
Messages
1,840 (0.63/day)
I wonder what went wrong with the four tile SPR configuration that made them drop down to two tiles with more cores.
 
Joined
May 25, 2022
Messages
117 (0.13/day)
I believe that is a 256-bit bus, equal to the AMD Threadripper X 7000 series. Yes, theoretically it is 8 channels because each DDR5 module has a 2×32-bit internal bus... but... it will compete with AMD Epyc(?) How? Epyc already has 12 channels.
Sigh.

Emerald Rapids is a drop-in socket replacement for Sapphire Rapids, the current-gen Xeon. Sapphire Rapids is 8-channel. Disregard the DDR5 "channel" term; it just confuses the less knowledgeable.

DDR5 confuses people. Each DIMM is always 64-bit, so eight channels means 512-bit. They called it "dual channel" originally because before that you only needed one DIMM for maximum performance, and after that you needed two, hence "dual". So dual channel is always 128-bit. Beyond that pseudo-marketing term, refer to bit width and you get away from the DDR5 shenanigans.

I doubt DDR5's "dual" channel really did anything other than try to compensate for increased latency over DDR4. Every DDR generation comes with various talking points, but in the end it's only the MT/s metric that matters.

Back to the CPU.

From an engineering standpoint, it's very good work. The turnaround time is very short, they optimized the space taken up by the die, and they improved perf/W on the same process.

From a product standpoint, it's better than Sapphire Rapids, which is not saying much. Just like the predecessor, it'll live on Intel selling it low and on people who need accelerators. They got lucky betting on AMX, since big companies are using it for deep learning acceleration.
 
Joined
May 25, 2022
Messages
117 (0.13/day)
No. This doesn't explain two channels per tile/chiplet, whatever the right terminology is.
Emerald Rapids has:
-Two tiles
-Each tile has two memory controllers
-Each memory controller is a 2-channel device

2x2x2 = 8

Keep insisting, but you are wrong. EMR is 512-bit, just like the predecessor.

I'm telling you, DDR5 messed up people's minds on what "channels" mean for memory. Channels = 64-bit.
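
The 2x2x2 arithmetic above can be sketched in a few lines. This is only an illustration: the function and constant names are made up for the example, the topology and DDR5-5600 speed come from the leaked slides quoted in the article, and the bandwidth is a theoretical peak that counts the 64 data bits per channel only (no ECC bits, no real-world efficiency losses).

```python
# Figures taken from the thread: 2 tiles, 2 memory controllers per tile,
# 2 channels per controller, DDR5-5600, 64 cores on the top SKU.
# Function and constant names are hypothetical, chosen for illustration.
TILES = 2
CONTROLLERS_PER_TILE = 2
CHANNELS_PER_CONTROLLER = 2
CHANNEL_WIDTH_BITS = 64      # one channel = 64 data bits (2 x 32-bit subchannels on DDR5)
TRANSFER_RATE_MT_S = 5600    # DDR5-5600
CORES = 64


def channel_count(tiles, controllers_per_tile, channels_per_controller):
    """Total 64-bit memory channels per socket."""
    return tiles * controllers_per_tile * channels_per_controller


def bus_width_bits(channels, channel_width_bits=CHANNEL_WIDTH_BITS):
    """Total data bus width in bits."""
    return channels * channel_width_bits


def peak_bandwidth_gb_s(channels, transfer_rate_mt_s,
                        channel_width_bits=CHANNEL_WIDTH_BITS):
    """Theoretical peak: channels x bytes per transfer x mega-transfers per second."""
    bytes_per_transfer = channel_width_bits // 8
    megabytes_per_second = channels * bytes_per_transfer * transfer_rate_mt_s
    return megabytes_per_second / 1000.0


channels = channel_count(TILES, CONTROLLERS_PER_TILE, CHANNELS_PER_CONTROLLER)
width_bits = bus_width_bits(channels)
bandwidth = peak_bandwidth_gb_s(channels, TRANSFER_RATE_MT_S)
per_core = bandwidth / CORES

print(f"{channels} channels, {width_bits}-bit bus")
print(f"peak {bandwidth:.1f} GB/s, {per_core:.2f} GB/s per core")
```

Under these assumptions the result is 8 channels, a 512-bit bus, and a theoretical peak of 358.4 GB/s, or about 5.6 GB/s per core on the 64-core part.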
 
Joined
May 25, 2022
Messages
117 (0.13/day)
Intel claims 40% gain in AI workloads over Sapphire Rapids. The core count increase is relatively small, so the low level changes such as the tile removal must be contributing to it.
The inner channels in a DDR5 module are 32-bit, so if you sum your 8 individual channels, the math says 8*32 bits.
No. Do research, and then think, rather than repeat yourself. Why would Intel cut down memory channels in half for a successor? Why would Intel state DDR5 inner channels, when it doesn't matter at all?

I'm talking about you when referring to DDR5 confusing people.
 
Joined
Sep 1, 2020
Messages
2,348 (1.52/day)
Location
Bulgaria
Intel claims 40% gain in AI workloads over Sapphire Rapids. The core count increase is relatively small, so the low level changes such as the tile removal must be contributing to it.

No. Do research, and then think, rather than repeat yourself. Why would Intel cut down memory channels in half for a successor? Why would Intel state DDR5 inner channels, when it doesn't matter at all?

I'm talking about you when referring to DDR5 confusing people.
Looks like we'll have to wait for better leaks or official details on what the architecture will be. And why wouldn't Intel act logically from the user's point of view?... They can count on the reduced latency and the increased cache size (cache hits >> cache misses) to mask the fact that they will sell us a line of processors that is crippled in terms of RAM. It may be that production is cheaper and they hope for an increased profit margin. How would I know for sure at this point?
 
Joined
Feb 15, 2019
Messages
1,658 (0.79/day)
System Name Personal Gaming Rig
Processor Ryzen 7800X3D
Motherboard MSI X670E Carbon
Cooling MO-RA 3 420
Memory 32GB 6000MHz
Video Card(s) RTX 4090 ICHILL FROSTBITE ULTRA
Storage 4x 2TB Nvme
Display(s) Samsung G8 OLED
Case Silverstone FT04
Is the architecture still Golden Cove?

We got a Xeon W5 server recently and found out that
under normal TDP settings, it performs just like a second-gen EPYC, which is really, really disappointing.
 
Joined
Aug 12, 2022
Messages
248 (0.30/day)
One RAM channel is always 64 bits, or 2 × 32 bits in the case of DDR5. So 8 channels is 8 × 2 × 32 bits, for 512 bits total.

Is the architecture still Golden Cove?

We got a Xeon W5 server recently and found out that
under normal TDP settings, it performs just like a second-gen EPYC, which is really, really disappointing.
That's surprising to me; Golden Cove in desktop and laptop CPUs usually outperforms Zen 3, or matches it at the same power consumption. Emerald Rapids is "Raptor Cove" which I understand to be exactly like Golden Cove but with more cache.
 
Joined
Sep 1, 2020
Messages
2,348 (1.52/day)
Location
Bulgaria
Yes, I read an article about Emerald Rapids from a year ago, and it explains this more understandably. 16 DIMMs per socket.
 
Joined
Feb 15, 2019
Messages
1,658 (0.79/day)
System Name Personal Gaming Rig
Processor Ryzen 7800X3D
Motherboard MSI X670E Carbon
Cooling MO-RA 3 420
Memory 32GB 6000MHz
Video Card(s) RTX 4090 ICHILL FROSTBITE ULTRA
Storage 4x 2TB Nvme
Display(s) Samsung G8 OLED
Case Silverstone FT04
That's surprising to me; Golden Cove in desktop and laptop CPUs usually outperforms Zen 3, or matches it at the same power consumption.
Our main use case is VM hosting.
So only multithread performance matters.
Under server TDP limits (225W) these CPUs just don't have the room to boost to reasonable frequencies.
On the high core count models, the cores usually sit below 2 GHz most of the time.

(edited for more details)
The model I have on hand is a Xeon w5-2465X with a 240W TDP.
The comparison was a 3.5-year-old TR 3955WX with a 280W TDP.
In all-core workloads the Xeon struggled to hold its base frequency of 3.1GHz, sometimes dipping below 2.8GHz,
while the TR stays at its 3.85GHz all the time.
Those frequency differences cancel out the architectural benefits, and the two performed almost the same.
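
As a rough back-of-the-envelope check of that claim, here is a minimal sketch. It assumes all-core throughput scales as per-clock throughput times frequency and that both chips run the same number of cores, which is a simplification; the function name is hypothetical and the clocks are the ones quoted above.

```python
def implied_per_clock_ratio(freq_a_ghz, freq_b_ghz):
    """Per-clock throughput advantage chip A needs over chip B to match it.

    Assumes throughput ~ per-clock throughput x frequency at equal core
    counts, so equal performance means ratio = freq_b / freq_a.
    """
    return freq_b_ghz / freq_a_ghz


XEON_BASE_GHZ = 3.1   # w5-2465X when it holds base clock
XEON_DIP_GHZ = 2.8    # w5-2465X when it dips
TR_GHZ = 3.85         # TR 3955WX sustained all-core clock

ratio_at_base = implied_per_clock_ratio(XEON_BASE_GHZ, TR_GHZ)
ratio_at_dip = implied_per_clock_ratio(XEON_DIP_GHZ, TR_GHZ)

print(f"per-clock advantage needed at 3.1 GHz: {ratio_at_base:.2f}x")
print(f"per-clock advantage needed at 2.8 GHz: {ratio_at_dip:.2f}x")
```

Read the other way around: if the two really perform almost the same, the Xeon is doing roughly 24% to 38% more work per clock and giving all of it back in frequency.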


On desktops and laptops, manufacturers usually 'cheat' the TDP limit by setting a higher PL2.
This isn't the case in servers.
Server TDP limits are strictly enforced.
 
Joined
Aug 12, 2022
Messages
248 (0.30/day)
Our main use case is VM hosting.
So only multithread performance matters.
Under server TDP limits (225W) these CPUs just don't have the room to boost to reasonable frequencies.
The cores usually sit below 2 GHz most of the time.

On desktops and laptops, manufacturers usually 'cheat' the TDP limit by setting a higher PL2.
This isn't the case in servers.
Server TDP limits are strictly enforced.
I wonder if Golden Cove is less efficient than Zen 3 at low clock speeds. This wouldn't really ever affect desktops and wouldn't hurt benchmarks on laptops, but would partly explain the poor battery life of Alder Lake laptops.
 
Joined
Feb 15, 2019
Messages
1,658 (0.79/day)
System Name Personal Gaming Rig
Processor Ryzen 7800X3D
Motherboard MSI X670E Carbon
Cooling MO-RA 3 420
Memory 32GB 6000MHz
Video Card(s) RTX 4090 ICHILL FROSTBITE ULTRA
Storage 4x 2TB Nvme
Display(s) Samsung G8 OLED
Case Silverstone FT04
I wonder if Golden Cove is less efficient than Zen 3 at low clock speeds. This wouldn't really ever affect desktops and wouldn't hurt benchmarks on laptops, but would partly explain the poor battery life of Alder Lake laptops.
I've updated some comparison details on hand, please check.

For my use case it was two workstation CPUs with lower core counts.
I chose them specifically for their relatively higher base clocks, to give snappier responses in VMs.
It isn't that Golden Cove is less efficient than Zen 3 (or Zen 2) at low clock speeds.
It's just that Golden Cove needs more juice to reach those clock speeds in the first place.

So the Zen series will always get more frequency out of a strictly limited power budget, and that frequency outweighs the architectural benefits that Golden Cove has.
 
Joined
Aug 12, 2022
Messages
248 (0.30/day)
(edited for more details)
For the model I have on hand it was a Xeon w5-2465x with 240W TDP
The comparison was a 3.5 year-old TR3955wx with 280W TDP
I'm not sure how the TDP conventions here compare but you're comparing a TSMC N7 CPU to an Intel 7 CPU, and the Xeon here appears to be performing the same with a 14% lower thermal limit. That's not a bad showing for a "7nm" CPU against another. But I get that it's disappointing that Intel can't do better 3.5 years later, when AMD can with 4th-generation Epyc.

I think it's quite likely that Emerald Rapids will do better than Sapphire Rapids, but probably not 14% better except maybe in really cache-sensitive workloads.
 
Joined
Aug 24, 2023
Messages
30 (0.07/day)
System Name The Financial Mistake 2.0
Processor Intel Xeon w5-3435X 5.3GHz
Motherboard ASUS Pro WS W790E-SAGE-SE
Cooling Alphacool ES Jet 2U 4677, 2x480 Monsta + 1x360 Monsta / Phanteks T30 x22
Memory 4x16GB Kingston Fury Renegade Pro 7000MT/s CL32
Video Card(s) NVIDIA RTX 4090 FE
Storage 4x Samsung 990 Pro 2TB, Crucial P5 Plus 2TB
Display(s) ALIENWARE AW3423DWF
Case Caselabs TH10
Audio Device(s) Creative X3
Power Supply Corsair AX1600i
Mouse Elecom HUGE Trackball, ROG Chakram
Keyboard Mountain Everest Max
Benchmark Scores Time Spy Extreme: 19,874 Time Spy: 33,772
I heard this one has a battle frontier and animated sprites
They usually say Emerald is the best one, let's hope that's the case here too.
 
Joined
May 25, 2022
Messages
117 (0.13/day)
I wonder if Golden Cove is less efficient than Zen 3 at low clock speeds. This wouldn't really ever affect desktops and wouldn't hurt benchmarks on laptops, but would partly explain the poor battery life of Alder Lake laptops.
It is true. https://www.reddit.com/media?url=https://i.redd.it/7fw8a6w4qkj81.jpg

On 22nm the curve was steeper, so Ivy Bridge lost significantly at the higher frequencies. It was great on the Atoms, though. Steeper curve = better frequencies at lower power, but it doesn't improve as much when you juice it up.

On 14nm they changed it up a bit, and that went full steam on the 10nm processes. You can see Alder Lake beats its AMD counterparts in perf/W at higher power, but not at lower power. This is more pronounced on recent parts.

Battery life is another matter though. Alder/Raptor has a difficult time keeping idle power low. It seems it can sometimes, but not as well as the predecessors. Since battery-powered use is mostly bursty workloads, low idle power is what determines battery life for the most part.

50W chip being on for 1% of the time = 0.5W
1W idle for 99% = 1W
Total = ~1.5W

20W chip being on for 1% of the time = 0.2W
2W idle for 99% = 2W
Total = ~2.2W
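
The same arithmetic as a small sketch, for anyone who wants to plug in other numbers. The wattages and the 1% active share are the illustrative figures above, not measurements, and the function name is made up for the example.

```python
def average_power_w(active_w, idle_w, active_fraction):
    """Time-weighted average power for a chip that is either active or idle."""
    return active_w * active_fraction + idle_w * (1.0 - active_fraction)


# 50W burst, 1W idle, active 1% of the time
chip_a = average_power_w(active_w=50, idle_w=1, active_fraction=0.01)
# 20W burst, 2W idle, active 1% of the time
chip_b = average_power_w(active_w=20, idle_w=2, active_fraction=0.01)

print(f"chip A average: {chip_a:.2f} W")
print(f"chip B average: {chip_b:.2f} W")
print(f"chip B draws {chip_b / chip_a:.2f}x the average power of chip A")
```

The chip with the much higher burst power still comes out ahead on average, because the idle term dominates when the chip is active only 1% of the time.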

In theory, Meteor Lake should do better. The LP E-cores will move bursty workloads off the compute tile and reduce SoC power. The Intel 4 process has a steeper curve, so while it won't do as well at higher power, it'll do quite well at the lower end. Hopefully, whatever low-level changes made Alder/Raptor regress are addressed on Meteor Lake too.

I'm not sure how the TDP conventions here compare but you're comparing a TSMC N7 CPU to an Intel 7 CPU, and the Xeon here appears to be performing the same with a 14% lower thermal limit. That's not a bad showing for a "7nm" CPU against another. But I get that it's disappointing that Intel can't do better 3.5 years later, when AMD can with 4th-generation Epyc.

I think it's quite likely that Emerald Rapids will do better than Sapphire Rapids, but probably not 14% better except maybe in really cache-sensitive workloads.
Emerald Rapids costs more than Sapphire Rapids to produce, because it uses 1490mm² over two tiles while Sapphire Rapids uses 1510mm² over four tiles. According to Semianalysis, with zero defects, the number of CPUs that can be made per wafer is 34 on EMR vs 37 on SPR. Since EMR is a 700mm² die versus a 400mm² one, the difference will likely be even larger in practice, since there is some defect rate and larger dies have a higher chance of having defects.
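
To put a rough shape on that, here is a minimal sketch using the simple Poisson yield model (yield = exp(-area x defect density)). It is not Semianalysis's method: the defect density is a placeholder I picked, not a figure for Intel 7, the helper names are hypothetical, and the model ignores harvesting of partially defective dies (which the three disabled cores per die suggest Intel does), so real yields would be higher.

```python
import math

ASSUMED_DEFECTS_PER_CM2 = 0.1  # placeholder value, NOT a known Intel 7 figure


def poisson_yield(die_area_mm2, defects_per_cm2):
    """Fraction of dies with zero defects under the Poisson yield model."""
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


def good_cpus_per_wafer(cpus_per_wafer, total_area_mm2, tiles, defects_per_cm2):
    """CPUs per wafer after yield, assuming good dies are tested and paired freely."""
    die_area_mm2 = total_area_mm2 / tiles
    return cpus_per_wafer * poisson_yield(die_area_mm2, defects_per_cm2)


# Per-wafer counts and total areas are the ones quoted in the post above.
emr = good_cpus_per_wafer(34, 1490, tiles=2,
                          defects_per_cm2=ASSUMED_DEFECTS_PER_CM2)
spr = good_cpus_per_wafer(37, 1510, tiles=4,
                          defects_per_cm2=ASSUMED_DEFECTS_PER_CM2)

print(f"EMR: {emr:.1f} good CPUs per wafer")
print(f"SPR: {spr:.1f} good CPUs per wafer")
print(f"SPR/EMR ratio: {spr / emr:.2f} (vs {37 / 34:.2f} with zero defects)")
```

Whatever defect density you plug in, the gap only widens as it goes up, because the larger EMR die loses yield faster than the smaller SPR die.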

Therefore, it's latency that Intel aims to reduce with EMR. With two tiles, there's a lot less data hopping than with four on SPR. Intel claims 17% improved performance/watt. Since there are only two tiles, they can also beef up the bandwidth between tiles in addition to lowering latency.

The model I have on hand is a Xeon w5-2465X with a 240W TDP.
The comparison was a 3.5-year-old TR 3955WX with a 280W TDP.
In all-core workloads the Xeon struggled to hold its base frequency of 3.1GHz, sometimes dipping below 2.8GHz,
while the TR stays at its 3.85GHz all the time.
Those frequency differences cancel out the architectural benefits, and the two performed almost the same.
Laptops can't cheat for too long as it's a thermally constrained chassis.

Based on your numbers, it's possible that Sierra Forest might equal Sapphire Rapids even on per thread performance due to higher clocks.
 