Wednesday, September 20th 2023

Intel 288 E-core Xeon "Sierra Forest" Out to Eat AMD EPYC Bergamo's Lunch
Intel at the 2023 InnovatiON event unveiled a 288-core extreme core-count variant of the Xeon "Sierra Forest" processor for high-density servers in scale-out, cloud-native environments. It succeeds the current 144-core model. "Sierra Forest" is a server processor built entirely from efficiency cores, or E-cores, using the "Sierra Glen" core microarchitecture, a server-grade derivative of "Crestmont," Intel's second-generation E-core that's making a client debut with "Meteor Lake."
Xeon "Sierra Forest" is a chiplet-based processor, much like "Meteor Lake" and the upcoming "Emerald Rapids" server processor. It features a total of five tiles: two Compute tiles, two I/O tiles, and a base tile (interposer). Each of the two Compute tiles is built on the Intel 3 foundry node, a more advanced node than Intel 4, featuring higher-density libraries and an undisclosed performance/Watt increase. Each Compute tile has 36 "Sierra Glen" E-core clusters, 108 MB of shared L3 cache, 6-channel (12 sub-channel) DDR5 memory controllers, and Foveros tile-to-tile interfaces.
Each "Sierra Glen" E-core cluster features four CPU cores that share a 4 MB local L2 cache, and a 3 MB segment contributing to the tile's 108 MB L3 cache. Unlike the "Meteor Lake" Compute tile, which uses a ringbus to connect its E-core clusters and P-cores, the "Sierra Forest" Compute tile uses a mesh topology interconnect for its large array of 36 E-core clusters. With 144 cores per tile, in its maximum configuration with two such tiles, "Sierra Forest" achieves 288 cores. "Sierra Glen" lacks SMT, just like "Crestmont," and so the OS only has 288 logical processors to address.
Besides the two Compute tiles, the processor has two I/O tiles. Unlike the similarly named "I/O tile" of the client "Meteor Lake" processor, the ones on "Sierra Forest" serve the functions of both the SoC and I/O PHY. With the memory controllers located on the Compute tiles, in its maximum 288-core variant, "Sierra Forest" features a 12-channel DDR5 memory interface.
The I/O tile is left with the UPI interconnect for 2P servers, application-specific accelerators, a 68-lane PCI-Express Gen 5 root complex that's flexible between PCIe Gen 5 and CXL 2.0, and the I/O Fabric. Despite being based on an advanced node like Intel 3, each of the two Compute tiles is an enormous 578 mm² in die-area, while each of the two I/O tiles is 241 mm².
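The per-cluster and per-tile figures quoted above multiply out to the package totals. The short Python sketch below only restates that arithmetic; the constant and function names are illustrative and do not come from Intel, and the area total covers the four active tiles only, since no figure is given for the base tile.

```python
# Figures as quoted in the article; names are illustrative, not Intel's.
CLUSTERS_PER_TILE = 36
CORES_PER_CLUSTER = 4
L2_MB_PER_CLUSTER = 4
L3_SLICE_MB_PER_CLUSTER = 3
DDR5_CHANNELS_PER_TILE = 6
COMPUTE_TILES = 2
IO_TILES = 2
COMPUTE_TILE_MM2 = 578
IO_TILE_MM2 = 241


def sierra_forest_totals(compute_tiles=COMPUTE_TILES):
    """Multiply the per-cluster and per-tile figures out to package totals."""
    cores_per_tile = CLUSTERS_PER_TILE * CORES_PER_CLUSTER
    return {
        "cores_per_tile": cores_per_tile,
        "l2_mb_per_tile": CLUSTERS_PER_TILE * L2_MB_PER_CLUSTER,
        "l3_mb_per_tile": CLUSTERS_PER_TILE * L3_SLICE_MB_PER_CLUSTER,
        "total_cores": cores_per_tile * compute_tiles,
        "ddr5_channels": DDR5_CHANNELS_PER_TILE * compute_tiles,
        # Active tiles only; the article gives no area for the base tile.
        "active_silicon_mm2": (compute_tiles * COMPUTE_TILE_MM2
                               + IO_TILES * IO_TILE_MM2),
    }


if __name__ == "__main__":
    for name, value in sierra_forest_totals().items():
        print(f"{name}: {value}")
```

With the default of two Compute tiles, the totals come to 288 cores and 12 DDR5 channels, matching the headline figures.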
The up to 12-channel memory interface of "Sierra Forest" natively supports ECC DDR5-6400. The accelerators are carried over from the current "Granite Rapids" processor, and provide speed-ups for popular cryptography, file-streaming, and data-compression operations.
When it arrives in the first half of 2024, Xeon "Sierra Forest" will square off against AMD's EPYC "Bergamo" processor. "Bergamo" is based on a slightly different philosophy than "Sierra Forest." It is a 128-core/256-thread processor based on "Zen 4c" cores that don't quite qualify as E-cores, and have an identical IPC to regular "Zen 4" cores, an identical ISA, and SMT.
Source:
Tom's Hardware
40 Comments on Intel 288 E-core Xeon "Sierra Forest" Out to Eat AMD EPYC Bergamo's Lunch
An E-core - as defined and implemented by Intel - has a reduced instruction set compared to P-cores of the same generation, lacks features like SMT, and has lower clocks, smaller dedicated L1 and L2 caches and lower IPC. It has been mocked as not a real core by enthusiasts because the rollout of this hybrid architecture has been a complete mess, with it often being more beneficial to disable E-cores altogether in real-world applications.
On the opposite side, AMD Zen 4c implements the same instructions as the bigger Zen 4, has the same L1 and L2, about the same IPC, maintains support for SMT and just loses on max clocks and L3 (though the loss in L3 is because there are double the cores in the same chiplet).
In servers - which already prioritize lower power and more stable clocks, and where even the previous so-called "slow" Ryzen cores were already beating Intel's "regular" cores (before the E-core/P-core distinction) - AMD is set to demolish Intel's solution unless something goes terribly wrong.
As for performance, a fair comparison would be one between units that run two threads. Hence, one Zen 4 vs. one Zen 4c vs. one P vs. two E cores. Two E cores are much closer, performance-wise and area-wise, to the other three, especially if you keep in mind that SMT drags down the performance of all of them - except E cores.
Back to technical details, I just don't think that lack of SMT is a deficiency here. Just look at how small the E core is.
There are other possible bottlenecks in the architecture, memory bandwidth primarily. 288 / 12 = 24 cores per (64-bit) memory channel ... uh-huh. That MCR multiplexing scheme is quickly becoming a necessity.
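The cores-per-channel figure in the comment above can be extended into a rough bandwidth-per-core estimate. This is only a back-of-the-envelope sketch: it assumes the DDR5-6400 speed quoted in the article and a 64-bit (8-byte) data path per channel, and it computes theoretical peak numbers, which sustained bandwidth will not reach. The function names are made up for illustration.

```python
TOTAL_CORES = 288
MEMORY_CHANNELS = 12
DDR5_MTS = 6400          # DDR5-6400, as quoted in the article
BYTES_PER_TRANSFER = 8   # 64-bit channel, as in the comment


def cores_per_channel(cores=TOTAL_CORES, channels=MEMORY_CHANNELS):
    """How many cores contend for each memory channel."""
    return cores / channels


def peak_channel_gbs(mts=DDR5_MTS, bytes_per_transfer=BYTES_PER_TRANSFER):
    """Theoretical peak per channel in GB/s (10^9 bytes per second)."""
    return mts * bytes_per_transfer / 1000


def peak_gbs_per_core(cores=TOTAL_CORES, channels=MEMORY_CHANNELS):
    """Theoretical peak bandwidth per core if all cores stream at once."""
    return peak_channel_gbs() * channels / cores


if __name__ == "__main__":
    print(f"cores per channel: {cores_per_channel():.0f}")
    print(f"peak GB/s per channel: {peak_channel_gbs():.1f}")
    print(f"peak GB/s per core: {peak_gbs_per_core():.2f}")
```

Under these assumptions each core gets roughly 2 GB/s of peak bandwidth when every core is streaming from memory, which is the concern the comment raises.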
Whether it's lower performance and quite a bit lower power or similar power levels and similar performance, that's yet to be determined.
Intel themselves claim 2.4x performance/watt over Sapphire Rapids. Since it has 2.4x the number of cores of the 60-core SPR, and SPR has the advantage of hyperthreading, which is responsible for a 20-30% gain, that's pretty impressive, especially considering the Golden Cove core in SPR is 40-50% faster than the Sierra Glen (server-grade Gracemont) E-cores in SRF.
Because of that, 2.4x the cores at the same clocks would result in only maybe a 30% gain, yet SRF has 2.4x perf/watt. This leaves a few possibilities:
-SRF: 205W, 144 cores that perform 30-40% higher than SPR: 350W, 60 cores
-Sierra Forest is at 270W, but clocks 40% higher than SPR and is nearly 90% faster than SPR. Essentially, the clock increase makes up for the architectural differences: 4.2GHz all-core versus 2.9GHz all-core.
According to SPEC CPU tests, the 40-50% advantage Golden Cove and Zen 4 have over Gracemont splits into 20-25% in integer and 60-65% in floating point. Golden Cove is a low-single-digit percentage faster than Zen 4, by the way.
Since Bergamo and Sierra Forest are aimed at cloud workloads, and most non-HPC server work is integer anyway, Gracemont may be far more competent there than on PCs. Sierra Forest would then only need 3.6GHz to perform 90% faster than Sapphire Rapids while using only 270W. There's even a possibility they could clock SRF all the way to 4.5GHz, so that 2.4x perf/watt also ends up being 2.4x the performance, but at a 350W TDP.
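The scenarios above all follow from one simple model: throughput scales with core count and clock, and is divided by the P-core's per-core advantage and its SMT gain. The Python sketch below plugs in midpoints of the ranges quoted in the comment (25% for SMT, 45% overall and 22.5% integer-only for Golden Cove). The model, the midpoints and the function names are assumptions made for illustration, not measured data.

```python
def relative_throughput(core_ratio, clock_ratio, p_core_advantage, smt_gain):
    """Throughput of the E-core part relative to the P-core part."""
    return core_ratio * clock_ratio / (p_core_advantage * smt_gain)


def perf_per_watt_ratio(throughput_ratio, e_part_watts, p_part_watts):
    """Perf/watt of the E-core part relative to the P-core part."""
    return throughput_ratio * p_part_watts / e_part_watts


CORE_RATIO = 144 / 60    # 144-core SRF vs 60-core SPR
SMT_GAIN = 1.25          # midpoint of the quoted 20-30%
ADV_OVERALL = 1.45       # midpoint of the quoted 40-50%
ADV_INTEGER = 1.225      # midpoint of the quoted 20-25%
SPR_GHZ = 2.9            # SPR all-core clock used in the comment
SPR_WATTS = 350

same_clock = relative_throughput(CORE_RATIO, 1.0, ADV_OVERALL, SMT_GAIN)
at_4_2_ghz = relative_throughput(CORE_RATIO, 4.2 / SPR_GHZ, ADV_OVERALL, SMT_GAIN)
int_3_6_ghz = relative_throughput(CORE_RATIO, 3.6 / SPR_GHZ, ADV_INTEGER, SMT_GAIN)
int_4_5_ghz = relative_throughput(CORE_RATIO, 4.5 / SPR_GHZ, ADV_INTEGER, SMT_GAIN)

if __name__ == "__main__":
    print(f"same clocks: {same_clock:.2f}x")
    print(f"4.2GHz, 270W: {at_4_2_ghz:.2f}x perf, "
          f"{perf_per_watt_ratio(at_4_2_ghz, 270, SPR_WATTS):.2f}x perf/watt")
    print(f"integer, 3.6GHz, 270W: {int_3_6_ghz:.2f}x perf, "
          f"{perf_per_watt_ratio(int_3_6_ghz, 270, SPR_WATTS):.2f}x perf/watt")
    print(f"integer, 4.5GHz, 350W: {int_4_5_ghz:.2f}x perf, "
          f"{perf_per_watt_ratio(int_4_5_ghz, 350, SPR_WATTS):.2f}x perf/watt")
```

With these midpoints the same-clock case lands a little above 1.3x, and the higher-clock cases land near the 2.4x perf/watt that Intel claims, which is why the comment treats them as plausible configurations.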
So now the real deal. Let's assume 350W Sierra Forest at 4.5GHz. Since Bergamo is also 350W but peaks at 3.1GHz, the reality is the all-core Turbo is probably about 3GHz. This means at the end of the day, it's a core count battle, and SRF has a slight edge at 144 cores versus 128.
But the "real competitor is Turin Dense" you say. You are right, maybe. According to earlier leaked roadmaps, Bergamo was supposed to arrive very early in Q1 of this year, like Dec-Jan. Instead, it came out in June of this year. The same roadmap has Turin Dense firmly at Q2 of next year. The best-case scenario is that Turin Dense comes a month after SRF; the worst-case scenario is that it comes 5-6 months later, meaning some sort of a leapfrog. Hence the existence of 288-core SRF. Looks like Intel wants to be, at minimum, competitive in the worst case.
Let's analyze 288-core SRF vs 192 core Turin Dense.
Turin Dense:
-192 cores(1.5x)
-Zen 5(Let's say 1.2x)
-500W
-80% faster than Bergamo
288-core SRF: I speculate roughly 40W of the 350W is taken up by the IO tiles, leaving 310W for Compute. Assuming 144-core SRF is at 4.5GHz, with minimal voltage reductions, we can get a 3.6GHz SRF at 500W. Or 3.4GHz without touching the voltage.
-3.6GHz: 60% faster than 144-core Sierra Forest
-3.4GHz: 50% faster than 144-core Sierra Forest
Since the assumption is that 144-core SRF is a wee bit faster than Bergamo, it looks like the 288-core part will be competitive. You can see that 5% here and 5% there will swing things in favor of either party. But it's nothing like the bloodbath between Genoa and Sapphire Rapids.
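The Turin Dense versus 288-core SRF estimate above can be reproduced with a few lines of Python. This sketch assumes that per-core power scales linearly with clock when voltage is left untouched, that the speculated 40W for the I/O tiles also applies to the 500W part, and that Turin Dense runs at Bergamo's clocks. All three are simplifying assumptions, and the function names are made up for illustration.

```python
def scaled_clock_ghz(base_clock_ghz, base_compute_w, base_cores,
                     new_compute_w, new_cores):
    """Clock at fixed voltage, assuming per-core power scales with clock."""
    base_w_per_core = base_compute_w / base_cores
    new_w_per_core = new_compute_w / new_cores
    return base_clock_ghz * new_w_per_core / base_w_per_core


def throughput_ratio(core_ratio, clock_ratio, per_core_uplift=1.0):
    """Relative throughput from core count, clock and per-core uplift."""
    return core_ratio * clock_ratio * per_core_uplift


IO_TILE_W = 40        # the comment's guess, reused here for the 500W part
BASE_CLOCK_GHZ = 4.5  # assumed all-core clock of the 144-core SRF

# 192 Zen 5c cores vs 128 Zen 4c cores, with a guessed 1.2x per-core uplift.
turin_dense_vs_bergamo = throughput_ratio(192 / 128, 1.0, per_core_uplift=1.2)

# 288 cores inside 500W vs 144 cores inside 350W, I/O power subtracted.
fixed_voltage_clock = scaled_clock_ghz(
    BASE_CLOCK_GHZ, 350 - IO_TILE_W, 144, 500 - IO_TILE_W, 288)

srf288_at_3_6 = throughput_ratio(288 / 144, 3.6 / BASE_CLOCK_GHZ)
srf288_at_3_4 = throughput_ratio(288 / 144, 3.4 / BASE_CLOCK_GHZ)

if __name__ == "__main__":
    print(f"Turin Dense vs Bergamo: {turin_dense_vs_bergamo:.2f}x")
    print(f"288-core clock at fixed voltage: {fixed_voltage_clock:.2f} GHz")
    print(f"288-core vs 144-core SRF at 3.6GHz: {srf288_at_3_6:.2f}x")
    print(f"288-core vs 144-core SRF at 3.4GHz: {srf288_at_3_4:.2f}x")
```

Under the linear model the fixed-voltage clock comes out slightly below the 3.4GHz used in the comment, so the 50% and 60% figures should be read as rounded estimates.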
(yes I know they will usually run Linux but I'm curious)
Unlimited OSE $110880; limited to 2 OSE cost $19296. Calculator is non official!