Friday, December 8th 2023

Intel "Sierra Forest" Xeon System Surfaces, Fails in Comparison to AMD Bergamo

Intel's upcoming Sierra Forest Xeon server chip has debuted on Geekbench 6, showcasing its potential in multi-core performance. Slated for release in the first half of 2024, Sierra Forest is equipped with up to 288 Efficiency cores, positioning it to compete with AMD's Zen 4c Bergamo server CPUs and ARM-based server chips like those from Ampere for the favor of cloud service providers (CSPs). In the Geekbench 6 benchmark, a dual-socket configuration featuring two 144-core Sierra Forest CPUs was tested. The benchmark revealed a notable multi-core score of 7,770, surpassing most dual-socket systems powered by Intel's high-end Xeon Platinum 8480+, which typically score between 6,500 and 7,500. However, Sierra Forest's single-core score of 855 points was considerably lower, not even reaching half that of the 8480+, which manages 1,897 points.

The difference in single-core performance is a deliberate design choice: Sierra Forest uses Crestmont-derived Sierra Glen E-cores, which are more power- and area-efficient than the Golden Cove P-cores in the Sapphire Rapids-based 8480+. This design is particularly advantageous for server environments where high core counts are crucial, as CSPs usually partition their instances by the number of CPU cores. However, compared to AMD's Bergamo CPUs, which use Zen 4c cores, Sierra Forest lacks raw computing performance, especially in multi-core. Sierra Forest lacks Hyper-Threading, while Bergamo offers SMT with 256 threads on the 128-core SKU. Compared against the Geekbench 6 scores of AMD's Bergamo-based EPYC 9754, the Sierra Forest results look a lot less impressive. Bergamo scored 1,597 points in single-core, almost double that of Sierra Forest, and 16,455 points in the multi-core benchmark, which is more than double. This is a significant advantage of the Zen 4c core, which cuts down on caches instead of being an entirely different core, as is the case with Intel's P and E-cores. However, these are just preliminary numbers; we must wait for real-world benchmarks to see the actual performance.
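To make the "almost double" and "more than double" comparisons concrete, here is a minimal sketch that computes the ratios from the Geekbench 6 scores quoted in this article. The variable and function names are illustrative only, and the numbers are the preliminary leaked figures, not measured results.

```python
# Geekbench 6 scores as quoted in the article (preliminary, leaked figures).
SIERRA_FOREST = {"single": 855, "multi": 7770}    # dual-socket, 2 x 144 E-cores
XEON_8480_PLUS = {"single": 1897}                 # multi-core range is 6,500-7,500
BERGAMO_9754 = {"single": 1597, "multi": 16455}   # EPYC 9754


def ratio(a, b):
    """Return a / b rounded to two decimals."""
    return round(a / b, 2)


# How far ahead Bergamo is of Sierra Forest in each test.
bergamo_single_ratio = ratio(BERGAMO_9754["single"], SIERRA_FOREST["single"])
bergamo_multi_ratio = ratio(BERGAMO_9754["multi"], SIERRA_FOREST["multi"])

# Sierra Forest single-core score as a fraction of the Xeon Platinum 8480+.
sf_vs_8480_single = ratio(SIERRA_FOREST["single"], XEON_8480_PLUS["single"])

print(f"Bergamo vs Sierra Forest, single-core: {bergamo_single_ratio}x")
print(f"Bergamo vs Sierra Forest, multi-core:  {bergamo_multi_ratio}x")
print(f"Sierra Forest vs 8480+, single-core:   {sf_vs_8480_single}x")
```

On these figures Bergamo leads by roughly 1.87x in single-core and 2.12x in multi-core, while Sierra Forest reaches about 0.45x of the 8480+ single-core score.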
Sources: BenchLeaks, Tom's Hardware

76 Comments on Intel "Sierra Forest" Xeon System Surfaces, Fails in Comparison to AMD Bergamo

#26
_Flare
Phoenix2 has 2+4 (Zen4+Zen4c) and all share the same 16MB L3.
Phoenix has 8 Zen4 and all share the same 16MB L3.
That halved L3, compared to the "normal" 32MB L3 per CCX in Zen4 configs, has been typical for those mobile parts for some generations now.
In that regard Bergamo combines:
- mobile grade halved L3 per CCX
- 2 CCX per CCD
- dense library compactified Cores

For me the real catch of Phoenix2 is that Zen4 & Zen4c can be part of the same CCX and thus share the same L3.
Posted on Reply
#27
user556
Bergamo is not the new part here, it's older than Phoenix2. The new part is Intel's Sierra Forest, which is still to be released in fact.
Posted on Reply
#28
phanbuey
288e cores... which get slapped around by 128HT enabled zen4c cores... interesting.

I wonder what the application for this is - is it competing with the same product? I wonder what the power draw was.
_Flare said:
Phoenix2 has 2+4 (Zen4+Zen4c) and all share the same 16MB L3.
Phoenix has 8 Zen4 and all share the same 16MB L3.
That halved L3 compared to "normal" 32MB L3 per CCX Zen4 configs is typical for those mobile parts since some generations now.
In that regard Bergamo combines:
- mobile grade halved L3 per CCX
- 2 CCX per CCD
- dense library compactified Cores

For me the real catch of Phoenix2 is that Zen4 & Zen4c can be part of the same CCX and thus share the same L3.
The issue here is the scheduler. Even though the cores have the same arch, they have different performance -- and as seen by the 7950X3D there is currently no thread scheduling which enables the chip to run as efficiently as a homogenous core.

So what you're essentially getting in that ccx is Zen 4 cores that are bogged by slower Zen4c cores and relying on software schedulers.
Posted on Reply
#29
bug
user556 said:
Bergamo is not the new part here, it's older than Phoenix2. The new part is Intel's Sierra Forest, which is still to be released in fact.
That's the clickbait: Intel's new CPU is worse than AMD's old CPU :slap:
Posted on Reply
#30
user556
The Geekbench results are a tad suspect. Maybe Geekbench tops out at 64 tasks or something that makes these results so badly skewed.
Posted on Reply
#31
Selaya
fancucker said:
"Fails" is such a wreckless and irresponsible statement from a journalist of TPU's calibre. These products are designed for specific applications and Intel's tertiary services and easier integration make them a more compelling option than Bergamo.
there's only one fail here, and it's considering geekbench to be of any relevance at all.
Posted on Reply
#32
bug
phanbuey said:
288e cores... which get slapped around by 128HT enabled zen4c cores... interesting.
That's why I pointed out some server specific benchmarks are in order.
E cores are built to be slower. Yet servers run hundreds of threads (way more than any single CPU can offer), so it's all a matter of spreading the workload and getting the job done promptly, but without burning through too much power. I mean, if the E cores are half as fast as Zen4c cores, but there's twice as many of them, they would get the job done just as fast. So the CPU that burns through less power would win. But the numbers just aren't there to tell either way.
Posted on Reply
#33
phanbuey
bug said:
That's why I pointed out some server specific benchmarks are in order.
E cores are built to be slower. Yet servers run hundreds of threads (way more than any single CPU can offer), so it's all a matter of spreading the workload and getting the job done promptly, but without burning through too much power. I mean, if the E cores are half as fast as Zen4c cores, but there's twice as many, they would get the job done just as fast. So the CPU that burns through less power would win. But the numbers just aren't there to tell either way.
not to mention they wouldn't release such a product unless it had some advantage - whether cost or power etc.

so wondering what the full story is.
Posted on Reply
#34
R0H1T
phanbuey said:
and as seen by the 7950X3D there is currently no thread scheduling which enables the chip to run as efficiently as a homogenous core.
That's debatable, Apple chips are currently the most efficient in that power envelope & AMD is relatively close. And we know what kind of uarch/setup Apple has ~ the days of all homogenous cores are long gone. That's not to say we'll not get such chips but efficacy isn't necessarily the best with them!
www.notebookcheck.net/Apple-M3-SoC-analyzed-Increased-performance-and-improved-efficiency.766789.0.html
Posted on Reply
#35
user556
The 7950X3D is homogenous cores. It's only the L3 cache that differs. That said, yes, even that difference matters. So AMD messed up by not offering the 7950X3D with both chiplets stacked the same.
Posted on Reply
#36
persondb
So to summarize, the results are:

Sierra Forest: 855 Single Core and 7770 Multi Core

Bergamo: 1597 Single Core and 16455 Multi Core

There seems to be something very wrong with Geekbench on both of those benches.

Just compare it to other Geekbench 6 results

Intel N100(4 gracemont cores): ~1100 Single Core and ~2800 Multi Core

Steam Deck(4 Zen 2 cores that are heavily clock limited): ~1350 Single Core and ~4650 Multi Core

AMD 3700X: ~1600 Single Core and 7300 Multi Core.

AMD 3950X: ~1800 Single Core and 11500 Multi Core

AMD 3990X: ~1650 Single Core and ~15000 Multi Core

etc etc

The results don't seem to make much sense, honestly. It might be that the chosen benchmarks don't scale well with higher core counts. Though see this one from a Threadripper PRO 7995WX with 96 Zen 4 cores.

ASUS System Product Name - Geekbench
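
One way to look at the scaling concern raised in this comment is to divide each multi-core score by its single-core score and then by the core count. The sketch below does that with the approximate scores listed above, using the article's 7,770 figure for Sierra Forest. The core counts are added here for illustration and are not part of the original comment; in particular, treating the Bergamo result as a single 128-core socket is an assumption.

```python
# Approximate Geekbench 6 scores from the list above: (single, multi, cores).
# Core counts are an addition for illustration, not part of the original post.
results = {
    "Intel N100": (1100, 2800, 4),
    "Steam Deck": (1350, 4650, 4),
    "Ryzen 7 3700X": (1600, 7300, 8),
    "Ryzen 9 3950X": (1800, 11500, 16),
    "Threadripper 3990X": (1650, 15000, 64),
    "Sierra Forest (2S)": (855, 7770, 288),
    "EPYC 9754 (Bergamo)": (1597, 16455, 128),  # single socket assumed
}


def scaling(single, multi, cores):
    """Return (multi/single speedup, speedup per core)."""
    speedup = multi / single
    return speedup, speedup / cores


table = {name: scaling(*values) for name, values in results.items()}

for name, (speedup, per_core) in table.items():
    print(f"{name:22s} speedup {speedup:5.2f}x  per-core efficiency {per_core:6.1%}")
```

On these numbers the multi-core score never exceeds about ten times the single-core score, regardless of whether the chip has 64, 128 or 288 cores, which is consistent with the suspicion that the Geekbench 6 multi-core test stops scaling at high core counts.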
Posted on Reply
#37
bug
@persondb You didn't read any of this thread, did you?
Posted on Reply
#38
Panther_Seraphin
Perf/Watt is going to be the measuring stick here.

If it's half the performance for 33% of the power draw, they are onto a winner in areas that need high density cores (think render farms/Internet front ends/highly parallelised workloads).

If it's the other way around, 50% of the performance for 66% of the power draw, then it's completely DOA and you won't be able to source a Bergamo part for love nor money for the next 6-12 months.
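
The two scenarios above can be expressed as relative performance per watt. This is a minimal sketch using only the hypothetical percentages from this comment; no power figures for Sierra Forest or Bergamo are actually known here, and the function name is illustrative.

```python
def relative_perf_per_watt(rel_perf, rel_power):
    """Perf/W relative to a baseline chip whose perf and power are both 1.0."""
    return rel_perf / rel_power


# Scenario 1: half the performance at 33% of the power draw.
winner = relative_perf_per_watt(0.50, 0.33)

# Scenario 2: half the performance at 66% of the power draw.
loser = relative_perf_per_watt(0.50, 0.66)

print(f"50% perf at 33% power: {winner:.2f}x baseline perf/W")
print(f"50% perf at 66% power: {loser:.2f}x baseline perf/W")
```

In the first case the chip delivers about 1.5x the baseline performance per watt; in the second it falls to about 0.76x, which is why the second case would be a losing proposition.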
Posted on Reply
#39
thesmokingman
The Sierra Forest lacks hyperthreading, while Bergamo offers SMT with 256 threads on the 128-core SKU.
For reals? Scratches head... :laugh:
Posted on Reply
#40
Panther_Seraphin
thesmokingman said:
For reals? Scratches head... :laugh:
SMT, in a lot of the use cases these are going to be used in, is a risk not a benefit TBH
Posted on Reply
#41
evernessince
tabascosauz said:
AMD chiplet design weakness is uncore power overhead in relation to core count. IFOP power hasn't really improved monumentally in the server space, but Zen 4c doubling core count for a given #CCD count is a huge point in Bergamo's favour for perf/W.

You could say that E-cores are pushed too hard in Core I, and are best in their efficiency band running Xeon clocks, but the same goes for Bergamo. Server Zen 4c is also close to its happy place.
This is inaccurate. Level1Techs has a video demonstrating that the IO die takes up to 1 watt and the infinity fabric up to 1.2 watts on AMD's Zen 4 server parts. Compare that to Zen 4 consumer CPUs where, depending on the memory frequency, the IO die alone takes anywhere from 8 to 23 W. The uncore on Zen 4 consumer parts is inefficient because they are tuned to squeeze out every last bit of performance, but the server parts demonstrate that, in the sweet spot, the Zen 4 uncore can be very efficient.
Posted on Reply
#42
john_
Intel is trying to play the "Moooooooarrrrrr cores" game that it plays at consumers to professionals.
Posted on Reply
#43
tabascosauz
evernessince said:
This is inaccurate. Level1Techs has a video demonstrating that the IO die takes up to 1 watt and the infinity fabric up to 1.2 watts on AMD's Zen 4 server parts. Compare that to Zen 4 consumer CPUs where, depending on the memory frequency, the IO die alone takes anywhere from 8 to 23 W. The uncore on Zen 4 consumer parts is inefficient because they are tuned to squeeze out every last bit of performance, but the server parts demonstrate that, in the sweet spot, the Zen 4 uncore can be very efficient.
Do you have a link to that? AMD did have slides hinting at improved Fabric power last year for Genoa, but I can't see any direct or indirect data on what you're saying. IOD + Fabric power of just a few watts for Bergamo's big IOD would be beyond groundbreaking.
  • If you're referring to the tests of Gigabyte getting rid of IO overhead from Milan (AT review), the super low idle (14W full system draw I think?) was not in a usable state (board powered off but BMC on)
  • In the LTT Bergamo preview (yes I know), HWInfo was showing about 70W package power with all cores close to 0W + 1800 fabric + 1200 UMC, so all of that 70W would be uncore of some kind.
  • A reduction of ~75W traditionally in EPYC down to <10W would be pure insanity and something I'm pretty sure someone would have made a big news story about, by now.
  • Running very low Fabric clock and UMC clock a la EPYC helps significantly, but not to the degree of many magnitudes improvement, and not nearly enough to overcome being not monolithic.
  • "<2 pJ/bit" on AMD slides seems like a restatement of Fabric's known efficiency, not the kind of groundbreaking improvement to make for what you referenced.
  • In L1 tech's video, the AMD slide for Bergamo power (vs ARM Ampere) itself advertises 70W idle power for its own product. 4:30 from the video 4 months ago on Bergamo.
Also, kinda missing the point here.........the point was that Zen 4c should be so overwhelmingly an advantage that even if Bergamo IO power was worse than Milan (which it probably isn't), it shouldn't matter, at least at the core counts that would be relevant for this comparison to Sierra Forest.
Posted on Reply
#44
evernessince
tabascosauz said:
Do you have a link to that? AMD did have slides hinting at improved Fabric power last year for Genoa, but I can't see any direct or indirect data on what you're saying. IOD + Fabric power of just a few watts for Bergamo's big IOD would be beyond groundbreaking.
  • If you're referring to the tests of Gigabyte getting rid of IO overhead from Milan (AT review), the super low idle (14W full system draw I think?) was not in a usable state (board powered off but BMC on)
  • In the LTT Bergamo preview (yes I know), HWInfo was showing about 70W package power with all cores close to 0W + 1800 fabric + 1200 UMC, so all of that 70W would be uncore of some kind.
  • A reduction of ~75W traditionally in EPYC down to <10W would be pure insanity and something I'm pretty sure someone would have made a big news story about, by now.
  • Running very low Fabric clock and UMC clock a la EPYC helps significantly, but not to the degree of many magnitudes improvement, and not nearly enough to overcome being not monolithic.
  • "<2 pJ/bit" on AMD slides seems like a restatement of Fabric's known efficiency, not the kind of groundbreaking improvement to make for what you referenced.
  • In L1 tech's video, the AMD slide for Bergamo power (vs ARM Ampere) itself advertises 70W idle power for its own product. 4:30 from the video 4 months ago on Bergamo.
Also, kinda missing the point here.........the point was that Zen 4c should be so overwhelmingly an advantage that even if Bergamo IO power was worse than Milan (which it probably isn't), it shouldn't matter, at least at the core counts that would be relevant for this comparison to Sierra Forest.
This video seems to address a lot of the points you bring up:

It's not the original video I pulled my numbers from, just the most relevant one I could find. You can probably skip the middle section of the video; the first and last thirds are where the content relevant to your inquiry is. The middle is just speculation.
Posted on Reply
#45
tabascosauz
evernessince said:
This video seems to address a lot of the points you bring up:

It's not the original video I pulled my numbers from, just the most relevant one I could find. You can probably skip the middle section of the video; the first and last thirds are where the content relevant to your inquiry is. The middle is just speculation.
Right off the bat his own system is idling at 100W in Linux, which is a fair bit above the "normal" 70W idle quoted by AMD and by LTT in Windows. But he's not presenting anything contradictory either.

When he briefly says "on the order of 1-2 watts" provided by AMD telemetry, he makes a passing reference to IOD but the power figures are referring to CCDs. It makes sense and is largely known CCD idle behaviour on consumer Ryzen, the cores are aggressively power gated and CCD power consumption is usually very low for anything Matisse-onwards. What he says about efficiency gap between Ryzen/EPYC is true, but irrelevant because both of them can and will idle their CCDs that low already.

That's not IOD power, and not IFOP power either. Both of those together account for the 70-75W in LTT's Bergamo review, AT and Phoronix' Milan review, AMD's official numbers, etc. Which also makes sense and is known IOD and Fabric behaviour in all of AMD's other chiplet CPUs. LTT video does not show specifically SOC Power, but the cores are clearly idling (~0.01-0.1W max) so the rest of the 70W can only come from one place.

Assuming that "1-2 watts" actually describes IOD (which is misleading I know, because it's all in the same sentence from wendell) makes zero sense in the context of what everyone else (and what Wendell's own system) has shown about Bergamo/any EPYC/any Ryzen ever. If, theoretically, IOD and IFOP draw could reach sub-5W levels, then the rest of the 70-75W budget (in wendell's case, 100W) must be caused by the CCDs - which means that every Bergamo ever never idles its cores ever, the 4c CCDs are always under load, and AMD telemetry in all Ryzen and EPYC is unreliable and blatantly incorrect.

It's not the first time wendell has made misleading/incorrect statements in his videos, but he's not reading from a script so it's fully understandable, and doesn't diminish his expertise in any way.

Come on, man. Clearly it's not hard to see what he's actually getting at, given all the context of the other info. What you're suggesting requires at the very least, a fundamental ground-up redesign of Fabric and IOD. EPYC 9000 expands Fabric with more lanes (split into "P" and "G" links) but it is not that kind of redesign.
Posted on Reply
#46
phanbuey
R0H1T said:
That's debatable, Apple chips are currently the most efficient in that power envelope & AMD is relatively close. And we know what kind of uarch/setup Apple has ~ the days of all homogenous cores are long gone. That's not to say we'll not get such chips but efficacy isn't necessarily the best with them!
www.notebookcheck.net/Apple-M3-SoC-analyzed-Increased-performance-and-improved-efficiency.766789.0.html
There's not really a debate though -- 7800x3d is far superior to the 7950x3d even though it's clocked lower and has fewer cores and less total cache. The only reason that is, is due to threads leaking over to the non-cache CCD.

e cores on vs e cores off on the raptor lake series doesn't have nearly the same type of impact, and apple's chip is a completely different ISA altogether.

AMD has to figure out thread scheduling prior to homogenous cores or they will release a chip that has inconsistent performance.
Posted on Reply
#47
ZoneDymo
fancucker said:
"Fails" is such a wreckless and irresponsible statement from a journalist of TPU's calibre. These products are designed for specific applications and Intel's tertiary services and easier integration make them a more compelling option than Bergamo.
Look man, you usually don't impress with your comments... but do go ahead and explain how on earth this is "wreckless and irresponsible"...
What massive risk does this apparently create?
Posted on Reply
#48
kondamin
Kinda doubt people planning on furnishing a server room will care all that much about geekbench.

If Intel manages to get high volume going for this node, it will be a win for Intel even if they don't get the best performance.
Posted on Reply
#49
Assimilator
Yeah, this is a new low for TPU. Even though I'm 100% certain that Sierra Forest CPUs are going to be comprehensively beaten by anything AMD has to offer, to use Geekbench of all things as "evidence" of that is just plain stupid... there's really no other way to put it. Geekbench is designed for consumer smartphone CPU workloads, which are about as far from server chip workloads as it's possible to be.
Posted on Reply
#50
user556
Assimilator said:
Yeah, this is a new low for TPU. Even though I'm 100% certain that Sierra Forest CPUs are going to be comprehensively beaten by anything AMD has to offer, to use Geekbench of all things as "evidence" of that is just plain stupid... there's really no other way to put it. Geekbench is designed for consumer smartphone CPU workloads, which are about as far from server chip workloads as it's possible to be.
You idiots. It wasn't TPU that did this. They're just reporting the news that such a test has been posted with Geekbench.

It would be nice to know why GB scales so badly in general though.
Posted on Reply