
Intel "Sierra Forest" Xeon System Surfaces, Fails in Comparison to AMD Bergamo

If it includes games, web browsers or office software, that wouldn't be very relevant for a server chip.
Very little of the above. It lists what it does each step of the way.
GB6:
Running File Compression
Running Navigation
Running HTML5 Browser
Running PDF Renderer
Running Photo Library
Running Clang
Running Text Processing
Running Asset Compression
Running Object Detection
Running Background Blur
Running Horizon Detection
Running Object Remover
Running HDR
Running Photo Filter
Running Ray Tracer
Running Structure from Motion


GB5:
Running AES-XTS
Running Text Compression
Running Image Compression
Running Navigation
Running HTML5
Running SQLite
Running PDF Rendering
Running Text Rendering
Running Clang
Running Camera
Running N-Body Physics
Running Rigid Body Physics
Running Gaussian Blur
Running Face Detection
Running Horizon Detection
Running Image Inpainting
Running HDR
Running Ray Tracing
Running Structure from Motion
Running Speech Recognition
Running Machine Learning
 
Phoenix2 has 2+4 (Zen4+Zen4c) and all share the same 16MB L3.
Phoenix has 8 Zen4 and all share the same 16MB L3.
That halved L3 compared to the "normal" 32MB L3 per CCX of desktop Zen4 configs has been typical for those mobile parts for a few generations now.
In that regard Bergamo combines:
- mobile grade halved L3 per CCX
- 2 CCX per CCD
- cores compacted with dense libraries

For me the real catch of Phoenix2 is that Zen4 & Zen4c can be part of the same CCX and thus share the same L3.
 
288 E-cores... which get slapped around by 128 SMT-enabled Zen4c cores... interesting.

I wonder what the application for this is - is it competing with the same product? I wonder what the power draw was.

Phoenix2 has 2+4 (Zen4+Zen4c) and all share the same 16MB L3.
Phoenix has 8 Zen4 and all share the same 16MB L3.
That halved L3 compared to the "normal" 32MB L3 per CCX of desktop Zen4 configs has been typical for those mobile parts for a few generations now.
In that regard Bergamo combines:
- mobile grade halved L3 per CCX
- 2 CCX per CCD
- cores compacted with dense libraries

For me the real catch of Phoenix2 is that Zen4 & Zen4c can be part of the same CCX and thus share the same L3.

The issue here is the scheduler. Even though the cores have the same arch, they have different performance -- and as seen with the 7950X3D, there is currently no thread scheduling which enables the chip to run as efficiently as a homogeneous-core chip.

So what you're essentially getting in that CCX is Zen 4 cores that are bogged down by slower Zen4c cores, relying on software schedulers.
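As a toy sketch of why that matters (all numbers hypothetical, not measured Zen4/Zen4c figures): the penalty for a speed-blind scheduler landing a latency-sensitive thread on a dense core is roughly the ratio of the core speeds.

```python
# Toy model, NOT measured data: Zen4 cores at relative speed 1.0,
# Zen4c at an assumed 0.75, one latency-sensitive thread of 100 work units.
speeds = {"zen4_0": 1.0, "zen4_1": 1.0,
          "zen4c_0": 0.75, "zen4c_1": 0.75,
          "zen4c_2": 0.75, "zen4c_3": 0.75}
work = 100.0

# A speed-aware scheduler always picks the fastest idle core;
# a speed-blind one may land the thread on any idle core.
best = work / max(speeds.values())    # placed on a Zen4 core
worst = work / min(speeds.values())   # placed on a Zen4c core
print(f"best: {best:.0f} units, worst: {worst:.1f} units")
```

With these assumed speeds a mis-scheduled thread finishes about a third slower, which is exactly the kind of inconsistency the 7950X3D already shows across its two CCDs.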
 
Bergamo is not the new part here; it's older than Phoenix2. The new part is Intel's Sierra Forest, which in fact is yet to be released.
That's the clickbait: Intel's new CPU is worse than AMD's old CPU :slap:
 
The Geekbench results are a tad suspect. Maybe Geekbench tops out at 64 tasks or something, which would explain why these results are so badly skewed.
 
"Fails" is such a reckless and irresponsible statement from a journalist of TPU's calibre. These products are designed for specific applications, and Intel's tertiary services and easier integration make them a more compelling option than Bergamo.
there's only one fail here, and it's considering geekbench to be of any relevance at all.
 
288 E-cores... which get slapped around by 128 SMT-enabled Zen4c cores... interesting.
That's why I pointed out some server-specific benchmarks are in order.
E cores are built to be slower. Yet servers run hundreds of threads (way more than any single CPU can offer), so it's all a matter of spreading the workload and getting the job done promptly, but without burning through too much power. I mean, if the E cores are half as fast as Zen4c cores, but there's twice as many of them, they would get the job done just as fast. So the CPU that burns through less power would win. But the numbers just aren't there to tell either way.
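That throughput argument can be written down as back-of-envelope arithmetic (all numbers made up for illustration):

```python
# Hypothetical numbers only: for embarrassingly parallel server loads,
# aggregate throughput is roughly core_count * per_core_speed.
def throughput(cores, per_core_speed):
    return cores * per_core_speed

slow_but_many = throughput(288, 0.5)   # E-core style: half speed, double count
fast_but_few = throughput(144, 1.0)    # Zen4c style: full speed, half count
print(slow_but_many, fast_but_few)     # identical aggregate throughput

# With throughput equal, perf/W collapses to a package-power comparison:
def perf_per_watt(tput, package_watts):
    return tput / package_watts
# whichever part draws fewer watts for the same throughput wins
```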
 
That's why I pointed out some server-specific benchmarks are in order.
E cores are built to be slower. Yet servers run hundreds of threads (way more than any single CPU can offer), so it's all a matter of spreading the workload and getting the job done promptly, but without burning through too much power. I mean, if the E cores are half as fast as Zen4c cores, but there's twice as many, they would get the job done just as fast. So the CPU that burns through less power would win. But the numbers just aren't there to tell either way.
Not to mention they wouldn't release such a product unless it had some advantage - whether cost, power, etc.

So I'm wondering what the full story is.
 
and as seen with the 7950X3D, there is currently no thread scheduling which enables the chip to run as efficiently as a homogeneous-core chip.
That's debatable, Apple chips are currently the most efficient in that power envelope & AMD is relatively close. And we know what kind of uarch/setup Apple has ~ the days of all-homogeneous cores are long gone. That's not to say we won't get such chips, but efficiency isn't necessarily the best with them!
 
The 7950X3D is homogeneous cores. It's only the L3 cache that differs. That said, yes, even that difference matters. So AMD messed up by not offering the 7950X3D with both chiplets stacked the same.
 
So to summarize, the results are:

Sierra Forest: 855 Single Core and 7700 Multi Core

Bergamo: 1597 Single Core and 16455 Multi Core

There seems to be something very wrong with Geekbench on both of those benches.

Just compare it to other Geekbench 6 results

Intel N100(4 gracemont cores): ~1100 Single Core and ~2800 Multi Core

Steam Deck(4 Zen 2 cores that are heavily clock limited): ~1350 Single Core and ~4650 Multi Core

AMD 3700X: ~1600 Single Core and 7300 Multi Core.

AMD 3950X: ~1800 Single Core and 11500 Multi Core

AMD 3990X: ~1650 Single Core and ~15000 Multi Core

etc etc

The results don't seem to make much sense, honestly. It might be that the chosen benchmarks don't scale well with higher core counts. Though, here is one from a Threadripper PRO 7995WX with 96 Zen 4 cores:

ASUS System Product Name - Geekbench
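As a quick sanity check (using only the scores quoted in this post), dividing multi-core by single-core gives an effective scaling factor, and a 288-core part scaling like a 64-core one does look broken:

```python
# Scores as quoted above: (single-core, multi-core)
results = {
    "Sierra Forest (288 cores)": (855, 7700),
    "Bergamo (128c/256t)": (1597, 16455),
    "3950X (16c/32t)": (1800, 11500),
    "3990X (64c/128t)": (1650, 15000),
}
for name, (single, multi) in results.items():
    print(f"{name}: {multi / single:.1f}x MT/ST scaling")
# Sierra Forest lands at ~9x -- about the same as the 64-core 3990X,
# which fits the suspicion upthread that Geekbench's multi-core test
# simply stops scaling long before 288 cores.
```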
 
@persondb You didn't read any of this thread, did you?
 
Perf/Watt is going to be the measuring stick here.

If it's half the performance for 33% of the power draw, they are onto a winner in areas that need high-density cores (think render farms / internet front ends / highly parallelised workloads).

If it's the other way around, 50% of the performance for 66% of the power draw, then it's completely DOA and you won't be able to source a Bergamo part for love nor money for the next 6-12 months.
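The two scenarios above reduce to simple relative perf/W arithmetic (perf and power expressed as fractions of the Bergamo part's figures):

```python
# Fractions relative to Bergamo: 1.0 means "same as Bergamo".
def relative_perf_per_watt(perf_fraction, power_fraction):
    return perf_fraction / power_fraction

winner = relative_perf_per_watt(0.5, 0.33)  # half perf, a third the power
doa = relative_perf_per_watt(0.5, 0.66)     # half perf, two-thirds the power
print(f"{winner:.2f}x Bergamo's perf/W")    # ~1.52x -> wins dense workloads
print(f"{doa:.2f}x Bergamo's perf/W")       # ~0.76x -> DOA, as stated above
```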
 
Sierra Forest lacks hyperthreading, while Bergamo offers SMT with 256 threads on the 128-core SKU.

For reals? Scratches head... :laugh:
 
AMD's chiplet design weakness is uncore power overhead relative to core count. IFOP power hasn't really improved monumentally in the server space, but Zen 4c doubling the core count for a given CCD count is a huge point in Bergamo's favour for perf/W.

You could say that E-cores are pushed too hard in consumer Core chips and are at their best in their efficiency band running Xeon clocks, but the same goes for Bergamo. Server Zen 4c is also close to its happy place.

This is inaccurate. Level1Techs has a video demonstrating that the IO die takes up to 1 watt and the Infinity Fabric up to 1.2 watts on AMD's Zen 4 server parts. Compare that to Zen 4 consumer CPUs, where, depending on the memory frequency, the IO die alone takes anywhere from 8 to 23 W. Zen 4 consumer parts' uncore is inefficient because it is tuned to squeeze out every last bit of performance, but the server parts demonstrate that, in its sweet spot, Zen 4 uncore can be very efficient.
 
Intel is trying to play on professionals the "Moooooooarrrrrr cores" game it plays on consumers.
 
This is inaccurate. Level1Techs has a video demonstrating that the IO die takes up to 1 watt and the Infinity Fabric up to 1.2 watts on AMD's Zen 4 server parts. Compare that to Zen 4 consumer CPUs, where, depending on the memory frequency, the IO die alone takes anywhere from 8 to 23 W. Zen 4 consumer parts' uncore is inefficient because it is tuned to squeeze out every last bit of performance, but the server parts demonstrate that, in its sweet spot, Zen 4 uncore can be very efficient.

Do you have a link to that? AMD did have slides hinting at improved Fabric power last year for Genoa, but I can't see any direct or indirect data on what you're saying. IOD + Fabric power of just a few watts for Bergamo's big IOD would be beyond groundbreaking.
  • If you're referring to the tests of Gigabyte getting rid of IO overhead from Milan (AT review), the super low idle (14W full system draw I think?) was not in a usable state (board powered off but BMC on)
  • In the LTT Bergamo preview (yes I know), HWInfo was showing about 70W package power with all cores close to 0W + 1800 fabric + 1200 UMC, so all of that 70W would be uncore of some kind.
  • A reduction of ~75W traditionally in EPYC down to <10W would be pure insanity and something I'm pretty sure someone would have made a big news story about, by now.
  • Running very low Fabric and UMC clocks a la EPYC helps significantly, but not by orders of magnitude, and not nearly enough to overcome not being monolithic.
  • "<2 pJ/bit" on AMD slides seems like a restatement of Fabric's known efficiency, not the kind of groundbreaking improvement to make for what you referenced.
  • In Level1Techs' video, the AMD slide for Bergamo power (vs ARM Ampere) itself advertises 70W idle power for its own product (4:30 into the Bergamo video from 4 months ago).
Also, kinda missing the point here... the point was that Zen 4c should be such an overwhelming advantage that even if Bergamo's IO power were worse than Milan's (which it probably isn't), it shouldn't matter, at least at the core counts relevant for this comparison to Sierra Forest.
 
Do you have a link to that? AMD did have slides hinting at improved Fabric power last year for Genoa, but I can't see any direct or indirect data on what you're saying. IOD + Fabric power of just a few watts for Bergamo's big IOD would be beyond groundbreaking.
  • If you're referring to the tests of Gigabyte getting rid of IO overhead from Milan (AT review), the super low idle (14W full system draw I think?) was not in a usable state (board powered off but BMC on)
  • In the LTT Bergamo preview (yes I know), HWInfo was showing about 70W package power with all cores close to 0W + 1800 fabric + 1200 UMC, so all of that 70W would be uncore of some kind.
  • A reduction of ~75W traditionally in EPYC down to <10W would be pure insanity and something I'm pretty sure someone would have made a big news story about, by now.
  • Running very low Fabric and UMC clocks a la EPYC helps significantly, but not by orders of magnitude, and not nearly enough to overcome not being monolithic.
  • "<2 pJ/bit" on AMD slides seems like a restatement of Fabric's known efficiency, not the kind of groundbreaking improvement to make for what you referenced.
  • In Level1Techs' video, the AMD slide for Bergamo power (vs ARM Ampere) itself advertises 70W idle power for its own product (4:30 into the Bergamo video from 4 months ago).
Also, kinda missing the point here... the point was that Zen 4c should be such an overwhelming advantage that even if Bergamo's IO power were worse than Milan's (which it probably isn't), it shouldn't matter, at least at the core counts relevant for this comparison to Sierra Forest.

This video seems to address a lot of the points you bring up:

It's not the original video I pulled my numbers from, just the most relevant one I could find. You can probably skip the middle section of the video; the first and last thirds are where the content relevant to your inquiry is. The middle is just speculation.
 
This video seems to address a lot of the points you bring up:

It's not the original video I pulled my numbers from, just the most relevant one I could find. You can probably skip the middle section of the video; the first and last thirds are where the content relevant to your inquiry is. The middle is just speculation.

Right off the bat his own system is idling at 100W in Linux, which is a fair bit above the "normal" 70W idle quoted by AMD and by LTT in Windows. But he's not presenting anything contradictory either.

When he briefly says "on the order of 1-2 watts" provided by AMD telemetry, he makes a passing reference to the IOD, but the power figures refer to the CCDs. That makes sense and is largely known CCD idle behaviour on consumer Ryzen: the cores are aggressively power gated, and CCD power consumption is usually very low for anything Matisse onwards. What he says about the efficiency gap between Ryzen/EPYC is true, but irrelevant, because both of them can and will idle their CCDs that low already.

That's not IOD power, and not IFOP power either. Both of those together account for the 70-75W in LTT's Bergamo review, AT's and Phoronix's Milan reviews, AMD's official numbers, etc. Which also makes sense and is known IOD and Fabric behaviour in all of AMD's other chiplet CPUs. The LTT video does not specifically show SoC power, but the cores are clearly idling (~0.01-0.1W max), so the rest of the 70W can only come from one place.

Assuming that "1-2 watts" actually describes the IOD (which is misleading, I know, because it's all in the same sentence from Wendell) makes zero sense in the context of what everyone else (and Wendell's own system) has shown about Bergamo/any EPYC/any Ryzen ever. If, theoretically, IOD and IFOP draw could reach sub-5W levels, then the rest of the 70-75W budget (in Wendell's case, 100W) must be caused by the CCDs - which would mean that no Bergamo ever idles its cores, the 4c CCDs are always under load, and AMD telemetry on all Ryzen and EPYC is unreliable and blatantly incorrect.

It's not the first time Wendell has made misleading/incorrect statements in his videos, but he's not reading from a script, so it's fully understandable and doesn't diminish his expertise in any way.

Come on, man. Clearly it's not hard to see what he's actually getting at, given all the context of the other info. What you're suggesting requires at the very least, a fundamental ground-up redesign of Fabric and IOD. EPYC 9000 expands Fabric with more lanes (split into "P" and "G" links) but it is not that kind of redesign.
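The power accounting being argued in this exchange can be spelled out with the numbers the posts cite (~70 W package, cores reporting roughly 0.01-0.1 W each at idle):

```python
# Numbers as cited in the discussion above; this is bookkeeping, not a model.
package_w = 70.0
cores = 128
for per_core_w in (0.01, 0.1):
    core_total = cores * per_core_w
    uncore = package_w - core_total
    print(f"cores: {core_total:5.2f} W -> uncore (IOD + Fabric): {uncore:.1f} W")
# Even at the pessimistic end, the cores explain at most ~13 W,
# leaving ~57-69 W that can only be IOD + Fabric.
```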
 
That's debatable, Apple chips are currently the most efficient in that power envelope & AMD is relatively close. And we know what kind of uarch/setup Apple has ~ the days of all-homogeneous cores are long gone. That's not to say we won't get such chips, but efficiency isn't necessarily the best with them!

There's not really a debate though -- the 7800X3D is far superior to the 7950X3D even though it's clocked lower and has fewer cores and less total cache. The only reason is threads leaking over to the non-V-Cache CCD.

E-cores on vs. E-cores off on the Raptor Lake series doesn't have nearly the same kind of impact, and Apple's chip is a completely different ISA altogether.

AMD has to figure out thread scheduling before going heterogeneous, or they will release a chip with inconsistent performance.
 
"Fails" is such a reckless and irresponsible statement from a journalist of TPU's calibre. These products are designed for specific applications, and Intel's tertiary services and easier integration make them a more compelling option than Bergamo.

Look man, you usually don't impress with your comments... but do go ahead and explain how on earth this is "reckless and irresponsible"...
What massive risk does this apparently create?
 
Kinda doubt people planning on furnishing a server room will care all that much about geekbench.

If Intel manages to get high volume going on this node, it will be a win for them even if they don't get the best performance.
 
Yeah, this is a new low for TPU. Even though I'm 100% certain that Sierra Forest CPUs are going to be comprehensively beaten by anything AMD has to offer, to use Geekbench of all things as "evidence" of that is just plain stupid... there's really no other way to put it. Geekbench is designed for consumer smartphone CPU workloads, which are about as far from server chip workloads as it's possible to be.
 