
Intel "Emerald Rapids" 8592+ and 8558U Xeon CPUs with 64C and 48C Configurations Spotted

AleksandarK

News Editor
Staff member
Joined
Aug 19, 2017
Messages
2,651 (0.99/day)
Intel's next-generation Emerald Rapids Xeon lineup is just around the corner, and we are now receiving more leaks as the launch nears. Today, we get to see leaks of two models: a 64-core Xeon 8592+ Platinum and a 48-core Xeon 8558U processor. First is the Xeon 8592+ Platinum, possibly Intel's top-end design with 64 cores and 128 threads. Running at a base frequency of 1.9 GHz, the CPU can boost up to 3.9 GHz. This SKU carries 488 MB of total cache, where 120 MB is dedicated to L2 and 320 MB is there for L3. The CPU has a TDP of 350 Watts, configurable up to 420 Watts.

Next up, we have the Xeon 8558U processor, which has been spotted in Geekbench. The Xeon 8558U is a 48-core, 96-thread CPU with a 2.0 GHz base clock whose boost frequency has yet to be shown or enabled, likely because it is an engineering sample. It carries 96 MB of L2 cache and 260 MB of L3 cache, making for a total of 356 MB of cache (which includes L1D and L1I as well). Both of these SKUs should launch alongside the remaining models in the Emerald Rapids family, dubbed 5th generation Xeon Scalable, on December 14 this year.



View at TechPowerUp Main Site | Source
 
Joined
Sep 1, 2020
Messages
2,393 (1.52/day)
Location
Bulgaria
In the material from one of the sources there is an interesting table listing the models. There is only one model marked "Gold"; all the others are "Platinum". However, I am interested in something else. About a third of the models support DDR5-4800, another third DDR5-5200, and the last third DDR5-5600. Is it possible that only the ones with DDR5-5600 support are actually the "Emerald" architecture, and the rest are rebranded older architectures?
 
Joined
Aug 13, 2010
Messages
5,480 (1.04/day)
In the material from one of the sources there is an interesting table listing the models. There is only one model marked "Gold"; all the others are "Platinum". However, I am interested in something else. About a third of the models support DDR5-4800, another third DDR5-5200, and the last third DDR5-5600. Is it possible that only the ones with DDR5-5600 support are actually the "Emerald" architecture, and the rest are rebranded older architectures?
IMC-rated DDR speeds are indeed a very good indicator of a monolithic die's generational design.
By now, it's not impossible that Intel is combining a 1st-gen P-core-based IMC with 2nd-gen and 2nd-gen+ ones in one series.
 
Joined
Dec 12, 2016
Messages
1,948 (0.66/day)
“This SKU carries 488 MB of total cache, where 120 MB is dedicated to L2 and 320 MB is there for L3.”

120 + 320 = 440

Besides, it's probably a misreport of the L2 cache, which should be 2 MB per core for 128 MB total. So we are looking at 448 MB of total L2+L3 cache in the 64-core part.
 
Joined
Sep 1, 2020
Messages
2,393 (1.52/day)
Location
Bulgaria
“This SKU carries 488 MB of total cache, where 120 MB is dedicated to L2 and 320 MB is there for L3.”

120 + 320 = 440

Besides, it's probably a misreport of the L2 cache, which should be 2 MB per core for 128 MB total. So we are looking at 448 MB of total L2+L3 cache in the 64-core part.
Total means all cache levels summed, including L1. Yes, L2 is 128 MB in the slides.
 
Joined
Sep 1, 2020
Messages
2,393 (1.52/day)
Location
Bulgaria
128 + 320 + 5 = 453 MB total cache
I wonder how this applies to cache: how much is the actual physical size, how much is reported by the OS, and how much of the cache's capacity is actually usable by the user's applications?
 
Joined
Dec 12, 2016
Messages
1,948 (0.66/day)
I wonder how this applies to cache: how much is the actual physical size, how much is reported by the OS, and how much of the cache's capacity is actually usable by the user's applications?
I think @AleksanderK just made a couple of typos. The L2 cache is 128 MB and the total L2+L3 cache is 448 MB. Usually, L1 cache is not counted when reporting total cache size.

As for your comment, the amounts of cache shown in CPU-Z are usually what is available to the OS and applications. I am not aware of any overhead, provisioning, or otherwise hidden cache.

Edit: I should mention that @AleksanderK only made one typo. The 488 should say 448. The 120 typo is carried over from the original VideoCardz article.
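For reference, here is a quick back-of-the-envelope sum in Python. It assumes 2 MB of L2 per P-core (as in the slides) and an 80 KB L1 (32 KB instruction + 48 KB data) per core; the L1 split is my assumption based on the Golden Cove-class core, not anything from the leak:

cores = 64
l1_per_core_kb = 32 + 48        # assumed L1I + L1D per P-core, not from the slides
l2_per_core_mb = 2              # 2 MB L2 per core, per the slides
l3_mb = 320                     # shared L3, per the slides

l1_mb = cores * l1_per_core_kb / 1024
l2_mb = cores * l2_per_core_mb
print(l2_mb + l3_mb)            # 448 MB of L2+L3
print(l1_mb + l2_mb + l3_mb)    # 453 MB including L1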
 
Last edited:
Joined
Sep 1, 2020
Messages
2,393 (1.52/day)
Location
Bulgaria
I think they may have around 10% spare cache to hot-replace defective cache blocks. Or am I wrong?
 
Joined
Aug 30, 2006
Messages
7,223 (1.08/day)
System Name ICE-QUAD // ICE-CRUNCH
Processor Q6600 // 2x Xeon 5472
Memory 2GB DDR // 8GB FB-DIMM
Video Card(s) HD3850-AGP // FireGL 3400
Display(s) 2 x Samsung 204Ts = 3200x1200
Audio Device(s) Audigy 2
Software Windows Server 2003 R2 as a Workstation now migrated to W10 with regrets.
Interested to know the TDP of the Xeon 8558U
 
Joined
Sep 20, 2021
Messages
468 (0.39/day)
Processor Ryzen 7 9700x
Motherboard Asrock B650E PG Riptide WiFi
Cooling Underfloor CPU cooling
Memory 2x32GB 6200MT/s
Video Card(s) 4080 SUPER Noctua OC Edition
Storage Kingston Fury Renegade 1TB, Seagate Exos 12TB
Display(s) MSI Optix MAG301RF 2560x1080@200Hz
Case Phanteks Enthoo Pro
Power Supply NZXT C850 850W Gold
Mouse Bloody W95 Max Naraka
Someone tested it in Geekbench 6 :)

INTEL(R) XEON(R) PLATINUM 8558U / 48 Cores, 96 Threads

 
Joined
Aug 30, 2006
Messages
7,223 (1.08/day)
System Name ICE-QUAD // ICE-CRUNCH
Processor Q6600 // 2x Xeon 5472
Memory 2GB DDR // 8GB FB-DIMM
Video Card(s) HD3850-AGP // FireGL 3400
Display(s) 2 x Samsung 204Ts = 3200x1200
Audio Device(s) Audigy 2
Software Windows Server 2003 R2 as a Workstation now migrated to W10 with regrets.
^ Doesn't look right. Multi-core is only ~10x faster than single-core, yet there are 48 cores/96 threads. I would expect different scaling unless they are choking to death due to downclocking and/or cache incoherence/starvation. A quick Google search gives me:
The multi-core benchmark tests in Geekbench 6 have also undergone a significant overhaul. Rather than assigning separate tasks to each core, the tests now measure how cores cooperate to complete a shared task.
The new "shared task" approach requires cores to co-operate by sharing information. Given the larger datasets used in Geekbench 6 several workloads are now memory-constrained, rather than CPU-constrained, on most systems.
Not sure GB6 is at all useful for high-core-count CPUs, since such CPUs are, by design, meant to run many separate and independent tasks.
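As a rough illustration of why a shared, partly memory-bound workload stops scaling long before 48 cores, here is a small Amdahl's-law sketch; the 90% parallel fraction is a made-up illustrative number, not a measured property of Geekbench 6:

def amdahl_speedup(cores, parallel_fraction):
    # Amdahl's law: the serial/contended share of the work caps the speedup
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# If ~90% of a shared task parallelizes and ~10% stays serial or memory-bound:
print(round(amdahl_speedup(48, 0.90), 1))   # ~8.4x with 48 cores
print(round(amdahl_speedup(96, 0.90), 1))   # ~9.1x even with 96 threads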
 
Joined
Nov 26, 2021
Messages
1,705 (1.52/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
I think they may have around 10% spare cache to hot-replace defective cache blocks. Or am I wrong?
That would be rather wasteful of die space. In the past, much smaller ratios have been used. Over 20 years ago, Intel used less than 2% of the L3 cache's actual capacity for redundancy:

The McKinley’s L3 is composed of 135 identical 24 KB sub-blocks. Of these, 128 are used to store data, 5 are used to hold EDC check bits, and 2 are used for redundancy.
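Working out the ratio from those quoted figures (plain arithmetic, nothing beyond what the quote states):

sub_block_kb = 24
data_blocks, edc_blocks, spare_blocks = 128, 5, 2

data_capacity_mb = data_blocks * sub_block_kb / 1024   # 3 MB of usable L3
spare_ratio = spare_blocks / data_blocks
print(f"{spare_ratio:.1%}")                             # ~1.6% redundancy, i.e. under 2%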
 
Joined
Nov 26, 2021
Messages
1,705 (1.52/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
Well, the shrinking of components in recent years may have made them less durable despite all the measures taken to protect them.
This redundancy was to account for manufacturing defects; durability for servers running at near constant load shouldn't be a concern. The power density is much less than even a 12600K on the same process and it won't suffer from thermal cycling to the same extent as a typical consumer CPU. At least for TSMC, defect density has improved or held steady for new processes. Of course, only Intel and some of its foundry customers know what Intel's defect rate is.
 
Joined
Sep 1, 2020
Messages
2,393 (1.52/day)
Location
Bulgaria
This redundancy was to account for manufacturing defects; durability for servers running at near constant load shouldn't be a concern. The power density is much less than even a 12600K on the same process and it won't suffer from thermal cycling to the same extent as a typical consumer CPU. At least for TSMC, defect density has improved or held steady for new processes. Of course, only Intel and some of its foundry customers know what Intel's defect rate is.
I didn't even mean the density of manufacturing defects, only those occurring in use. Although defectively manufactured components also occupy some area. That's the state of things: the physics of materials at the microscale gets harsher the lower you go.
 
Joined
Nov 26, 2021
Messages
1,705 (1.52/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
I didn't even mean the density of manufacturing defects, only those occurring in use. Although defectively manufactured components also occupy some area. That's the state of things: the physics of materials at the microscale gets harsher the lower you go.
Luckily, this is digital logic, so variation in threshold voltages and in circuit resistance or capacitance is relatively easily accommodated.
 
Joined
Jan 3, 2021
Messages
3,605 (2.49/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
That 8558Uranium must sit somewhere between the i7-8550U and the i7-8559U, hah.

More seriously, the "U" apparently means a processor without UPI inter-processor links, at least in the Sapphire Rapids generation, and therefore one that cannot work in multiprocessor systems.

I didn't even mean the density of manufacturing defects, only those occurring in use. Although defectively manufactured components also occupy some area. That's the state of things: the physics of materials at the microscale gets harsher the lower you go.
Static RAM, or dynamic RAM for that matter, can't be compared to NAND, which wears out with use. Any defects in RAM must be detected in the chip testing phase, and worked around once and for all, if at all possible. It is not known how Intel and others correct the defects. Maybe they make use of redundant cache lines. The other option is to just tag the defective cache lines and exclude them from use, so you get 35.992 MiB of L3 for the price of 36 MiB. The OS and applications know nothing about that, it's only the cache controller's business.

But ... if there are several defects in the L3 area, it means that there are defects everywhere, and it's nearly impossible for the rest of the chip to be defect-free. So, while it's useful to have a little bit of redundancy and save some nice costly chips from recycling, it's not useful to have a lot of it.
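A quick sanity check of that 35.992 MiB example in Python; the 64-byte line size and the 128 tagged-out lines are illustrative assumptions, not figures from any datasheet:

line_bytes = 64                               # typical cache line size (assumption)
l3_lines = 36 * 1024 * 1024 // line_bytes     # 589,824 lines in a 36 MiB L3
bad_lines = 128                               # hypothetical count of tagged-out defective lines

usable_mib = (l3_lines - bad_lines) * line_bytes / 2**20
print(usable_mib)                             # 35.9921875 MiB, roughly the "35.992 MiB for the price of 36 MiB" case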
 
Joined
Dec 30, 2010
Messages
2,200 (0.43/day)
Can someone compare those scores against an Epyc at the same thread/core count, please?

I'm pretty sure the Epyc would be better overall due to its more efficient design; Sapphire Rapids has been plagued by problems.
 