
Intel "Granite Rapids-UCC" Powering Xeon 6 6980P Detailed: Up to 132 Cores Possible

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
47,129 (7.58/day)
Location
Hyderabad, India
System Name RBMK-1000
Processor AMD Ryzen 7 5700G
Motherboard ASUS ROG Strix B450-E Gaming
Cooling DeepCool Gammax L240 V2
Memory 2x 8GB G.Skill Sniper X
Video Card(s) Palit GeForce RTX 2080 SUPER GameRock
Storage Western Digital Black NVMe 512GB
Display(s) BenQ 1440p 60 Hz 27-inch
Case Corsair Carbide 100R
Audio Device(s) ASUS SupremeFX S1220A
Power Supply Cooler Master MWE Gold 650W
Mouse ASUS ROG Strix Impact
Keyboard Gamdias Hermes E2
Software Windows 11 Pro
Intel on Tuesday announced the ultra core-count (UCC) variant of its "Granite Rapids" server processor microarchitecture, introducing new Xeon 6 series SKUs with P-core counts as high as 128-core/256-thread per socket. The "Granite Rapids" microarchitecture uses "Redwood Cove" performance cores. These are newer than the "Raptor Cove" cores powering "Emerald Rapids," and a client version of these cores powers the Core Ultra 1-series "Meteor Lake" processors. The server version of "Redwood Cove" comes with 112 KB of L1 cache (64 KB L1I and 48 KB L1D) and 2 MB of dedicated L2 cache. It is optimized for the mesh-interconnect core layout of the three compute tiles making up "Granite Rapids-UCC"; each core comes with a 3.93 MB segment of its tile's 168 MB shared L3 cache.
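
For readers who want to sanity-check the cache hierarchy described above on a real system, here is a minimal sketch (assuming a Linux machine with the standard sysfs cache-topology files) that prints each cache level's size and which CPUs share it:

```c
// Minimal sketch: enumerate the cache hierarchy Linux exposes for CPU 0
// via sysfs, to verify the L1/L2/L3 sizes described above on a real system.
// Assumes a Linux kernel with the usual /sys/devices/system/cpu layout.
#include <stdio.h>
#include <string.h>

static int read_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    buf[strcspn(buf, "\n")] = '\0'; /* drop trailing newline */
    fclose(f);
    return 0;
}

int main(void)
{
    char path[256], level[16] = "", type[32] = "", size[32] = "", shared[256] = "";

    /* index0..index3 typically cover L1D, L1I, L2 and L3 on current Xeons. */
    for (int i = 0; i < 8; i++) {
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu0/cache/index%d/level", i);
        if (read_line(path, level, sizeof level))
            break; /* no more cache levels reported */

        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu0/cache/index%d/type", i);
        read_line(path, type, sizeof type);

        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu0/cache/index%d/size", i);
        read_line(path, size, sizeof size);

        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu0/cache/index%d/shared_cpu_list", i);
        read_line(path, shared, sizeof shared);

        printf("L%s %-12s %-8s shared with CPUs %s\n", level, type, size, shared);
    }
    return 0;
}
```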

Perhaps the biggest change between the client and server variants of "Redwood Cove" is the addition of the AMX-FP16 and AVX512-FP16 instruction sets. The 128-core Xeon 6 6980P processor is based on the "Granite Rapids-UCC" package, which has three compute tiles, each with 44 cores and a 4-channel memory interface. The three compute tiles are cache-coherent, so every core on any of the three tiles can benefit from the processor's 12-channel DDR5 memory interface. The package also has two SoC tiles, each with a 48-lane PCIe Gen 5 or CXL 2.0 root complex, for a total of 96 lanes. The processor has 6 UPI links for multi-socket machines, and supports up to a 2P configuration per system, for a maximum of 256 P-cores. Each of the three compute tiles is built on the Intel 3 foundry node, while the two SoC tiles are built on the Intel 7 node.
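
As a rough illustration of how software would detect these extensions at runtime, the sketch below queries CPUID directly (x86-64, GCC/Clang). The bit positions follow my reading of the Intel SDM and should be re-verified before relying on them; a real application would also confirm OS support for the AMX tile state (via XSAVE/XGETBV) before issuing AMX instructions.

```c
// Hedged sketch: runtime check for the server-oriented ISA extensions
// mentioned above (AVX512-FP16, AMX) using raw CPUID on x86-64 with GCC/Clang.
// Bit positions are per my reading of the Intel SDM (CPUID leaf 7) and
// should be double-checked before relying on them in production code.
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        puts("CPUID leaf 7 not supported");
        return 1;
    }
    printf("AVX512-FP16: %s\n", (edx & (1u << 23)) ? "yes" : "no");
    printf("AMX-TILE   : %s\n", (edx & (1u << 24)) ? "yes" : "no");
    printf("AMX-BF16   : %s\n", (edx & (1u << 22)) ? "yes" : "no");
    printf("AMX-INT8   : %s\n", (edx & (1u << 25)) ? "yes" : "no");

    /* AMX-FP16 is reported in leaf 7, sub-leaf 1 (EAX bit 21),
       if I recall the SDM correctly. */
    if (__get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx))
        printf("AMX-FP16   : %s\n", (eax & (1u << 21)) ? "yes" : "no");

    return 0;
}
```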



For its server processors, Intel does not use the same power specification system as its client processors (base power, turbo power), since it has to stick to the industry conventions that guide data-center architects, and so it quotes a flat TDP rating. The lineup is led by the Xeon 6980P, with its 128-core/256-thread count, a 2.00 GHz base frequency, 3.20 GHz all-core boost, 3.90 GHz maximum boost frequency, 504 MB of shared L3 cache, and a 500 W TDP. The 6979P is next up, with a 120-core/240-thread configuration, 2.10 GHz base frequency, 3.20 GHz all-core boost, and 3.90 GHz maximum turbo frequency. Interestingly, it retains the same 504 MB of L3 cache as the 6980P, as well as its 500 W TDP rating.



The Xeon 6972P is an interesting SKU, as it directly squares off against the top AMD EPYC "Genoa" part with its 96-core/192-thread configuration. It ticks at 2.40 GHz base, 3.50 GHz all-core boost, and 3.90 GHz maximum boost. The L3 cache gets a slight haircut to a still-respectable 480 MB, while the TDP remains at 500 W. The 6952P has the same 96-core/192-thread configuration, but with lower clock speeds and TDP: 2.10 GHz base, 3.20 GHz all-core boost, and 3.90 GHz maximum boost, at a reduced 400 W.

The Xeon 6960P should appeal to the compute server market with its balance of core count and clock speeds. It comes with a 72-core/144-thread configuration, but the highest clock speeds in the lineup: a 2.70 GHz base frequency, 3.80 GHz all-core boost, and 3.90 GHz maximum boost. The L3 cache size is 432 MB, while the TDP stays at 500 W to support the higher clock speeds.



All five models mentioned above are 2P-capable, have 12-channel DDR5 memory interfaces, and support native memory speeds of DDR5-6400 using conventional RDIMMs, or DDR5-8800 using MRDIMMs. All SKUs also have 96 PCIe Gen 5 or CXL 2.0 lanes, and all receive the same set of on-package accelerators (fixed-function hardware that speeds up popular kinds of server workloads): DSA (Data Streaming Accelerator), IAA (In-Memory Analytics Accelerator), QAT (QuickAssist Technology), and DLB (Dynamic Load Balancer).
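
As a back-of-the-envelope illustration of what the 12-channel interface means in raw numbers (ignoring efficiency, interleaving, and DDR5 sub-channel details), each 64-bit channel moves 8 bytes per transfer, so peak bandwidth scales directly with the transfer rate:

```c
// Back-of-the-envelope sketch of theoretical peak memory bandwidth per
// socket, from the figures in the article: 12 DDR5 channels, each 64 bits
// (8 bytes) wide, at DDR5-6400 (RDIMM) or DDR5-8800 (MRDIMM) transfer rates.
#include <stdio.h>

int main(void)
{
    const double channels = 12.0;
    const double bytes_per_transfer = 8.0; /* 64-bit DDR5 channel */

    const double rates_mt_s[] = { 6400.0, 8800.0 };
    const char *labels[]      = { "DDR5-6400 RDIMM", "DDR5-8800 MRDIMM" };

    for (int i = 0; i < 2; i++) {
        double gb_s = rates_mt_s[i] * 1e6 * bytes_per_transfer * channels / 1e9;
        printf("%-17s : %.1f GB/s peak per socket\n", labels[i], gb_s);
    }
    return 0;
}
```

That works out to roughly 614 GB/s per socket with DDR5-6400 RDIMMs, and about 845 GB/s with DDR5-8800 MRDIMMs, in theoretical peak terms.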

View at TechPowerUp Main Site
 
Joined
Jan 11, 2022
Messages
787 (0.78/day)
A 2-socket server like this would replace a small data centre from less than 20 years ago
 
Joined
Jun 29, 2018
Messages
532 (0.23/day)
Each of the three compute tiles is built on the Intel 4 foundry node—the same node on which the company builds the compute tile of "Meteor Lake," while the two SoC tiles are built on the Intel 7 node.
It's Intel 3 not Intel 4 for the compute tiles, so not the same as Meteor Lake ;)

 
Joined
Aug 13, 2010
Messages
5,460 (1.05/day)
Reducing this chip by one compute core tile and serving it as a high end desktop platform with lots of Lion Cove P-cores is something I want that will never happen. Absolute dread.
 
Joined
Jun 29, 2018
Messages
532 (0.23/day)
Reducing this chip by one compute core tile and serving it as a high end desktop platform with lots of Lion Cove P-cores is something I want that will never happen. Absolute dread.
Intel has shown that this platform is very flexible, so what you want might not be out of the question:
 
Joined
Aug 13, 2010
Messages
5,460 (1.05/day)
Intel has shown that this platform is very flexible, so what you want might not be out of the question:
I know it's very much technically possible, but Intel no longer exists in the state of luxury it once did that allowed spin-off HEDT platforms, I'm afraid. Not in its current situation.
For them, it would mean choosing to make a lot less cash per fabricated chip.
 
Joined
Jan 3, 2021
Messages
3,379 (2.44/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
Apparently they get to 128 cores by combining three 43-core dies, with one defective core.

The amount of L3 per core is really weird here. Perhaps in this new architecture there isn't a slice of L3 attached to each core but rather a pool of last-level cache, attached to the mesh bus separately from the cores. Does this make any sense?
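
Just running the numbers from the article (illustrative arithmetic only, not a statement about how the L3 is actually organized):

```c
// Quick arithmetic on the figures under discussion: 3 tiles x 44 physical
// cores = 132, the "up to 132 cores possible" in the headline, while the
// 6980P ships with 128 active cores and 504 MB of L3.
#include <stdio.h>

int main(void)
{
    const int tiles = 3, cores_per_tile = 44, active_cores = 128;
    const double l3_per_tile_mb = 168.0;

    printf("physical cores      : %d\n", tiles * cores_per_tile);       /* 132  */
    printf("total L3            : %.0f MB\n", tiles * l3_per_tile_mb);  /* 504  */
    printf("L3 per physical core: %.2f MB\n",
           l3_per_tile_mb / cores_per_tile);                            /* 3.82 */
    printf("L3 per active core  : %.2f MB\n",
           tiles * l3_per_tile_mb / active_cores);                      /* 3.94 */
    return 0;
}
```

So the quoted ~3.93 MB per core only works out if all 504 MB stays enabled while some cores are fused off, which would at least be consistent with the L3 behaving as a shared pool rather than strictly per-core slices.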
 
Joined
Dec 12, 2016
Messages
1,721 (0.60/day)
How is 132 cores possible?

Edit: oh i see. 44 times 3 is 132. So 128 is a fully functional die of 44 plus two partially functioning dies of 42.

Weird.
 
Joined
Mar 21, 2016
Messages
2,495 (0.80/day)
Apparently they get to 128 cores by combining three 43-core dies, with one defective core.

The amount of L3 per core is really weird here. Perhaps in this new architecture there isn't a slice of L3 attached to each core but rather a pool of last-level cache, attached to the mesh bus separately from the cores. Does this make any sense?

I had suggested at one point that they do something similar with cache for the P-cores, like what was done with the E-cores. I believe what I was thinking of at the time was an L4 cache underneath, connected with TSVs and accessed round-robin: basically a shared cache pool that can be tapped into on an as-needed basis, alongside some cache reserved for typical workload operations.
 
Joined
Jun 11, 2017
Messages
261 (0.10/day)
Location
Montreal Canada
128 core Xeons and still cannot make a pure 16 core desktop. Lame Intel is lame.
 

tfp

Joined
Jun 14, 2023
Messages
73 (0.15/day)
128 core Xeons and still cannot make a pure 16 core desktop. Lame Intel is lame.
There are 8-, 12-, and 16-core Xeon Ws, and I think it scales up some from there. This is standard Intel practice of splitting workstation and desktop CPUs based on P-core count, memory channels, and PCIe lanes.
 
Joined
Jun 10, 2014
Messages
2,975 (0.79/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
Reducing this chip by one compute core tile and serving it as a high end desktop platform with lots of Lion Cove P-cores is something I want that will never happen. Absolute dread.
So basically Xeon W?
Intel usually launches a workstation derivative of their Xeon scalable platform about half a cycle later.
The current Sapphire Rapids refresh might not be super impressive considering it's at the tail end of its lifecycle (and Arrow Lake being imminent), but with the next generation potentially offering DDR5-6400 (and DDR5-8800 MRDIMMs?) along with faster cores and all the usual high-end workstation features, things are going to get quite interesting for workstation and power users.

If yields are good enough for them to push a little beyond 5 GHz, they would probably appeal to a lot of power users who feel seriously constrained on the mainstream platforms, whether due to gaming performance or specific applications they think need those super-high (short) boosts. These workstation CPUs will, however, offer much more performance than the specs alone would lead people to believe, both due to more consistent performance and to features like much higher memory bandwidth and dual AVX-512 FMA units, which are a real powerhouse in many heavy workloads.

I'm looking forward to a deep-dive in the performance and latency implications of MRDIMMs, and whether this is something that may add to the benefits of a workstation platform vs. mainstream.
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,161 (2.83/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
The only thing coming to my mind is that Intel poking fun at AMD gluing cores together didn't age very well. Nifty tech here, brought to you by glue. :laugh:
 
Joined
Jan 3, 2021
Messages
3,379 (2.44/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
ah yes, I can see myself pulling a 4-phase power cable to the home office in 2028 so I can power my workstation
... which will have around 300 cores inside each CPU, right?
 