Wednesday, September 25th 2024

Intel "Granite Rapids-UCC" Powering Xeon 6 6980P Detailed: Up to 132 Cores Possible

Intel on Tuesday announced the ultra core-count (UCC) variant of its "Granite Rapids" server processor microarchitecture, introducing new Xeon 6 series SKUs with P-core counts as high as 128-core/256-thread per socket. "Granite Rapids" uses "Redwood Cove" performance cores. These are newer than the "Raptor Cove" cores powering "Emerald Rapids," and a client version of these cores powers the Core Ultra 1-series "Meteor Lake" processors. The server version of "Redwood Cove" comes with 112 KB of L1 cache (64 KB L1I and 48 KB L1D), and 2 MB of dedicated L2 cache. It is optimized for the mesh-interconnect core layout on the three compute tiles making up "Granite Rapids-UCC"; each core comes with a 3.93 MB segment of its tile's 168 MB shared L3 cache.
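The cache figures above can be sanity-checked with a quick sketch (all numbers are taken from this article, not official documentation; Python is used only as a calculator):

```python
# Sanity check of the L3 cache figures quoted above (numbers from the
# article: 3 compute tiles, 168 MB shared L3 per tile, 128 enabled cores).
TILES = 3
L3_PER_TILE_MB = 168
ENABLED_CORES = 128  # Xeon 6 6980P

l3_total_mb = TILES * L3_PER_TILE_MB            # 504 MB package-wide L3
l3_per_core_mb = l3_total_mb / ENABLED_CORES    # ~3.94 MB per enabled core

print(l3_total_mb, round(l3_per_core_mb, 2))    # 504 3.94
```

The 3.93 MB per-core figure quoted above is simply the 504 MB package-wide L3 divided across the 128 enabled cores.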

Perhaps the biggest change between the client and server variants of "Redwood Cove" is the addition of the AMX-FP16 and AVX512-FP16 instruction sets. The 128-core Xeon 6 6980P processor is based on the "Granite Rapids-UCC" package, which has three compute tiles, each with 44 cores and a 4-channel DDR5 memory interface. The three compute tiles are cache-coherent, so each core on any of the three tiles can benefit from the processor's 12-channel DDR5 memory interface. The package also has two SoC tiles, each with a 48-lane PCIe Gen 5 or CXL 2.0 root complex, for a total of 96 lanes. The processor has six UPI links for multi-socket machines, and supports up to a 2P configuration per system, for a maximum of 256 P-cores. Each of the three compute tiles is built on the Intel 3 foundry node, while the two SoC tiles are built on the Intel 7 node.
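The package-level totals in this paragraph follow directly from the tile counts; a quick sketch using the article's figures:

```python
# Package topology arithmetic for "Granite Rapids-UCC" (article figures).
COMPUTE_TILES = 3
CORES_PER_TILE = 44        # physical cores per compute tile
SOC_TILES = 2
LANES_PER_SOC_TILE = 48    # PCIe Gen 5 / CXL 2.0 lanes per SoC tile
MAX_SOCKETS = 2            # 2P configurations supported

physical_cores = COMPUTE_TILES * CORES_PER_TILE   # 132 cores on the package
pcie_lanes = SOC_TILES * LANES_PER_SOC_TILE       # 96 lanes total
cores_2p = MAX_SOCKETS * 128                      # 256 P-cores in a 2P system

print(physical_cores, pcie_lanes, cores_2p)       # 132 96 256
```

This is also where the "up to 132 cores" in the headline comes from: the package physically carries 132 cores, of which the 6980P ships with 128 enabled.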
For its server processors, Intel does not use the same power-rating scheme as its client processors (base power, turbo power), since it has to stick to industry standards that guide data-center architects; instead it uses a flat TDP rating. The lineup is led by the Xeon 6980P, with its 128-core/256-thread configuration, a 2.00 GHz base frequency, 3.20 GHz all-core boost, 3.90 GHz maximum boost frequency, 504 MB of shared L3 cache, and a 500 W TDP. The 6979P is next, with a 120-core/240-thread configuration, 2.10 GHz base frequency, 3.20 GHz all-core boost, and 3.90 GHz maximum boost frequency. Interestingly, it retains the 6980P's 504 MB L3 cache and 500 W TDP rating.
The Xeon 6972P is an interesting SKU, as it directly squares off against the top AMD EPYC "Genoa" part with its 96-core/192-thread configuration. It ticks at 2.40 GHz base, 3.50 GHz all-core boost, and 3.90 GHz maximum boost. The L3 cache gets a slight haircut to a still-respectable 480 MB, while the TDP remains at 500 W. The 6952P has the same 96-core/192-thread configuration, but with lower clock speeds (2.10 GHz base, 3.20 GHz all-core boost, and 3.90 GHz maximum boost) and a TDP reduced to 400 W.

The Xeon 6960P should appeal to the compute server market with its balance of core counts and clock speeds. It comes with a 72-core/144-thread configuration, but has the highest clock speeds in the lineup: a 2.70 GHz base frequency, 3.80 GHz all-core boost, and 3.90 GHz maximum boost. The L3 cache size is 432 MB, while the TDP stays at 500 W to sustain the higher clock speeds.
All five models mentioned above are 2P-capable, have 12-channel DDR5 memory interfaces, and support native memory speeds of DDR5-6400 using conventional RDIMMs, or DDR5-8800 using MRDIMMs. All SKUs have 96 PCIe Gen 5 or CXL 2.0 lanes, and receive the same set of on-package accelerators (fixed-function hardware that speeds up popular kinds of server workloads): DSA (Data Streaming Accelerator), IAA (In-Memory Analytics Accelerator), QAT (QuickAssist Technology), and DLB (Dynamic Load Balancing).
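As a rough illustration of what those memory speeds mean, the theoretical peak bandwidth of the 12-channel interface can be estimated with the usual DDR rule of thumb (transfers per second times the 8-byte channel width); these are peak figures, not measured throughput:

```python
# Theoretical peak memory bandwidth of the 12-channel DDR5 interface.
# Rule of thumb: MT/s * 8 bytes per channel; peak figures, not measured.
CHANNELS = 12
BYTES_PER_TRANSFER = 8  # 64-bit data path per channel

def peak_gbs(mts):
    """Peak bandwidth in GB/s for a given DDR5 transfer rate (MT/s)."""
    return mts * BYTES_PER_TRANSFER * CHANNELS / 1000

print(peak_gbs(6400))  # 614.4 GB/s with DDR5-6400 RDIMMs
print(peak_gbs(8800))  # 844.8 GB/s with DDR5-8800 MRDIMMs
```

The jump from RDIMMs to MRDIMMs is thus worth roughly a third more peak bandwidth per socket, before any real-world efficiency losses.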

14 Comments on Intel "Granite Rapids-UCC" Powering Xeon 6 6980P Detailed: Up to 132 Cores Possible

#1
kondamin
A 2 socket server would replace a small data centre less than 20 years ago
Posted on Reply
#2
ncrs
Each of the three compute tiles is built on the Intel 4 foundry node—the same node on which the company builds the compute tile of "Meteor Lake," while the two SoC tiles are built on the Intel 7 node.
It's Intel 3 not Intel 4 for the compute tiles, so not the same as Meteor Lake ;)

Posted on Reply
#3
dj-electric
Reducing this chip by one compute core tile and serving it as a high end desktop platform with lots of Lion Cove P-cores is something I want that will never happen. Absolute dread.
Posted on Reply
#4
ncrs
dj-electric: Reducing this chip by one compute core tile and serving it as a high end desktop platform with lots of Lion Cove P-cores is something I want that will never happen. Absolute dread.
Intel has shown that this platform is very flexible, so what you want might not be out of the question:
Posted on Reply
#5
dj-electric
ncrs: Intel has shown that this platform is very flexible, so what you want might not be out of the question:
I know it's very technically possible, but Intel does not exist in the state of luxury it once did to allow spin-off HEDT platforms anymore, I'm afraid. Not in its current situation.
For them, it would be choosing to make a lot less cash per fabricated chip.
Posted on Reply
#6
Wirko
Apparently they get to 128 cores by combining three 43-core dies, with one defective core.

The amount of L3 per core is really weird here. Perhaps in this new architecture there isn't a slice of L3 attached to each core but rather a pool of last-level cache, attached to the mesh bus separately from the cores. Does this make any sense?
Posted on Reply
#7
Daven
How is 132 cores possible?

Edit: oh i see. 44 times 3 is 132. So 128 is a fully functional die of 44 plus two partially functioning dies of 42.

Weird.
Posted on Reply
#8
InVasMani
Wirko: Apparently they get to 128 cores by combining three 43-core dies, with one defective core.

The amount of L3 per core is really weird here. Perhaps in this new architecture there isn't a slice of L3 attached to each core but rather a pool of last-level cache, attached to the mesh bus separately from the cores. Does this make any sense?
I had suggested they do similar at one point with cache for P cores similar to what was done with E cores. I believe what I was thinking of at the time was a L4 cache that's round robin pooled access to it underneath with TSV. Basically a shared pool cache that can be tapped into on a need to be basis in round robin as well as some reserved cache for typical workload operations.
Posted on Reply
#9
Lycanwolfen
128 core Xeons and still cannot make a pure 16 core desktop. Lame Intel is lame.
Posted on Reply
#10
kondamin
Lycanwolfen: 128 core Xeons and still cannot make a pure 16 core desktop. Lame Intel is lame.
Those should be possible now they are finally using foveros, I kinda doubt you would want a ~500W cpu in your desktop though.
Posted on Reply
#11
tfp
Lycanwolfen: 128 core Xeons and still cannot make a pure 16 core desktop. Lame Intel is lame.
There are 8, 12, 16 core Xeon Ws and I think it scales up some from there. This is standard Intel practice of splitting workstation and desktop CPUs based on P-core count, memory channels, and PCIe lanes.
Posted on Reply
#12
Lycanwolfen
kondamin: Those should be possible now they are finally using Foveros, I kinda doubt you would want a ~500W cpu in your desktop though.
Why not? We have 600 watt video cards coming. Your logic.
Posted on Reply
#13
efikkan
dj-electric: Reducing this chip by one compute core tile and serving it as a high end desktop platform with lots of Lion Cove P-cores is something I want that will never happen. Absolute dread.
So basically Xeon W?
Intel usually launches a workstation derivative of their Xeon scalable platform about half a cycle later.
The current Sapphire Rapids refresh might not be super-impressive considering it's at the tail end of its lifecycle (and Arrow Lake being imminent), but the next generation, potentially offering 6400 MHz DDR5 (and 8800 MHz MRDIMMs?) along with faster cores and all the usual high-end workstation stuff, is going to get quite interesting for workstation and power users.

If yields are good enough for them to push them a little beyond 5 GHz, they would probably be appealing to a lot of power users who feel seriously constrained on the mainstream platforms, whether due to gaming performance or specific applications which they think need those super-high (short) boosts. These workstation CPUs will, however, offer much more performance than the specs alone would lead people to believe, both due to more consistent performance and features like much higher memory bandwidth and dual AVX-512 FMAs, which are a real powerhouse in many heavy workloads.

I'm looking forward to a deep dive into the performance and latency implications of MRDIMMs, and whether this is something that may add to the benefits of a workstation platform vs. mainstream.
Posted on Reply
#14
Aquinus
Resident Wat-man
The only thing coming to my mind is that Intel poking fun at AMD gluing cores together didn't age very well. Nifty tech here, brought to you by glue. :laugh:
Posted on Reply