
Intel Rocket Lake-S CPU Benchmarked: Up to 22% Faster Compared to the Previous Generation

2. It's true and I can't deny it :) The trend is towards Epyc / Xeon: what matters is scaling and the highest possible single-core performance, thanks to the large L2. Cache sharing between threads becomes secondary, which is confirmed by leaks about Alder Lake, i.e. Golden Cove. Golden Cove has the same cache type and capacity as Willow Cove, i.e. 1.25 MB L2 and 3 MB non-inclusive L3 per core. You can see that Intel will no longer develop two separate microarchitectures in the era of ever-increasing core counts.
Sure, but let me clarify why I believe an inclusive L3 cache isn't beneficial even to some real-world multithreaded workloads:
When a cache line is prefetched into L2, an inclusive cache immediately duplicates it into L3. When the line is later evicted from L2, it would normally be copied to L3, but with an inclusive design it's already there. The only advantage of holding that copy in L3 is the case where another core needs it within the short window before it would have been moved there anyway; I believe this window is at most a few thousand clock cycles, far too short for any software to be "optimized" for it. The disadvantages are numerous: wasted die space, increased complexity to maintain coherency at higher core counts, and L3 waste that grows whenever L2 grows.
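The capacity cost of inclusion can be shown with a toy two-level LRU model (all sizes and the access stream are hypothetical, chosen only to illustrate the effect): an inclusive L3 holds a copy of every L2 line, so the hierarchy caches at most L3-many unique lines, while a non-inclusive (victim) L3 adds its capacity on top of L2.

```python
from collections import OrderedDict

L2_LINES, L3_LINES = 8, 24   # hypothetical capacities, in cache lines

def simulate(addresses, inclusive):
    """Count memory fetches for a two-level LRU cache hierarchy."""
    l2, l3 = OrderedDict(), OrderedDict()
    misses = 0
    for a in addresses:
        if a in l2:                      # L2 hit
            l2.move_to_end(a)
            continue
        if a in l3:                      # L3 hit: promote to L2
            if inclusive:
                l3.move_to_end(a)        # inclusive: copy stays in L3
            else:
                del l3[a]                # victim cache: line moves, no copy
        else:                            # miss in both: fetch from memory
            misses += 1
            if inclusive:
                l3[a] = True             # inclusive fills L3 on the way in
                if len(l3) > L3_LINES:
                    old, _ = l3.popitem(last=False)
                    l2.pop(old, None)    # back-invalidate so L2 stays a subset of L3
        l2[a] = True
        if len(l2) > L2_LINES:
            victim, _ = l2.popitem(last=False)
            if not inclusive:            # victim L3: filled only on L2 eviction
                l3[victim] = True
                if len(l3) > L3_LINES:
                    l3.popitem(last=False)
    return misses

# Working set of 30 lines: fits in L2 + victim L3 (8 + 24 = 32 lines),
# but not in an inclusive hierarchy whose unique capacity is only 24.
stream = list(range(30)) * 50
print(simulate(stream, inclusive=True))   # 1500 (thrashes every pass)
print(simulate(stream, inclusive=False))  # 30 (only cold misses)
```

The gap widens as L2 grows relative to L3, which is exactly the "L3 waste increases if L2 increases" point above.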

I would argue there are other ways to make the cache more efficient, if the concern is sharing cache between cores. As you know, L1 is split into two separate caches: instruction and data. If L2 were split similarly, the design could take advantage of the different usage patterns. Data cache lines often have higher throughput and are rarely shared between threads, while instruction cache lines are often reused, and by many cores, within a short time. If L3 were split as well, L3I could be tightly interconnected, while L3D could be larger and possibly off-die. That's just an idea, though, and there may be design considerations I'm not aware of.
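A back-of-envelope calculation suggests why a shared L3I could pay off under that idea (all footprint numbers here are hypothetical, picked only to make the arithmetic concrete): when many cores run the same hot code, per-core caching duplicates those instruction lines once per core, while a single shared instruction cache stores them once.

```python
CORES = 8
HOT_CODE_LINES = 512       # hypothetical shared instruction footprint
PRIVATE_DATA_LINES = 2048  # hypothetical per-core data footprint

# Per-core caching duplicates the hot code in every core's capacity:
unified = CORES * (HOT_CODE_LINES + PRIVATE_DATA_LINES)

# Split design: one shared copy of the code, private data unchanged:
split = HOT_CODE_LINES + CORES * PRIVATE_DATA_LINES

# Capacity saved is (CORES - 1) * HOT_CODE_LINES lines:
print(unified - split)  # 3584
```

The saving scales with core count, which fits the observation that instruction lines, unlike data lines, tend to be reused by many cores at once.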
 