Intel Increases L1D and L2 Cache Sizes with "Ice Lake"

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
47,385 (7.52/day)
Location
Hyderabad, India
System Name RBMK-1000
Processor AMD Ryzen 7 5700G
Motherboard ASUS ROG Strix B450-E Gaming
Cooling DeepCool Gammax L240 V2
Memory 2x 8GB G.Skill Sniper X
Video Card(s) Palit GeForce RTX 2080 SUPER GameRock
Storage Western Digital Black NVMe 512GB
Display(s) BenQ 1440p 60 Hz 27-inch
Case Corsair Carbide 100R
Audio Device(s) ASUS SupremeFX S1220A
Power Supply Cooler Master MWE Gold 650W
Mouse ASUS ROG Strix Impact
Keyboard Gamdias Hermes E2
Software Windows 11 Pro
Intel's next major CPU microarchitecture, codenamed "Ice Lake" and designed for the 10 nm silicon fabrication process, could introduce the first major core redesign in over three years. Keen observers of Geekbench database submissions of dual-core "Ice Lake" engineering samples noticed something curious: Intel has increased its L1 data and L2 cache sizes over previous generations.

The L1 data cache has been enlarged to 48 KB from the 32 KB of current-generation "Coffee Lake," and more interestingly, the L2 cache has been doubled in size to 512 KB, from 256 KB. The L1 instruction cache is still 32 KB, while the shared L3 cache for this dual-core chip is 4 MB. The "Ice Lake" chip in question is still a "mainstream" rendition of the microarchitecture and not an enterprise version; the enterprise parts have had a "re-balanced" cache hierarchy since "Skylake-X," which combined large 1 MB L2 caches with relatively smaller shared L3 caches.



 
D

Deleted member 178884

Guest
Moving cache back up? Interesting since the HEDT got hit harder, the cache total on the 6950x was more than the 7980xe.
 
Joined
Oct 8, 2015
Messages
774 (0.23/day)
Location
Earth's Troposphere
System Name 3 "rigs"-gaming/spare pc/cruncher
Processor R7-5800X3D/i7-7700K/R9-7950X
Motherboard Asus ROG Crosshair VI Extreme/Asus Ranger Z170/Asus ROG Crosshair X670E-GENE
Cooling Bitspower monoblock ,custom open loop,both passive and active/air tower cooler/air tower cooler
Memory 32GB DDR4/32GB DDR4/64GB DDR5
Video Card(s) Gigabyte RX6900XT Alphacooled/AMD RX5700XT 50th Aniv./SOC(onboard)
Storage mix of sata ssds/m.2 ssds/mix of sata ssds+an m.2 ssd
Display(s) Dell UltraSharp U2410 , HP 24x
Case mb box/Silverstone Raven RV-05/CoolerMaster Q300L
Audio Device(s) onboard/onboard/onboard
Power Supply 3 Seasonics, a DeltaElectronics, a FractalDesing
Mouse various/various/various
Keyboard various wired and wireless
VR HMD -
Software W10.someting or another,all 3
Good, or is it? If it's a new CPU architecture I can't argue on any basis whatsoever. More cache is "more better," but is it as fast? What are the benefits? Can the chain of micro-ops handle it, and is the scheduler up to the task? But then again, I'm no expert.
 
Joined
May 8, 2016
Messages
1,936 (0.61/day)
System Name BOX
Processor Core i7 6950X @ 4,26GHz (1,28V)
Motherboard X99 SOC Champion (BIOS F23c + bifurcation mod)
Cooling Thermalright Venomous-X + 2x Delta 38mm PWM (Push-Pull)
Memory Patriot Viper Steel 4000MHz CL16 4x8GB (@3240MHz CL12.12.12.24 CR2T @ 1,48V)
Video Card(s) Titan V (~1650MHz @ 0.77V, HBM2 1GHz, Forced P2 state [OFF])
Storage WD SN850X 2TB + Samsung EVO 2TB (SATA) + Seagate Exos X20 20TB (4Kn mode)
Display(s) LG 27GP950-B
Case Fractal Design Meshify 2 XL
Audio Device(s) Motu M4 (audio interface) + ATH-A900Z + Behringer C-1
Power Supply Seasonic X-760 (760W)
Mouse Logitech RX-250
Keyboard HP KB-9970
Software Windows 10 Pro x64
In general, having a bigger cache is good. However, latency is also important.
Programs that fit in a smaller cache should execute faster on older tech if the latency of the bigger cache is higher.
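
The size-versus-latency trade-off can be put into rough numbers with the usual average-access-time estimate. This is only a sketch: the latencies, hit rates and miss penalty below are made-up illustrative values, not measurements of any real CPU, and `average_access_time` is a hypothetical helper name.

```python
def average_access_time(hit_latency, hit_rate, miss_penalty):
    """Average cycles per access: every access pays the hit latency,
    and misses additionally pay the penalty of going to the next level."""
    return hit_latency + (1.0 - hit_rate) * miss_penalty


# Hypothetical numbers, chosen only to illustrate the trade-off.
MISS_PENALTY = 12  # extra cycles to reach the next level

# Working set fits in both caches: the hit rate is the same,
# so the smaller, lower-latency cache wins.
small_fits = average_access_time(hit_latency=4, hit_rate=0.99, miss_penalty=MISS_PENALTY)
large_fits = average_access_time(hit_latency=5, hit_rate=0.99, miss_penalty=MISS_PENALTY)

# Working set only fits in the larger cache: the extra hits can
# outweigh the extra cycle of latency.
small_spills = average_access_time(hit_latency=4, hit_rate=0.85, miss_penalty=MISS_PENALTY)
large_holds = average_access_time(hit_latency=5, hit_rate=0.99, miss_penalty=MISS_PENALTY)

print(f"fits in both:        small={small_fits:.2f}  large={large_fits:.2f}")
print(f"fits in large only:  small={small_spills:.2f}  large={large_holds:.2f}")
```

With these assumed values the small cache is ahead when the program fits in it, and the large cache is ahead once the program spills out of the small one.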

Also, the last L1 bump on a "consumer grade" platform was with Conroe (from NetBurst), and we have had 256 kB L2 since Nehalem (first-gen Core i series).
Intel never released large per-core L3 caches on LGA11xx platforms (always 2 MB/core max.).
 
Joined
Jun 10, 2014
Messages
3,009 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
We still don't know the details of Ice Lake, but the new details look like this in comparison with existing architectures:
[Image: cpu_cache.png]

(based on info around the web, may not be 100% accurate)

One thing I consider interesting is that Intel seems to prioritize L1 data cache while AMD prioritizes L1 instruction cache.

Moving cache back up? Interesting since the HEDT got hit harder, the cache total on the 6950x was more than the 7980xe.
What?
The L3 cache on Skylake-X works differently. Prior generations had an inclusive L3 cache, meaning L2 will be duplicated in L3, so effectively the L3 cache size of older generations is 1.75 MB per core. Skylake-X also quadrupled the L2 cache, leading to an effective increase in cache per core, but more importantly, a more efficient cache.
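
The arithmetic behind that can be sketched as below. The 1.75 MB figure appears to correspond to a 2 MB per-core L3 slice that duplicates a 256 kB L2. The 1 MB Skylake-X L2 is from the article, while the 1.375 MB per-core L3 slice is an assumption taken from outside this thread. Both helper names are hypothetical, and the non-inclusive case is treated as a best case with no overlap at all.

```python
def unique_capacity_kb(l2_kb, l3_kb, inclusive):
    """Distinct data one core can hold in its L2 plus its L3 share.
    Inclusive L3: every L2 line is also in L3, so L3 bounds the total.
    Non-inclusive L3 (best case, no overlap): the capacities add up."""
    return l3_kb if inclusive else l2_kb + l3_kb


def l3_beyond_l2_kb(l2_kb, l3_kb, inclusive):
    """L3 space left for data that is not already sitting in L2."""
    return l3_kb - l2_kb if inclusive else l3_kb


# Older design: 256 kB L2, 2 MB inclusive L3 per core.
old_l3_extra = l3_beyond_l2_kb(256, 2048, inclusive=True)
old_total = unique_capacity_kb(256, 2048, inclusive=True)

# Skylake-X: 1 MB L2, 1.375 MB non-inclusive L3 per core (assumed figure).
skx_l3_extra = l3_beyond_l2_kb(1024, 1408, inclusive=False)
skx_total = unique_capacity_kb(1024, 1408, inclusive=False)

print(f"older:     L3 beyond L2 = {old_l3_extra / 1024} MB, total = {old_total / 1024} MB")
print(f"Skylake-X: L3 beyond L2 = {skx_l3_extra / 1024} MB, total = {skx_total / 1024} MB")
```

Chips with a larger inclusive slice per core would shift the comparison, so the result depends on which older part is used as the baseline.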

Good, or is it? If it's a new CPU architecture I can't argue on any basis whatsoever. More cache is "more better," but is it as fast? What are the benefits? Can the chain of micro-ops handle it, and is the scheduler up to the task? But then again, I'm no expert.
Caches have always been more complex than just "more is better".
I believe even the old 80486 supported something like 512 kB of off-chip L2 cache.
For cache it comes down to latency, throughput and die space. Fewer banks may give higher cache efficiency, but lower bandwidth and higher complexity. More banks are simpler and give higher bandwidth, but sacrifice cache efficiency. Latency is even tougher; it depends on the implementation.
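
One way to look at that trade-off is through the geometry of a set-associative cache (reading "banks" as sets is an interpretation, not something stated above). The sketch below shows how size, associativity and line size determine the number of sets, and how an address is split into tag, set index and offset. The 64-byte line size and the 8-way and 12-way associativities are assumptions not taken from this thread, and the function names are made up for illustration.

```python
def number_of_sets(size_bytes, ways, line_bytes=64):
    """Sets in a set-associative cache: total lines divided by ways."""
    return size_bytes // (ways * line_bytes)


def split_address(addr, sets, line_bytes=64):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr % line_bytes
    index = (addr // line_bytes) % sets
    tag = addr // (line_bytes * sets)
    return tag, index, offset


# Assumed associativities: 32 KB as 8-way, 48 KB as 12-way.
sets_32k = number_of_sets(32 * 1024, ways=8)
sets_48k = number_of_sets(48 * 1024, ways=12)
print(sets_32k, sets_48k)

# With the same number of sets, the address split stays the same;
# the extra capacity comes from more ways per set.
print(split_address(0x12345, sets_32k))
```

Under these assumptions a 48 KB cache could keep the same set count as a 32 KB one, so the extra capacity would come from checking more ways per lookup.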
 
Joined
Mar 13, 2018
Messages
68 (0.03/day)
Will need to wait for an actual product. Right now Intel 10nm is vaporware/rumormill at best.
 
Joined
Nov 15, 2016
Messages
454 (0.15/day)
System Name Sillicon Nightmares
Processor Intel i7 9700KF 5ghz (5.1ghz 4 core load, no avx offset), 4.7ghz ring, 1.412vcore 1.3vcio 1.264vcsa
Motherboard Asus Z390 Strix F
Cooling DEEPCOOL Gamer Storm CAPTAIN 360
Memory 2x8GB G.Skill Trident Z RGB (B-Die) 3600 14-14-14-28 1t, tRFC 220 tREFI 65535, tFAW 16, 1.545vddq
Video Card(s) ASUS GTX 1060 Strix 6GB XOC, Core: 2202-2240, Vcore: 1.075v, Mem: 9818mhz (Sillicon Lottery Jackpot)
Storage Samsung 840 EVO 1TB SSD, WD Blue 1TB, Seagate 3TB, Samsung 970 Evo Plus 512GB
Display(s) BenQ XL2430 1080p 144HZ + (2) Samsung SyncMaster 913v 1280x1024 75HZ + A Shitty TV For Movies
Case Deepcool Genome ROG Edition
Audio Device(s) Bunta Sniff Speakers From The Tip Edition With Extra Kenwoods
Power Supply Corsair AX860i/Cable Mod Cables
Mouse Logitech G602 Spilled Beer Edition
Keyboard Dell KB4021
Software Windows 10 x64
Benchmark Scores 13543 Firestrike (3dmark.com/fs/22336777) 601 points CPU-Z ST 37.4ns AIDA Memory
Joined
May 20, 2011
Messages
227 (0.05/day)
System Name Windows 10 Pro 64 bit
Processor Ryzen 5 5600 @4.65 GHz
Motherboard Asus ROG X570-E
Cooling Thermalright
Memory 32 GB 3200 MHz
Video Card(s) Asus RX 6700XT 12 GB Dual
Storage 1TB Samsung 970 EVO Plus
Display(s) SS QHD 144Hz + LG 55 Inch 4K
Case Corsair 4000D
Power Supply Superflower 850
The Ice Lake sample is running with 16 GB in dual channel and on Linux. The other i3-7130U laptops may have been running in single channel and on Windows.
 

hat

Enthusiast
Joined
Nov 20, 2006
Messages
21,750 (3.28/day)
Location
Ohio
System Name Starlifter :: Dragonfly
Processor i7 2600k 4.4GHz :: i5 10400
Motherboard ASUS P8P67 Pro :: ASUS Prime H570-Plus
Cooling Cryorig M9 :: Stock
Memory 4x4GB DDR3 2133 :: 2x8GB DDR4 2400
Video Card(s) PNY GTX1070 :: Integrated UHD 630
Storage Crucial MX500 1TB, 2x1TB Seagate RAID 0 :: Mushkin Enhanced 60GB SSD, 3x4TB Seagate HDD RAID5
Display(s) Onn 165hz 1080p :: Acer 1080p
Case Antec SOHO 1030B :: Old White Full Tower
Audio Device(s) Creative X-Fi Titanium Fatal1ty Pro - Bose Companion 2 Series III :: None
Power Supply FSP Hydro GE 550w :: EVGA Supernova 550
Software Windows 10 Pro - Plex Server on Dragonfly
Benchmark Scores >9000
Interesting. I thought cache sizes got smaller as we moved away from FSB on to QPI/DMI because QPI/DMI was so much faster than FSB, so large caches weren't needed...
 
Joined
Jun 12, 2017
Messages
136 (0.05/day)
Interesting. I thought cache sizes got smaller as we moved away from FSB on to QPI/DMI because QPI/DMI was so much faster than FSB, so large caches weren't needed...
Nope... Cache is still a must. No matter how fast the memory/IO may be, internal CPU pipelines will always be way faster.

Cache on Skylake-X works differently. Prior generations had an inclusive L3 cache, meaning L2 will be duplicated in L3, so effectively the L3 cache size of older generations is 1.75 MB. Skylake-X also quadrupled the L2 cache, leading to an effective increase in cache per core, but more importantly, a more efficient cache.
I agree with your opinion, except that SKL-SP's victim cache is not necessarily more efficient. The efficiency of a victim cache versus an inclusive cache depends on the workload.
And by the way, Ryzen uses victim cache too, similar to SKL-SP.
 
Joined
Apr 8, 2008
Messages
342 (0.06/day)
System Name Xajel Main
Processor AMD Ryzen 7 5800X
Motherboard ASRock X570M Steel Legened
Cooling Corsair H100i PRO
Memory G.Skill DDR4 3600 32GB (2x16GB)
Video Card(s) ZOTAC GAMING GeForce RTX 3080 Ti AMP Holo
Storage (OS) Gigabyte AORUS NVMe Gen4 1TB + (Personal) WD Black SN850X 2TB + (Store) WD 8TB HDD
Display(s) LG 38WN95C Ultrawide 3840x1600 144Hz
Case Cooler Master CM690 III
Audio Device(s) Built-in Audio + Yamaha SR-C20 Soundbar
Power Supply Thermaltake 750W
Mouse Logitech MK710 Combo
Keyboard Logitech MK710 Combo (M705)
Software Windows 11 Pro
Interesting. I thought cache sizes got smaller as we moved away from FSB on to QPI/DMI because QPI/DMI was so much faster than FSB, so large caches weren't needed...

The main idea of the memory subsystem is to scale with how soon the data will be processed: the sooner it is needed, the closer it sits to the processor, where it is handled faster and with much lower latency. The L1 cache handles the data being processed at that exact time; that's why it consists of L1 Data and L1 Instruction, as the processor uses the instructions in L1i to process the data currently in L1d. The L2 cache contains data to be processed next, or the rest of the data that the L1d can't hold. And you guessed it, the L3 contains data of the next level. Then comes the RAM, and finally the rest is on the HDD.

Whenever a higher-priority cache/memory is not enough, the system uses the next one available: if L1d is not enough, the next step is L2; when that is full, L3 comes next (if available); and when there's an L4 cache, it is the next level too. If not, RAM is used, and so on.
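
That fall-through can be sketched as a simple walk over the levels, fastest first. This is a toy model: the latencies are hypothetical round numbers, `LEVELS` and `lookup` are made-up names, and paging out to disk is left out entirely.

```python
# Hypothetical load latencies in CPU cycles; real values vary by design.
LEVELS = [("L1d", 4), ("L2", 12), ("L3", 40), ("RAM", 200)]


def lookup(address, contents):
    """Return (level_name, latency) for the fastest level holding `address`.
    `contents` maps a cache level name to the set of addresses it holds;
    RAM is treated as holding everything."""
    for name, latency in LEVELS:
        if name == "RAM" or address in contents.get(name, set()):
            return name, latency
    raise AssertionError("unreachable: RAM always hits")


contents = {
    "L1d": {0x100},
    "L2": {0x100, 0x200},
    "L3": {0x100, 0x200, 0x300},
}

for addr in (0x100, 0x200, 0x300, 0x400):
    level, cycles = lookup(addr, contents)
    print(f"{addr:#x}: served from {level} in about {cycles} cycles")
```

Each step down the list costs several times more than the one before it, which is why a miss in the small, fast levels is so noticeable.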

Do you remember why the system becomes very slow when you have heavy applications and low RAM? And how upgrading RAM sped up your system noticeably? Or when you finally upgraded to an SSD and saw a huge jump in responsiveness and speed? This is what happens when the higher-level cache/memory becomes too small and the system is forced to go to the next, "slower" one.


When Intel first released the Celeron, they experimented with an L2-cache-less model to make it cost less. It did cost less to make, but it performed horribly. They quickly scrapped that, and the next update came with L2.

So why not have more and more cache? There are several things to consider:
1- Caches are expensive: They require a lot of die area and consume power.
2- More cache brings latency: The larger the cache is, the more time it takes to actually look for the data you need, and latency is crucial here.
3- Performance gain with more cache is not linear.
4- Architecture favouring: Due to the second and third points, and to how the architecture actually handles the data and how the cache hierarchy works, there will be an optimal cache size for each level that brings the most performance at the best power/cost. Adding more might raise the power/cost too much for little performance boost, or might actually bring performance down a little for some latency-critical applications.
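
The third point can be illustrated with a small simulation. This is a sketch under stated assumptions: the cache is fully associative with LRU replacement, and the access trace is synthetic (line k is requested with weight 1/(k+1)), so the numbers say nothing about a real workload. `lru_hit_rate` and `skewed_trace` are hypothetical helper names.

```python
import random
from collections import OrderedDict


def lru_hit_rate(trace, capacity):
    """Hit rate of a fully associative LRU cache holding `capacity` lines."""
    cache = OrderedDict()  # oldest entry first
    hits = 0
    for line in trace:
        if line in cache:
            cache.move_to_end(line)  # mark as most recently used
            hits += 1
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[line] = True
    return hits / len(trace)


def skewed_trace(length=20000, distinct=1024, seed=0):
    """Synthetic trace where line k is requested with weight 1/(k+1)."""
    rng = random.Random(seed)
    weights = [1.0 / (k + 1) for k in range(distinct)]
    return rng.choices(range(distinct), weights=weights, k=length)


trace = skewed_trace()
sizes = (16, 32, 64, 128, 256, 512)
rates = {size: lru_hit_rate(trace, size) for size in sizes}
for size, rate in rates.items():
    print(f"{size:4d} lines: hit rate {rate:.3f}")
```

On a skewed trace like this, the hit rate keeps rising with size, but each doubling typically buys less than the previous one, while die area and power keep growing with every doubling.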
 
Joined
Jun 10, 2014
Messages
3,009 (0.78/day)
The main idea of the memory subsystem is to scale with how soon the data will be processed: the sooner it is needed, the closer it sits to the processor, where it is handled faster and with much lower latency. The L1 cache handles the data being processed at that exact time; that's why it consists of L1 Data and L1 Instruction, as the processor uses the instructions in L1i to process the data currently in L1d. The L2 cache contains data to be processed next, or the rest of the data that the L1d can't hold. And you guessed it, the L3 contains data of the next level. Then comes the RAM, and finally the rest is on the HDD.
The purpose of the cache is to hide latency.
To be precise, L1 is still a cache; the actual data being processed is in registers.

Even some introductory books in CS describe the cache hierarchy incorrectly. L1…L3 is just a streaming buffer; it contains code and data that is likely to be used or has recently been used. Many mistakenly think that the most important stuff is stored in L1, then L2 and so on. These caches are overwritten thousands of times per second; no data ever stays there for long. And it's not like your running program can fit in there, or your most important variables in code.

Modern CPUs do aggressive prefetching, which means they preload data you might need. Each bank in the cache is usually a Least Recently Used (LRU) queue, which means that any time one cache line is written, the oldest one is discarded. So caching things that are not needed may actually replace useful data. Depending on workload, the cache may at times be mostly wasted, but it's of course still better than no cache.
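
A toy model of that effect: one LRU set with four ways, filled with a working set, then hit by some prefetches that turn out to be useless. This is a sketch, not how any particular CPU implements replacement or prefetching; `LRUSet` and `hits_on_second_pass` are hypothetical names.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with LRU replacement, holding up to `ways` lines."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # oldest entry first

    def access(self, line):
        """Touch a line; return True on a hit, False on a miss (then fill)."""
        if line in self.lines:
            self.lines.move_to_end(line)  # mark as most recently used
            return True
        self._fill(line)
        return False

    def prefetch(self, line):
        """Bring a line in speculatively if it is not already present."""
        if line not in self.lines:
            self._fill(line)

    def _fill(self, line):
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)  # evict least recently used
        self.lines[line] = True


def hits_on_second_pass(useless_prefetches):
    """Load four lines, prefetch junk, then revisit the four lines
    (most recent first) and count how many still hit."""
    cache_set = LRUSet(ways=4)
    working = ["A", "B", "C", "D"]
    for line in working:  # first pass: cold misses
        cache_set.access(line)
    for i in range(useless_prefetches):
        cache_set.prefetch(f"junk{i}")  # lines that are never used
    return sum(cache_set.access(line) for line in reversed(working))


for n in (0, 1, 2, 4):
    print(f"{n} useless prefetches -> {hits_on_second_pass(n)} hits out of 4")
```

In this model every useless prefetch pushes out one line of the working set, so the hits on the second pass drop one for one.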

Do you remember why the system becomes very slow when you have heavy applications and low RAM? And how upgrading RAM sped up your system noticeably? Or when you finally upgraded to an SSD and saw a huge jump in responsiveness and speed? This is what happens when the higher-level cache/memory becomes too small and the system is forced to go to the next, "slower" one.
SSDs do wonders for file operations, but only affect responsiveness when the OS is swapping heavily, and by that point the system is too sluggish anyway. There is a lot of placebo tied to the benefits of SSDs. Don't get me wrong, SSDs are good, but they don't make code faster.
 