
Intel Increases L1D and L2 Cache Sizes with "Ice Lake"

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
47,300 (7.52/day)
Location
Hyderabad, India
System Name RBMK-1000
Processor AMD Ryzen 7 5700G
Motherboard ASUS ROG Strix B450-E Gaming
Cooling DeepCool Gammax L240 V2
Memory 2x 8GB G.Skill Sniper X
Video Card(s) Palit GeForce RTX 2080 SUPER GameRock
Storage Western Digital Black NVMe 512GB
Display(s) BenQ 1440p 60 Hz 27-inch
Case Corsair Carbide 100R
Audio Device(s) ASUS SupremeFX S1220A
Power Supply Cooler Master MWE Gold 650W
Mouse ASUS ROG Strix Impact
Keyboard Gamdias Hermes E2
Software Windows 11 Pro
Intel's next major CPU microarchitecture being designed for the 10 nm silicon fabrication process, codenamed "Ice Lake," could introduce the first major core redesign in over three years. Keen observers of Geekbench database submissions of dual-core "Ice Lake" processor engineering samples noticed something curious: Intel has increased the L1 and L2 cache sizes over previous generations.

The L1 data cache has been enlarged to 48 KB, from the 32 KB of current-generation "Coffee Lake," and, more interestingly, the L2 cache has been doubled in size to 512 KB from 256 KB. The L1 instruction cache is still 32 KB in size, while the shared L3 cache for this dual-core chip is 4 MB. The "Ice Lake" chip in question is still a "mainstream" rendition of the microarchitecture and not an enterprise version; the enterprise parts have had a "re-balanced" cache hierarchy since "Skylake-X," combining large 1 MB L2 caches with a relatively smaller shared L3 cache.
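For anyone who wants to check what their own CPU reports (the Geekbench entries are reading the same topology information the OS exposes), here is a minimal sketch using the glibc-specific sysconf() cache queries on Linux; the _SC_LEVEL* names are a glibc extension and may return 0 or -1 where a level is not reported.

```c
/* Minimal sketch: print the cache sizes the OS reports.
 * Assumes Linux with glibc; the _SC_LEVEL* sysconf names are a glibc
 * extension and may return 0 or -1 where a level is not reported. */
#include <stdio.h>
#include <unistd.h>

static void show(const char *name, int sc)
{
    long bytes = sysconf(sc);
    if (bytes > 0)
        printf("%-4s %6ld KB\n", name, bytes / 1024);
    else
        printf("%-4s (not reported)\n", name);
}

int main(void)
{
    show("L1d", _SC_LEVEL1_DCACHE_SIZE);
    show("L1i", _SC_LEVEL1_ICACHE_SIZE);
    show("L2",  _SC_LEVEL2_CACHE_SIZE);
    show("L3",  _SC_LEVEL3_CACHE_SIZE);
    return 0;
}
```

On the Ice Lake sample described above this would be expected to print 48 KB / 32 KB / 512 KB / 4096 KB, against 32 KB / 32 KB / 256 KB per core on current "Coffee Lake" parts.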



View at TechPowerUp Main Site
 
D

Deleted member 178884

Guest
Moving cache back up? Interesting, since HEDT got hit harder: the cache total on the 6950X was more than on the 7980XE.
 
Joined
Oct 8, 2015
Messages
774 (0.23/day)
Location
Earth's Troposphere
System Name 3 "rigs"-gaming/spare pc/cruncher
Processor R7-5800X3D/i7-7700K/R9-7950X
Motherboard Asus ROG Crosshair VI Extreme/Asus Ranger Z170/Asus ROG Crosshair X670E-GENE
Cooling Bitspower monoblock ,custom open loop,both passive and active/air tower cooler/air tower cooler
Memory 32GB DDR4/32GB DDR4/64GB DDR5
Video Card(s) Gigabyte RX6900XT Alphacooled/AMD RX5700XT 50th Aniv./SOC(onboard)
Storage mix of sata ssds/m.2 ssds/mix of sata ssds+an m.2 ssd
Display(s) Dell UltraSharp U2410 , HP 24x
Case mb box/Silverstone Raven RV-05/CoolerMaster Q300L
Audio Device(s) onboard/onboard/onboard
Power Supply 3 Seasonics, a DeltaElectronics, a FractalDesing
Mouse various/various/various
Keyboard various wired and wireless
VR HMD -
Software W10.someting or another,all 3
Good, or is it? If it's a new CPU architecture I can't argue on any basis whatsoever. More cache is "more better", but is it as fast? What are the benefits? Can the chain of micro-ops handle it, and is the scheduler up to the task? But, yet again, I'm no expert.
 
Joined
May 8, 2016
Messages
1,920 (0.61/day)
System Name BOX
Processor Core i7 6950X @ 4,26GHz (1,28V)
Motherboard X99 SOC Champion (BIOS F23c + bifurcation mod)
Cooling Thermalright Venomous-X + 2x Delta 38mm PWM (Push-Pull)
Memory Patriot Viper Steel 4000MHz CL16 4x8GB (@3240MHz CL12.12.12.24 CR2T @ 1,48V)
Video Card(s) Titan V (~1650MHz @ 0.77V, HBM2 1GHz, Forced P2 state [OFF])
Storage WD SN850X 2TB + Samsung EVO 2TB (SATA) + Seagate Exos X20 20TB (4Kn mode)
Display(s) LG 27GP950-B
Case Fractal Design Meshify 2 XL
Audio Device(s) Motu M4 (audio interface) + ATH-A900Z + Behringer C-1
Power Supply Seasonic X-760 (760W)
Mouse Logitech RX-250
Keyboard HP KB-9970
Software Windows 10 Pro x64
In general, having a bigger cache is good. However, latency is also important.
Programs that fit in the smaller cache should execute faster on the older design if the latency of the bigger cache is higher.

Also, the last L1 bump on a "consumer grade" platform was with Conroe (coming from Netburst), and we have had 256 kB of L2 since Nehalem (the first-gen Core i series).
Intel has never released large L3 caches per core on LGA11xx platforms (always 2 MB/core max.).
 
Last edited:
Joined
Jun 10, 2014
Messages
2,995 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
We still don't know the details of Ice Lake, but the new details look like this in comparison with existing architectures:
[Attached image: cpu_cache.png]

(based on info around the web, may not be 100% accurate)

One thing I consider interesting is that Intel seems to prioritize the L1 data cache while AMD prioritizes the L1 instruction cache.

Moving cache back up? Interesting, since HEDT got hit harder: the cache total on the 6950X was more than on the 7980XE.
What?
The L3 cache on Skylake-X works differently. Prior generations had an inclusive L3 cache, meaning the L2 contents are duplicated in L3, so the effective L3 size of older generations is 1.75 MB per core (2 MB minus the duplicated 256 kB L2). Skylake-X also quadrupled the L2 cache, leading to an effective increase in cache per core, but more importantly, a more efficient cache.
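To put rough per-core numbers on that, here is a simplified sketch; it assumes 256 kB L2 plus 2 MB inclusive L3 per core for the older mainstream parts and 1 MB L2 plus 1.375 MB non-inclusive L3 per core for Skylake-X, and it ignores the partial overlap a real non-inclusive cache can still have.

```c
/* Back-of-the-envelope sketch of "unique" L2+L3 capacity per core.
 * Assumed per-core figures (for illustration only): older parts with an
 * inclusive L3 carry 256 KB L2 + 2 MB L3 (the L2 contents are duplicated
 * in L3); Skylake-X carries 1 MB L2 + 1.375 MB non-inclusive (victim) L3. */
#include <stdio.h>

int main(void)
{
    double old_l2 = 256.0,  old_l3 = 2048.0;   /* KB per core, inclusive L3 */
    double skx_l2 = 1024.0, skx_l3 = 1408.0;   /* KB per core, victim L3    */

    /* Inclusive: everything in L2 also occupies L3, so only L3 holds unique data. */
    double old_unique       = old_l3;
    double old_effective_l3 = old_l3 - old_l2;  /* the 1.75 MB figure       */

    /* Victim/non-inclusive: L2 and L3 contents are (mostly) disjoint. */
    double skx_unique = skx_l2 + skx_l3;

    printf("Inclusive L3: %.0f KB unique per core (effective L3 %.0f KB)\n",
           old_unique, old_effective_l3);
    printf("Victim L3:    %.0f KB unique per core\n", skx_unique);
    return 0;
}
```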

Good, or is it? If it's a new CPU architecture I can't argue on any basis whatsoever. More cache is "more better", but is it as fast? What are the benefits? Can the chain of micro-ops handle it, and is the scheduler up to the task? But, yet again, I'm no expert.
Caches have always been more complex than just "more is better".
I believe even the old 80486 supported something like 512 kB of off-chip L2 cache.
For caches it comes down to latency, throughput and die space. Fewer banks may give higher cache efficiency, but lower bandwidth and higher complexity. More banks are simpler and give higher bandwidth, but sacrifice cache efficiency. Latency is even tougher; it depends on the implementation.
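Latency in particular is easy to see for yourself. The sketch below is a rough, single-threaded C microbenchmark (the buffer sizes, step count and the crude rand() shuffle are all arbitrary choices): it chases pointers through a randomly shuffled chain so every load depends on the previous one, and the nanoseconds-per-load figure steps up roughly each time the working set outgrows L1, L2 and L3 on whatever CPU it runs on.

```c
/* Minimal pointer-chasing sketch: average load latency vs. working set.
 * The chain is a random cyclic permutation, so each load depends on the
 * previous one and hardware prefetchers get little traction. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double chase(size_t n_elems, size_t steps)
{
    size_t *next = malloc(n_elems * sizeof *next);
    if (!next) return -1.0;

    /* Build a random cyclic permutation (Sattolo's algorithm). */
    for (size_t i = 0; i < n_elems; i++) next[i] = i;
    for (size_t i = n_elems - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;          /* crude RNG, fine for a sketch */
        size_t tmp = next[i]; next[i] = next[j]; next[j] = tmp;
    }

    struct timespec t0, t1;
    volatile size_t idx = 0;   /* volatile keeps the loop from being optimized away */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t s = 0; s < steps; s++)
        idx = next[idx];                         /* each load depends on the last */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    free(next);
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;                   /* ns per dependent load */
}

int main(void)
{
    srand(1);
    for (size_t kb = 16; kb <= 32 * 1024; kb *= 2) {
        size_t n = kb * 1024 / sizeof(size_t);
        printf("%6zu KB working set: %.1f ns per load\n",
               kb, chase(n, 10u * 1000 * 1000));
    }
    return 0;
}
```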
 
Joined
Mar 13, 2018
Messages
68 (0.03/day)
Will need to wait for an actual product. Right now, Intel's 10 nm is vaporware/rumor mill at best.
 
Joined
Nov 15, 2016
Messages
454 (0.15/day)
System Name Sillicon Nightmares
Processor Intel i7 9700KF 5ghz (5.1ghz 4 core load, no avx offset), 4.7ghz ring, 1.412vcore 1.3vcio 1.264vcsa
Motherboard Asus Z390 Strix F
Cooling DEEPCOOL Gamer Storm CAPTAIN 360
Memory 2x8GB G.Skill Trident Z RGB (B-Die) 3600 14-14-14-28 1t, tRFC 220 tREFI 65535, tFAW 16, 1.545vddq
Video Card(s) ASUS GTX 1060 Strix 6GB XOC, Core: 2202-2240, Vcore: 1.075v, Mem: 9818mhz (Sillicon Lottery Jackpot)
Storage Samsung 840 EVO 1TB SSD, WD Blue 1TB, Seagate 3TB, Samsung 970 Evo Plus 512GB
Display(s) BenQ XL2430 1080p 144HZ + (2) Samsung SyncMaster 913v 1280x1024 75HZ + A Shitty TV For Movies
Case Deepcool Genome ROG Edition
Audio Device(s) Bunta Sniff Speakers From The Tip Edition With Extra Kenwoods
Power Supply Corsair AX860i/Cable Mod Cables
Mouse Logitech G602 Spilled Beer Edition
Keyboard Dell KB4021
Software Windows 10 x64
Benchmark Scores 13543 Firestrike (3dmark.com/fs/22336777) 601 points CPU-Z ST 37.4ns AIDA Memory
Joined
May 20, 2011
Messages
227 (0.05/day)
System Name Windows 10 Pro 64 bit
Processor Ryzen 5 5600 @4.65 GHz
Motherboard Asus ROG X570-E
Cooling Thermalright
Memory 32 GB 3200 MHz
Video Card(s) Asus RX 6700XT 12 GB Dual
Storage 1TB Samsung 970 EVO Plus
Display(s) SS QHD 144Hz + LG 55 Inch 4K
Case Corsair 4000D
Power Supply Superflower 850
Ice Lake is running with 16 GB in dual channel and on Linux. The other i3-7130U laptops may have been running in single channel and on Windows.
 

hat

Enthusiast
Joined
Nov 20, 2006
Messages
21,747 (3.29/day)
Location
Ohio
System Name Starlifter :: Dragonfly
Processor i7 2600k 4.4GHz :: i5 10400
Motherboard ASUS P8P67 Pro :: ASUS Prime H570-Plus
Cooling Cryorig M9 :: Stock
Memory 4x4GB DDR3 2133 :: 2x8GB DDR4 2400
Video Card(s) PNY GTX1070 :: Integrated UHD 630
Storage Crucial MX500 1TB, 2x1TB Seagate RAID 0 :: Mushkin Enhanced 60GB SSD, 3x4TB Seagate HDD RAID5
Display(s) Onn 165hz 1080p :: Acer 1080p
Case Antec SOHO 1030B :: Old White Full Tower
Audio Device(s) Creative X-Fi Titanium Fatal1ty Pro - Bose Companion 2 Series III :: None
Power Supply FSP Hydro GE 550w :: EVGA Supernova 550
Software Windows 10 Pro - Plex Server on Dragonfly
Benchmark Scores >9000
Interesting. I thought cache sizes got smaller as we moved away from FSB on to QPI/DMI because QPI/DMI was so much faster than FSB, so large caches weren't needed...
 
Joined
Jun 12, 2017
Messages
136 (0.05/day)
Interesting. I thought cache sizes got smaller as we moved away from FSB on to QPI/DMI because QPI/DMI was so much faster than FSB, so large caches weren't needed...
Nope... Cache is still a must. No matter how fast the memory/IO may be, internal CPU pipelines will always be way faster.

Cache on Skylake-X works differently. Prior generations had an inclusive L3 cache, meaning the L2 contents are duplicated in L3, so the effective L3 size of older generations is 1.75 MB per core (2 MB minus the duplicated 256 kB L2). Skylake-X also quadrupled the L2 cache, leading to an effective increase in cache per core, but more importantly, a more efficient cache.
I agree with your opinion, except that SKL-SP's victim cache is not necessarily more efficient; the relative efficiency of a victim cache and an inclusive cache depends on the workload.
And by the way, Ryzen uses a victim cache too, similar to SKL-SP.
 
Last edited:
Joined
Apr 8, 2008
Messages
342 (0.06/day)
System Name Xajel Main
Processor AMD Ryzen 7 5800X
Motherboard ASRock X570M Steel Legened
Cooling Corsair H100i PRO
Memory G.Skill DDR4 3600 32GB (2x16GB)
Video Card(s) ZOTAC GAMING GeForce RTX 3080 Ti AMP Holo
Storage (OS) Gigabyte AORUS NVMe Gen4 1TB + (Personal) WD Black SN850X 2TB + (Store) WD 8TB HDD
Display(s) LG 38WN95C Ultrawide 3840x1600 144Hz
Case Cooler Master CM690 III
Audio Device(s) Built-in Audio + Yamaha SR-C20 Soundbar
Power Supply Thermaltake 750W
Mouse Logitech MK710 Combo
Keyboard Logitech MK710 Combo (M705)
Software Windows 11 Pro
Interesting. I thought cache sizes got smaller as we moved away from FSB on to QPI/DMI because QPI/DMI was so much faster than FSB, so large caches weren't needed...

The main idea of the memory subsystem is that the closer the data is to being processed, the closer it sits to the processor, where it can be handled faster and with much lower latency. The L1 cache holds the data being processed at that exact moment, which is why it consists of L1 Data and L1 Instruction: the processor uses the instructions in L1i to process the data currently in L1d. The L2 cache contains the data to be processed next, or the rest of the data that L1d can't hold. And, you guessed it, L3 contains the data for the level after that. Then comes the RAM, and finally the rest stays on the HDD.

Whenever a higher-priority cache/memory level is not enough, the system uses the next one available: if L1d is not enough, the next step is L2; when that is full, L3 comes next (if available); where there is an L4 cache, it is the next level as well; otherwise RAM is used, and so on.

Do you remember why the system becomes very slow when you run heavy applications with too little RAM, and why upgrading the RAM sped up your system noticeably? Or when you finally upgraded to an SSD and saw a huge jump in responsiveness and speed? This is what happens when a higher-level cache/memory is too small and the system is forced to fall back to the next, slower one.
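As a rough companion to the pointer-chasing sketch earlier in the thread, the snippet below (plain C; the 16 KB to 64 MB sweep and the roughly 1 GB of reads per point are arbitrary choices) measures sequential read bandwidth instead of dependent-load latency. Streaming access is prefetcher-friendly, so the drop at each cache boundary is gentler, but the L1 -> L2 -> L3 -> RAM fall-off is still visible.

```c
/* Sketch: sequential read bandwidth vs. working set size. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    const size_t max_kb = 64 * 1024;              /* sweep up to 64 MB      */
    const size_t reps_bytes = 1u << 30;           /* ~1 GB read per point   */
    long *buf = malloc(max_kb * 1024);
    if (!buf) return 1;

    for (size_t kb = 16; kb <= max_kb; kb *= 2) {
        size_t n = kb * 1024 / sizeof(long);
        for (size_t i = 0; i < n; i++) buf[i] = (long)i;   /* touch/fill    */

        size_t passes = reps_bytes / (kb * 1024);
        if (passes == 0) passes = 1;

        volatile long sink = 0;                   /* keeps the sum loop alive */
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t p = 0; p < passes; p++) {
            long sum = 0;
            for (size_t i = 0; i < n; i++) sum += buf[i];
            sink += sum;
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        (void)sink;

        double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        double gbs = (double)passes * kb / 1024.0 / 1024.0 / sec;
        printf("%6zu KB working set: %6.1f GB/s\n", kb, gbs);
    }
    free(buf);
    return 0;
}
```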


When Intel first released the Celeron, they experimented with an L2-cache-less version to lower the cost, and it did cost less to make. But it performed horribly. They quickly scrapped that, and the next update came with L2.

So why not have more and more cache? There are several things to consider:
1- Caches are expensive: they require a lot of die area and consume power.
2- More cache brings latency: the larger the cache, the more time it takes to actually look up the data you need, and latency is crucial here.
3- The performance gain from more cache is not linear (see the sketch after this list).
4- Architecture favouring: due to the second and third points, and to how the architecture handles data and how the cache hierarchy works, there will be an optimal cache size for each level that brings the most performance at the best power/cost. Adding more might raise the power/cost too much for little performance gain, or might actually reduce performance slightly in some latency-critical applications.
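Here is a toy illustration of point 3. The simulation below models an 8-way set-associative cache with LRU replacement and feeds it a made-up access pattern: 90% of accesses to a 64 KB hot region and 10% scattered over 16 MB. All of these numbers are assumptions chosen only for illustration. The hit rate climbs quickly until the hot set fits, then flattens out, so each doubling of capacity buys less than the previous one.

```c
/* Toy simulation of an 8-way set-associative cache with LRU replacement,
 * fed by a synthetic access pattern (90% hot 64 KB region, 10% cold 16 MB
 * region; numbers chosen purely for illustration). */
#include <stdio.h>
#include <stdlib.h>

#define LINE 64
#define WAYS 8

typedef struct { long tag; unsigned long last_used; int valid; } Line;

static double hit_rate(size_t cache_kb, size_t accesses)
{
    size_t sets = cache_kb * 1024 / LINE / WAYS;
    Line *cache = calloc(sets * WAYS, sizeof *cache);
    unsigned long now = 0;
    size_t hits = 0;

    for (size_t a = 0; a < accesses; a++) {
        size_t region = (rand() % 10 == 0) ? 16u * 1024 * 1024   /* cold */
                                           : 64u * 1024;          /* hot  */
        long addr = rand() % (long)region;     /* assumes a large RAND_MAX (glibc) */
        long tag  = addr / LINE;
        Line *set = &cache[(size_t)(tag % (long)sets) * WAYS];

        int victim = 0, hit = 0;
        for (int w = 0; w < WAYS; w++) {
            if (set[w].valid && set[w].tag == tag) { hit = 1; victim = w; break; }
            if (!set[w].valid || set[w].last_used < set[victim].last_used)
                victim = w;                    /* track least recently used way */
        }
        if (hit) hits++;
        set[victim].tag = tag;                 /* fill or refresh the line */
        set[victim].valid = 1;
        set[victim].last_used = ++now;
    }
    free(cache);
    return (double)hits / (double)accesses;
}

int main(void)
{
    srand(1);
    for (size_t kb = 16; kb <= 1024; kb *= 2)
        printf("%5zu KB cache: %.1f%% hit rate\n",
               kb, 100.0 * hit_rate(kb, 2000000));
    return 0;
}
```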
 
Joined
Jun 10, 2014
Messages
2,995 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
The main idea of the memory subsystem is that the closer the data is to being processed, the closer it sits to the processor, where it can be handled faster and with much lower latency. The L1 cache holds the data being processed at that exact moment, which is why it consists of L1 Data and L1 Instruction: the processor uses the instructions in L1i to process the data currently in L1d. The L2 cache contains the data to be processed next, or the rest of the data that L1d can't hold. And, you guessed it, L3 contains the data for the level after that. Then comes the RAM, and finally the rest stays on the HDD.
The purpose of the cache is to hide latency.
To be precise, L1 is still a cache; the data actually being processed is in registers.

Even some introductory books in CS describe the cache hierarchy incorrectly. L1…L3 are just streaming buffers; they contain code and data that are likely to be used or have recently been used. Many mistakenly think that the most important stuff is stored in L1, then L2 and so on. These caches are overwritten thousands of times per second; no data ever stays there for long. And it's not as if your running program can fit in there, or your most important variables in code.

Modern CPUs do aggressive prefetching, which means they preload data that might be needed. Each bank in the cache is usually a least-recently-used (LRU) queue, which means that any time one cache line is written, the oldest one is discarded. So caching things that are not needed may actually replace useful data. Depending on the workload, the cache may at times be mostly wasted, but it's of course still better than no cache.

Do you remember why the system becomes very slow when you run heavy applications with too little RAM, and why upgrading the RAM sped up your system noticeably? Or when you finally upgraded to an SSD and saw a huge jump in responsiveness and speed? This is what happens when a higher-level cache/memory is too small and the system is forced to fall back to the next, slower one.
SSDs do wonders for file operations, but only affect responsiveness when the OS is swapping heavily, and by that point the system is too sluggish anyway. There is a lot of placebo tied to the benefits of SSDs. Don't get me wrong, SSDs are good, but they don't make code run faster.
 