
I have a question about caches in CPU cores.

Joined
Dec 27, 2013
Messages
887 (0.22/day)
Location
somewhere
Sorry I made another thread. I don't mean to spam.:(

Btw I like discussion here first and foremost and interacting with people here makes me happy. :3

A question about the L1, L2 and L3 caches in a modern CPU core. OK, so I'm interested in Skylake and Zen for now.

It's a simple question, here it is:

Are the caches inside the core, L1D and L1I, tied to the core's clock domain? I.e. do the SRAMs run at the same speed as the core? What I mean by this is: does increasing the CPU clock rate also increase cache bandwidth?

I ask the same for the L2 cache. IDK about the L3; I did hear somewhere that the Zen L3 cache is tied to core speed. Actually it would be easy to find out, I guess, with AIDA64 Extreme's cache and memory benchmark.

But some information would be helpful. It just occurs to me: the OC Coffee Lake results having MUCH higher L1 and L2 bandwidth than Zen is mainly because of the clock rate advantage, right?

Thanks

Also please tell me if I am making too many threads. I don't mean to do it negatively. Actually I won't post any more today :x
 
Joined
Jun 10, 2014
Messages
2,995 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
Are the caches inside the core, L1D and L1I, tied to the core's clock domain? I.e. do the SRAMs run at the same speed as the core? What I mean by this is: does increasing the CPU clock rate also increase cache bandwidth?
Yes, bandwidth is specified in bytes per clock, so it scales with clock speed. And yes, it can be increased by raising the clock speed until you run into timing issues. But bandwidth and timing are highly dependent on the architecture.
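To make the "bytes per clock" point concrete, here is a minimal sketch of the arithmetic. The per-cycle widths are the tentative figures listed later in this post, and the 5.0 GHz and 4.0 GHz clocks are only assumed example values, not measurements.

```python
def peak_bandwidth_gb_s(bytes_per_cycle: int, clock_ghz: float) -> float:
    """Theoretical peak bandwidth in GB/s: bytes per cycle times cycles per second."""
    # (bytes/cycle) * (clock_ghz * 1e9 cycles/s) / 1e9 bytes per GB
    return bytes_per_cycle * clock_ghz


# Assumed example values, not measurements.
wide_fast = peak_bandwidth_gb_s(64, 5.0)    # 64 B/cycle at 5.0 GHz
narrow_slow = peak_bandwidth_gb_s(32, 4.0)  # 32 B/cycle at 4.0 GHz

print(f"64 B/cycle @ 5.0 GHz: {wide_fast:.0f} GB/s")
print(f"32 B/cycle @ 4.0 GHz: {narrow_slow:.0f} GB/s")
print(f"total ratio: {wide_fast / narrow_slow:.2f}x, clock alone: {5.0 / 4.0:.2f}x")
```

With these assumed numbers the clock difference accounts for only part of the gap; the rest comes from the width per cycle.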

Edit:
But some information would be helpful. It just occurs to me: the OC Coffee Lake results having MUCH higher L1 and L2 bandwidth than Zen is mainly because of the clock rate advantage, right?
Actually no, cache efficiency mainly comes down to three components: cache structure, latency and the prefetcher.

Cache doesn't work the way most people think it does. The cache is a "streaming buffer"; it's overwritten within a few microseconds. Memory is divided into what we call cache lines, which are 64 bytes on current x86 architectures. This means that whenever the CPU reads one byte from a cache line, the entire cache line is cached. So once a 64-byte cache line is cached, anything else within those 64 bytes is also cached. If data is more spread out, the cache becomes less efficient, but this depends on the program.
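As a small illustration of why spread-out data hurts, here is a sketch that counts how many 64-byte cache lines a set of byte addresses touches. The address patterns are made-up examples.

```python
LINE_SIZE = 64  # bytes per cache line on current x86


def lines_touched(addresses):
    """Return the set of cache-line numbers that the given byte addresses fall into."""
    return {addr // LINE_SIZE for addr in addresses}


# 16 four-byte values stored back to back (addresses 0, 4, ..., 60).
packed = [i * 4 for i in range(16)]
# The same 16 values placed 64 bytes apart (addresses 0, 64, ..., 960).
spread = [i * 64 for i in range(16)]

print("packed:", len(lines_touched(packed)), "cache line(s)")
print("spread:", len(lines_touched(spread)), "cache line(s)")
```

The packed layout needs a single line to be fetched, while the spread layout pulls in a separate line for every value.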

Cache is also set-associative. E.g. if a cache is 256 kB 8-way, its 4096 cache lines are organized into 512 sets of 8 lines (ways) each. A specific memory address always maps to one specific set, but within that set it can be stored in any of the 8 ways; the first 0-63 bytes go to set 0, 64-127 to set 1, 128-191 to set 2, etc., looping over and over. This also means that sets might not be evenly used, depending on the alignment of data in memory. Fewer ways reduces storage efficiency (hit rate) because lines that map to the same set evict each other sooner, while more ways improves hit rate but makes each lookup more expensive and may increase latency.
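As a rough illustration of how an address maps into a set-associative cache, here is a minimal sketch of the textbook model, using the 256 kB 8-way example with 64-byte lines. Real CPUs may hash the index bits or index by physical address, so treat the exact mapping as an assumption.

```python
LINE_SIZE = 64            # bytes per cache line
CACHE_SIZE = 256 * 1024   # 256 kB
WAYS = 8                  # lines per set

NUM_SETS = CACHE_SIZE // (LINE_SIZE * WAYS)  # 512 sets


def locate(addr: int):
    """Split a byte address into (tag, set index, byte offset within the line)."""
    offset = addr % LINE_SIZE
    line = addr // LINE_SIZE
    return line // NUM_SETS, line % NUM_SETS, offset


for addr in (0, 64, 128, 32 * 1024, 32 * 1024 + 64):
    tag, set_index, offset = locate(addr)
    print(f"addr {addr:>6}: tag {tag}, set {set_index}, offset {offset}")
```

Addresses that are a multiple of 32 kB apart land in the same set in this model, and only 8 of them can be resident at once, which is why data alignment affects how evenly the cache is used.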

Let's look at Skylake vs. Zen:
Skylake:
L1I: 32kB 8-way
L1D: 32kB 8-way (64-bytes per cycle bidirectional?)
L2: 256kB 4-way (64-bytes per cycle bidirectional?)

Zen:
L1I: 64kB 4-way (32-bytes per cycle?)
L1D: 32kB 8-way
L2: 512kB 8-way (32-bytes per cycle?)
This still doesn't tell everything, like how many clock cycles of latency each one has, etc.

There is also the last thing: the prefetcher, which controls how the cache is used, but that's a subject of its own.
 
Joined
Apr 21, 2010
Messages
5,731 (1.07/day)
Location
West Midlands. UK.
System Name Ryzen Reynolds
Processor Ryzen 1600 - 4.0Ghz 1.415v - SMT disabled
Motherboard mATX Asrock AB350m AM4
Cooling Raijintek Leto Pro
Memory Vulcan T-Force 16GB DDR4 3000 16.18.18 @3200Mhz 14.17.17
Video Card(s) Sapphire Nitro+ 4GB RX 580 - 1450/2000 BIOS mod 8-)
Storage Seagate B'cuda 1TB/Sandisk 128GB SSD
Display(s) Acer ED242QR 75hz Freesync
Case Corsair Carbide Series SPEC-01
Audio Device(s) Onboard
Power Supply Corsair VS 550w
Mouse Zalman ZM-M401R
Keyboard Razor Lycosa
Software Windows 10 x64
Benchmark Scores https://www.3dmark.com/spy/6220813
Sorry I made another thread. I don't mean to spam.:(
Also please tell me if I am making too many threads. I don't mean to do it negatively. Actually I won't post any more today :x
Hahaha you should just create an @ArbitraryAffection curiosity and general questions thread :p

But yes, the cache speeds are linked to the base clock and multiplier of the CPU. You can see this quite simply by running the AIDA64 cache and memory benchmark at stock settings and then again after overclocking the CPU.
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,171 (2.80/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
But yes, the cache speeds are linked to the base clock and multiplier of the CPU. You can see this quite simply by running the AIDA64 cache and memory benchmark at stock settings and then again after overclocking the CPU.
This isn't true for all CPUs. Sometimes the memory controller (which would include the L3 cache) has its own multiplier. I know that the Phenom II CPUs I had worked this way. CPUs on X58 could individually alter the multiplier for the uncore as well. It's more accurate to say that the L1 and likely the L2 run at the core frequency. L3 really varies from CPU to CPU.
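As a rough sketch of what separate multipliers mean in practice: each clock domain is just the base clock times its own multiplier. The 100 MHz base clock and the multiplier values below are assumed example numbers, not settings from any particular board.

```python
def domain_clock_mhz(base_clock_mhz: float, multiplier: float) -> float:
    """Clock of a domain derived from the base clock and that domain's own multiplier."""
    return base_clock_mhz * multiplier


BASE_CLOCK_MHZ = 100.0   # assumed example value
UNCORE_MULTIPLIER = 30   # hypothetical, left untouched while overclocking the cores

for core_multiplier in (37, 42):  # hypothetical stock vs. overclocked core multiplier
    core = domain_clock_mhz(BASE_CLOCK_MHZ, core_multiplier)
    uncore = domain_clock_mhz(BASE_CLOCK_MHZ, UNCORE_MULTIPLIER)
    print(f"core (L1/L2): {core:.0f} MHz, uncore (L3): {uncore:.0f} MHz")
```

On a design like this, raising only the core multiplier speeds up L1 and L2 but leaves the L3 clock where it was.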
 

newtekie1

Semi-Retired Folder
Joined
Nov 22, 2005
Messages
28,473 (4.09/day)
Location
Indiana, USA
Processor Intel Core i7 10850K@5.2GHz
Motherboard AsRock Z470 Taichi
Cooling Corsair H115i Pro w/ Noctua NF-A14 Fans
Memory 32GB DDR4-3600
Video Card(s) RTX 2070 Super
Storage 500GB SX8200 Pro + 8TB with 1TB SSD Cache
Display(s) Acer Nitro VG280K 4K 28"
Case Fractal Design Define S
Audio Device(s) Onboard is good enough for me
Power Supply eVGA SuperNOVA 1000w G3
Software Windows 10 Pro x64
L3 really varies from CPU to CPU.

Yep, I know with the latest Intel CPUs, the L3 has its own multiplier and hence its own clock speed separate from the CPU.
 
Joined
Apr 21, 2010
Messages
5,731 (1.07/day)
Location
West Midlands. UK.
This isn't true for all CPUs. Sometimes the memory controller (which would include the L3 cache) has its own multiplier. I know that the Phenom II CPUs I had worked this way. CPUs on X58 could individually alter the multiplier for the uncore as well. It's more accurate to say that the L1 and likely the L2 run at the core frequency. L3 really varies from CPU to CPU.
I was referring more to his setup being a Ryzen, and to my own experience of cache speeds when overclocking a Ryzen platform. But you are indeed correct. I remember things like the uncore from my time with i7 920s, and before that AMD's HTT bus speed etc., which would all affect things like cache speed, latencies, other buses etc. It all seems a little oversimplified these days, though I probably wouldn't have a clue about either of those chipsets now, it's been so long lol.
 
Joined
Dec 27, 2013
Messages
887 (0.22/day)
Location
somewhere
Thanks for the replies ^^ OK, so somewhere I also heard that the L3 cache speed on Ryzen is potentially what is holding back the overall clock speed potential of the CPU core, as it is tied to that (but I didn't know if it was a valid claim, as I was under the impression the cache was somehow clocked separately). So maybe the L3 cache timing issues are the reason why Zen doesn't clock as high. I am sure GloFo 14nm/12nm is capable of higher frequencies than ~4.2 GHz, right?
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,171 (2.80/day)
Location
Concord, NH, USA
I am sure GloFo 14nm/12nm is capable of higher frequencies than ~4.2 GHz, right?
Nah. AMD actually has excellent quality control on these dies, considering 4.2 GHz seems to be the magic number for just about every die that has been based on Zen, from everything I've read. L3 tends to be an issue because it shares the same clock domain as the memory controller, which is dictated by the speed of your DRAM. So faster DRAM makes the L3 run faster, which translates into better performance regardless of core clock speed. However, higher clocks might not reach their full potential if the IMC is the bottleneck.
 
Joined
Dec 27, 2013
Messages
887 (0.22/day)
Location
somewhere
Oh, so L3 runs at the same speed as the Infinity Fabric bus and DRAM? So it is 1.6 GHz with my RAM. Hm. Slightly off topic, but do you think Zen 2 will put IF and L3 in a separate clock domain from DRAM? I heard that moving data between different clock domains can increase latency, but maybe with potentially much, much higher IF and L3 speeds it could be beneficial? I mean, in the BIOS it would be nice to be able to set the Infinity Fabric clock speed independently of DRAM :)
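For reference, the 1.6 GHz figure is just DDR arithmetic: DDR4-3200 means 3200 MT/s, and DDR transfers data twice per clock. A minimal sketch (the function name is made up for illustration, and whether the L3 actually follows this clock is a separate question):

```python
def memory_clock_mhz(transfer_rate_mt_s: float) -> float:
    """DDR transfers data twice per clock, so the clock is half the transfer rate."""
    return transfer_rate_mt_s / 2


for rate in (2133, 2666, 3200):
    print(f"DDR4-{rate}: memory clock {memory_clock_mhz(rate):.1f} MHz")
```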

Oh wow, thanks for this explanation. I'm not sure I fully understand how it all works, but this is really informative, thanks. So is it safe to say Skylake has a better cache system than Zen? If so, does this also explain why Skylake is faster in games? I mean, are they really sensitive to cache performance?

I also heard about victim cache vs. inclusive. So essentially Zen's L3 is like a huge overflow for its L2, right? Whereas with Skylake a program can load stuff directly into the L3, bypassing L2, right? Also SKL-X is like Zen in this regard, I think. Do you have any idea if this could also impact gaming performance? I would like to know the advantages and disadvantages of victim/inclusive caches though.
 
Joined
Apr 21, 2010
Messages
5,731 (1.07/day)
Location
West Midlands. UK.
Nah. AMD actually has excellent quality control on these dies, considering 4.2 GHz seems to be the magic number for just about every die that has been based on Zen, from everything I've read. L3 tends to be an issue because it shares the same clock domain as the memory controller, which is dictated by the speed of your DRAM. So faster DRAM makes the L3 run faster, which translates into better performance regardless of core clock speed. However, higher clocks might not reach their full potential if the IMC is the bottleneck.
That's not to say that CPU clock speed alone doesn't affect L3 cache on a Ryzen. Case in point, my AIDA64 benches: same RAM speed, timings etc., the only difference is stock CPU clock (3.7 GHz single-core boost) compared to 3.9 GHz all-core boost. As you can see, there's a significant difference in all cache speeds between default and overclocked CPU speeds, even with the same RAM timings.

 
Joined
Dec 27, 2013
Messages
887 (0.22/day)
Location
somewhere

This says the Zen L3 cache is part of the core clock domain, so yeah, that also kinda proves it, I think.
 
Joined
Jun 10, 2014
Messages
2,995 (0.78/day)
So is it safe to say Skylake has a better cache system than Zen? If so, does this also explain why Skylake is faster in games? I mean, are they really sensitive to cache performance?
To my understanding, cache is part of the reason, and I do believe Skylake has lower L1 latency and higher bandwidth, but a larger factor is a more efficient prefetcher.

I also heard about victim cache vs. inclusive. So essentially Zen's L3 is like a huge overflow for its L2, right? Whereas with Skylake a program can load stuff directly into the L3, bypassing L2, right? Also SKL-X is like Zen in this regard, I think. Do you have any idea if this could also impact gaming performance? I would like to know the advantages and disadvantages of victim/inclusive caches though.
An inclusive L3 stores a copy of the L2, which is more wasteful, but helps if another core needs it, which is rare; as I said, the caches are overwritten very quickly.
A victim cache means data is not stored directly (prefetched) into L3, but only stored there when it's discarded from L2. While there does exist some machine code for prefetching, this is generally not controlled by the program, and the program is definitely not aware of where things are stored in the various caches. From the program's perspective everything is stored in RAM.

I think the victim cache is not a disadvantage for gaming; there might be edge cases of course, but Skylake-X has performed well with this solution. The main advantage is of course storage efficiency: the L3 doesn't duplicate what is already in L2, so that space can be used for something else, or it is effectively just a larger L3.
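To illustrate the storage-efficiency point, here is a toy simulation of a small L2 backed by either an inclusive or a victim L3, both with LRU replacement. The cache sizes (counted in lines) and the cyclic access pattern are assumptions chosen to make the effect visible; real hardware uses more elaborate policies.

```python
from collections import OrderedDict


def simulate(trace, l2_lines, l3_lines, inclusive):
    """Count hits (in L2 or L3) for a trace of cache-line numbers, using LRU everywhere."""
    l2, l3 = OrderedDict(), OrderedDict()
    hits = 0
    for line in trace:
        if line in l2:
            hits += 1
            l2.move_to_end(line)
            if inclusive:
                l3.move_to_end(line)  # inclusive: the L3 copy always exists
            continue
        if line in l3:
            hits += 1
            if inclusive:
                l3.move_to_end(line)
            else:
                del l3[line]          # victim: the line moves back up into L2
        elif inclusive:
            l3[line] = True           # inclusive: fill L3 along with L2
            if len(l3) > l3_lines:
                evicted, _ = l3.popitem(last=False)
                l2.pop(evicted, None)  # back-invalidate to keep L2 inside L3
        l2[line] = True               # fill L2 in both schemes
        if len(l2) > l2_lines:
            victim, _ = l2.popitem(last=False)
            if not inclusive:
                l3[victim] = True     # victim: L3 only receives L2 evictions
                if len(l3) > l3_lines:
                    l3.popitem(last=False)
    return hits


# Assumed toy workload: loop over 10 distinct lines, 10 times.
trace = list(range(10)) * 10
inclusive_hits = simulate(trace, l2_lines=4, l3_lines=8, inclusive=True)
victim_hits = simulate(trace, l2_lines=4, l3_lines=8, inclusive=False)
print("inclusive L3 hits:", inclusive_hits, "of", len(trace))
print("victim L3 hits:   ", victim_hits, "of", len(trace))
```

With these assumed sizes the 10-line loop fits in the victim configuration (4 + 8 lines of unique storage) but not in the inclusive one (8 lines, since L2 contents are duplicated), so only the victim setup gets hits after the first pass.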
 
Joined
Dec 27, 2013
Messages
887 (0.22/day)
Location
somewhere
Thanks for the explanation!!

One last question, if I may ask. Some Skylake-X CPUs perform much worse in gaming than others. I'm talking about the 7800X and, to a lesser extent, the 7820X. In many games the 7800X is worse than even a 2600X. Do you think this is because of the mesh connection between the cores, or the way it is cut down? Btw, the 7920X also suffers performance issues, I heard. That's 12 cores IIRC, the most cut down of the HCC die. Thanks so much for taking the time to explain to me. :love:
 
Joined
Mar 23, 2016
Messages
4,844 (1.52/day)
Processor Core i7-13700
Motherboard MSI Z790 Gaming Plus WiFi
Cooling Cooler Master RGB something
Memory Corsair DDR5-6000 small OC to 6200
Video Card(s) XFX Speedster SWFT309 AMD Radeon RX 6700 XT CORE Gaming
Storage 970 EVO NVMe M.2 500GB,,WD850N 2TB
Display(s) Samsung 28” 4K monitor
Case Phantek Eclipse P400S
Audio Device(s) EVGA NU Audio
Power Supply EVGA 850 BQ
Mouse Logitech G502 Hero
Keyboard Logitech G G413 Silver
Software Windows 11 Professional v23H2
I am sure GloFo 14nm/12nm is capable of higher frequencies than ~4.2 GHz, right?
The fabrication process is a low-power process, which is why the clock speeds are lower and don't scale up.
 
Joined
Jun 3, 2010
Messages
2,540 (0.48/day)
Screenshot_20170718-015517.png
Screenshot_20170718-015539.png


LanOC.org has more recent results as well, but these will do.
One thing of note is that the L1-L2 channels work differently between Intel and AMD architectures. Both have a latency improvement in the L1 channels, however it is more so on Intel, especially in read amplification. AMD's L2 is almost equal in throughput; the only benefit is latency, not read access time.
 

hat

Enthusiast
Joined
Nov 20, 2006
Messages
21,747 (3.29/day)
Location
Ohio
System Name Starlifter :: Dragonfly
Processor i7 2600k 4.4GHz :: i5 10400
Motherboard ASUS P8P67 Pro :: ASUS Prime H570-Plus
Cooling Cryorig M9 :: Stock
Memory 4x4GB DDR3 2133 :: 2x8GB DDR4 2400
Video Card(s) PNY GTX1070 :: Integrated UHD 630
Storage Crucial MX500 1TB, 2x1TB Seagate RAID 0 :: Mushkin Enhanced 60GB SSD, 3x4TB Seagate HDD RAID5
Display(s) Onn 165hz 1080p :: Acer 1080p
Case Antec SOHO 1030B :: Old White Full Tower
Audio Device(s) Creative X-Fi Titanium Fatal1ty Pro - Bose Companion 2 Series III :: None
Power Supply FSP Hydro GE 550w :: EVGA Supernova 550
Software Windows 10 Pro - Plex Server on Dragonfly
Benchmark Scores >9000
It's just marketing. Ever read one of our news posts about a new product? It's always something like "such and such manufacturer, an industry leader in whatever this article is about, today announced so and so new product...".

Even the node names themselves are marketing. Intel's 10nm (if it worked properly) is actually pretty close to TSMC's 7nm, despite being a whole "3nm" larger, as those names would lead you to think.

I've read over and over again that GloFo's process, which has been used for making Zen chips so far, was not designed for high performance parts. It's a low power node, typically for smartphone chips and such.
 
Joined
Jun 10, 2014
Messages
2,995 (0.78/day)
Thanks for the explanation!!

One last question, if I may ask. Some Skylake-X CPUs perform much worse in gaming than others. I'm talking about the 7800X and, to a lesser extent, the 7820X. In many games the 7800X is worse than even a 2600X. Do you think this is because of the mesh connection between the cores, or the way it is cut down? Btw, the 7920X also suffers performance issues, I heard. That's 12 cores IIRC, the most cut down of the HCC die. Thanks so much for taking the time to explain to me. :love:
My pleasure.

As far as I've seen, the i7-7800X (6-core) performs in line with what we should expect for games: ahead of Broadwell-E, Haswell and Zen, but slightly behind the higher-clocked Kaby Lake and Coffee Lake. Perhaps what you've seen is some kind of edge case? Or were you talking about highly overclocked CPUs?
The i7-7800X (6-core) is a bit odd compared to its bigger brothers: it has the lowest boost clocks in the family, and also lacks the more aggressive Turbo Boost Max 3.0.
 
Joined
Dec 27, 2013
Messages
887 (0.22/day)
Location
somewhere
:love:

I did see some seriously bad results for the 7800X, but I can't dig them out right now as my mum wants me to do some housework :cry:. I will try and look up where I saw it when I am done later today. But I did find this:

https://www.techpowerup.com/235267/...core-i7-7700k-better-than-i7-7800x-for-gaming

and it shows the 7700K much faster, I think faster than the clock speed increase would allow, honestly. The 8700K seems to do much better. What is the all-core boost for the 7800X? The 8700K is 4.3 GHz AFAIK and the 7700K is 4.4 GHz. Not sure about the 7800X.
 
Joined
Nov 27, 2010
Messages
924 (0.18/day)
System Name future xeon II
Processor DUAL SOCKET xeon e5 2686 v3 , 36c/72t, hacked all cores @3.5ghz, TDP limit hacked
Motherboard asrock rack ep2c612 ws
Cooling case fans,liquid corsair h100iv2 x2
Memory 96 gb ddr4 2133mhz gskill+corsair
Video Card(s) 2x 1080 sc acx3 SLI, @STOCK
Storage Hp ex950 2tb nvme+ adata xpg sx8200 pro 1tb nvme+ sata ssd's+ spinners
Display(s) philips 40" bdm4065uc 4k @60
Case silverstone temjin tj07-b
Audio Device(s) sb Z
Power Supply corsair hx1200i
Mouse corsair m95 16 buttons
Keyboard microsoft internet keyboard pro
Software windows 10 x64 1903 ,enterprise
Benchmark Scores fire strike ultra- 10k time spy- 15k cpu z- 400/15000
BTW, most Intel CPUs don't depend on the RAM controller/speed for L3, as this cache is usually on-die and runs at full processor speed. What is equally important as cache bandwidth is its size. The more cache you have, the more instructions and data can be kept close to the core and prefetched. But the architecture depends on many more factors, like the system agent, QPI, and the processor engineering itself, for instance the differences between Ryzen and Intel in the main design. If we look at the AIDA64 processor section screenshot, we can see if the cache is running at full processor speed, and in CPU-Z you can see its properties, like 8-way, 16-way etc.

03102019-111536.jpg
 
Joined
Dec 27, 2013
Messages
887 (0.22/day)
Location
somewhere
Zen has 512 kB of L2 though. I wonder why Skylake client can get away with 50% of the L2 cache? More efficient prefetcher? Well, actually, looking at Skylake server with 1 MB of L2 per core, I'm not sure it makes a huge difference for gaming.
 

hat

Enthusiast
Joined
Nov 20, 2006
Messages
21,747 (3.29/day)
Location
Ohio
Cache is expensive, in every sense of the word. It's expensive to produce, sucks down power and kicks out a lot of heat. You don't want more cache than you need.

C2D had a lot of cache... the venerable e8400 had a whopping 6MB L2, and that was just a dual core. Then the OG i7 came along with 256k L2 (per core, resulting in 1MB total). I assume it was because it was on a much faster bus than its predecessors, and the inclusion of L3 cache.

Why Zen has more cache than Skylake, I'm not sure. Maybe there's more "stuff" in the Zen cores than Skylake cores, which warrants having more cache?
 
Joined
Dec 27, 2013
Messages
887 (0.22/day)
Location
somewhere
Come to think of it, I think Zen is a wider design than Skylake. For sure it has a wider FPU (not in total vector width, but in having more, narrower FPUs; higher granularity). And it does gain a bit more from SMT than Skylake in my testing and reading; the bigger L2 cache surely helps with keeping those 4 FP pipes fed. But IDK for sure.

Edit: maybe SKL-X also needs the huge 1 MB L2 cache for its AVX-512 FPU.
 
Joined
May 8, 2016
Messages
1,919 (0.61/day)
System Name BOX
Processor Core i7 6950X @ 4,26GHz (1,28V)
Motherboard X99 SOC Champion (BIOS F23c + bifurcation mod)
Cooling Thermalright Venomous-X + 2x Delta 38mm PWM (Push-Pull)
Memory Patriot Viper Steel 4000MHz CL16 4x8GB (@3240MHz CL12.12.12.24 CR2T @ 1,48V)
Video Card(s) Titan V (~1650MHz @ 0.77V, HBM2 1GHz, Forced P2 state [OFF])
Storage WD SN850X 2TB + Samsung EVO 2TB (SATA) + Seagate Exos X20 20TB (4Kn mode)
Display(s) LG 27GP950-B
Case Fractal Design Meshify 2 XL
Audio Device(s) Motu M4 (audio interface) + ATH-A900Z + Behringer C-1
Power Supply Seasonic X-760 (760W)
Mouse Logitech RX-250
Keyboard HP KB-9970
Software Windows 10 Pro x64
Regardless of what you do, there is a basic trade-off with every cache: speed vs. capacity.
Which is better for which level is up to the architecture.
In short:
Speed, both in terms of bandwidth and latency (clock cycles), may be more important than having more data stored on-die.
Think about it this way: getting any data to the execution units faster can be more important than the amount they actually get.
That's why Intel stuck with 32 kB/256 kB of L1/L2 for so long; it was probably the best compromise between capacity and speed for their architectures.
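One common way to put numbers on this trade-off is the average memory access time: hit time plus miss rate times miss penalty. The cycle counts and miss rates below are made-up illustration values, not measurements of any real CPU.

```python
def average_access_cycles(hit_cycles: float, miss_rate: float, miss_penalty_cycles: float) -> float:
    """Average memory access time in cycles: hit time + miss rate * miss penalty."""
    return hit_cycles + miss_rate * miss_penalty_cycles


# Hypothetical caches: a small, fast one vs. a larger, slower one.
small_fast = average_access_cycles(hit_cycles=4, miss_rate=0.10, miss_penalty_cycles=40)
large_slow = average_access_cycles(hit_cycles=6, miss_rate=0.06, miss_penalty_cycles=40)

print(f"small and fast: {small_fast:.1f} cycles on average")
print(f"large and slow: {large_slow:.1f} cycles on average")
```

With these assumed numbers the smaller cache wins despite missing more often, because every hit pays the hit latency; with a different miss penalty or workload the balance can tip the other way.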

PS. @er557 Haswell(-E) has an L3 cache multiplier; the last CPU architecture with a core-clock L3 cache (no multiplier) is Ivy Bridge(-E). Here's an X99 UEFI screenshot:

The AIDA64 tab you showcased only changes if the cache runs at half speed or quarter speed.
It doesn't detect whether the cache is actually linked to core speed.
See what you have under "NB Frequency" in the "Memory" tab of CPU-Z (it's usually the uncore/L3 cache clock).
 

hat

Enthusiast
Joined
Nov 20, 2006
Messages
21,747 (3.29/day)
Location
Ohio
Come to think of it, I think Zen is a wider design than Skylake. For sure it has a wider FPU (not in total vector width, but in having more, narrower FPUs; higher granularity). And it does gain a bit more from SMT than Skylake in my testing and reading; the bigger L2 cache surely helps with keeping those 4 FP pipes fed. But IDK for sure.

Edit: maybe SKL-X also needs the huge 1 MB L2 cache for its AVX-512 FPU.

Well, servers are big and slow, but do a lot of work. That's why you see 32 core EPYC (server) chips. By contrast, desktops have smaller, faster cores. It's like comparing a Mack truck to a Ferrari. You don't use a fleet of Ferraris to haul cargo (like lots of web traffic), and you don't take an 18 wheeler to a race track (like running your favorite game at 165hz).
 