
I have a question about caches in CPU cores.

Joined
Nov 27, 2010
Messages
924 (0.18/day)
System Name future xeon II
Processor DUAL SOCKET xeon e5 2686 v3 , 36c/72t, hacked all cores @3.5ghz, TDP limit hacked
Motherboard asrock rack ep2c612 ws
Cooling case fans,liquid corsair h100iv2 x2
Memory 96 gb ddr4 2133mhz gskill+corsair
Video Card(s) 2x 1080 sc acx3 SLI, @STOCK
Storage Hp ex950 2tb nvme+ adata xpg sx8200 pro 1tb nvme+ sata ssd's+ spinners
Display(s) philips 40" bdm4065uc 4k @60
Case silverstone temjin tj07-b
Audio Device(s) sb Z
Power Supply corsair hx1200i
Mouse corsair m95 16 buttons
Keyboard microsoft internet keyboard pro
Software windows 10 x64 1903 ,enterprise
Benchmark Scores fire strike ultra- 10k time spy- 15k cpu z- 400/15000
True, although gaming is gradually becoming multi-threaded, my server chips are on fire in gaming: any game, any refresh rate.
 
Joined
Dec 27, 2013
Messages
887 (0.22/day)
Location
somewhere
Well, servers are big and slow, but do a lot of work. That's why you see 32 core EPYC (server) chips. By contrast, desktops have smaller, faster cores. It's like comparing a Mack truck to a Ferrari. You don't use a fleet of Ferraris to haul cargo (like lots of web traffic), and you don't take an 18 wheeler to a race track (like running your favorite game at 165hz).
This is a good way of putting it :) but honestly, if we all could, we would all have the 18 wheeler doing Ferrari speed and efficiency haha. But yeah...

Part of me does kinda want a 9980XE, waterchiller, and 4.6+ all core OC for this very fact :3

True, although gaming is gradually becoming multi-threaded, my server chips are on fire in gaming: any game, any refresh rate.
this too^.^ i hope to see the 2600X's of the world maybe giving the 9600k's a run for their money one day. But probably not :(
 
Joined
Jun 10, 2014
Messages
2,995 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
And it shows the 7700K much faster; faster, I think, than the clock speed increase alone would allow, honestly. The 8700K seems to do much better. What is the all-core boost for the 7800X? The 8700K is 4.3 afaik and the 7700K is 4.4. Not sure about the 7800X.
According to this it's 4.0 GHz, but I haven't verified.
i7-7700K is 4.5 GHz (1 core), 4.4 GHz (2-4 cores).

Zen has 512 KB of L2 though. I wonder why Skylake client can get away with 50% of the L2 cache? A more efficient prefetcher? Well, actually, looking at Skylake server with 1 MB of L2 per core, I'm not sure it makes a huge difference for gaming.

Cache is expensive, in every sense of the word. It's expensive to produce, sucks down power and kicks out a lot of heat. You don't want more cache than you need.<snip>
Why Zen has more cache than Skylake, I'm not sure. Maybe there's more "stuff" in the Zen cores than Skylake cores, which warrants having more cache?
To both;
It's easy to become blind on specs. Heck, even the old 80486 supported at least 512 kB of L2 cache (off-die). L1 and L2 are closely tied to the microarchitecture, which is probably why Intel and AMD tweak the config more or less every generation. Heat is not the primary concern, but the size on the die certainly is, since it needs to be connected in the ideal spot. Moving it slightly might cause higher latency, and with higher clock speeds this is more sensitive than ever.

So back to the subject you both were mentioning: why is Skylake-S more efficient with half the L2 cache of Zen? It comes down to how the cache is used. The front-end/prefetcher operates on an instruction window, does OoOE, predicts branches, etc. While Skylake has a slightly larger instruction window than Zen (224 vs. 192), Zen has other advantages like a larger micro-op cache (2048 vs. 1536), more L2 cache and more execution ports. So on paper Zen looks fairly strong, but that still doesn't answer the question.

When it comes to prefetching, you might think more is better, right? Wrong. Each cache line you write to L2 kicks something else out, so if you cache "useless" stuff, it might evict more useful stuff, and you'll end up hurting performance. So the most important thing of all is the prediction algorithm, and while it may not be visible in the tech specs, it's more important than the size of the L2 etc. My impression is that AMD's approach here might be a bit more "brute force" than Intel's. This is one of the reasons why I keep saying good benchmarks are what matters, not buying CPUs or GPUs based on a "limited" understanding of what the specs even mean.
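The eviction effect can be sketched with a toy fully-associative LRU cache. Everything here (the capacity, the addresses, the "over-eager prefetcher") is a made-up illustration, not a model of any real L2:

```python
from collections import OrderedDict

class LRUCache:
    """Toy fully-associative LRU cache -- illustrative only."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.hits = self.misses = 0

    def access(self, addr):
        if addr in self.lines:
            self.hits += 1
            self.lines.move_to_end(addr)     # mark most recently used
        else:
            self.misses += 1
            self.lines[addr] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used

# Hot working set of 8 lines, accessed repeatedly.
hot = list(range(8))

# No pollution: the hot set fits, so only the first pass misses.
clean = LRUCache(capacity=8)
for _ in range(10):
    for a in hot:
        clean.access(a)

# Same cache, but "prefetched" junk lines between passes evict the hot set.
polluted = LRUCache(capacity=8)
for i in range(10):
    for a in hot:
        polluted.access(a)
    for junk in range(100 + i * 8, 108 + i * 8):
        polluted.access(junk)

print(clean.misses, polluted.misses)
```

With pollution, every hot access misses again on the next pass, so a "bigger appetite" for prefetching actively hurts.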

Well, servers are big and slow, but do a lot of work. That's why you see 32 core EPYC (server) chips. By contrast, desktops have smaller, faster cores. It's like comparing a Mack truck to a Ferrari. You don't use a fleet of Ferraris to haul cargo (like lots of web traffic), and you don't take an 18 wheeler to a race track (like running your favorite game at 165hz).
It also comes down to types of workloads.
Many server workloads are typically async, which means they will scale nearly perfectly across any core count, so all that really matters then is the balance between performance and total efficiency.
Synchronized multithreaded workloads on the other hand tend to get diminishing returns with increasing core count, so balancing core count and core speed is more important for typical end-user workloads.
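The scaling difference between the two workload types can be sketched with Amdahl's law; the parallel fractions below are illustrative numbers, not measurements:

```python
def speedup(cores, parallel_fraction):
    """Amdahl's law: the serial fraction limits scaling."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# Near-async server workload (~99% parallel) vs a synchronized
# end-user workload (~80% parallel) -- assumed fractions.
for cores in (4, 8, 32):
    print(cores, round(speedup(cores, 0.99), 1), round(speedup(cores, 0.80), 1))
```

At 32 cores the async-style workload still gets over a 24x speedup, while the synchronized one stalls below 4.5x, which is why core count pays off so much more on servers.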
 
Last edited:
Joined
Dec 27, 2013
Messages
887 (0.22/day)
Location
somewhere
According to this it's 4.0 GHz, but I haven't verified.
i7-7700K is 4.5 GHz (1 core), 4.4 GHz (2-4 cores).




To both;
It's easy to become blind on specs. Heck, even the old 80486 supported at least 512 kB of L2 cache (off-die). L1 and L2 are closely tied to the microarchitecture, which is probably why Intel and AMD tweak the config more or less every generation. Heat is not the primary concern, but the size on the die certainly is, since it needs to be connected in the ideal spot. Moving it slightly might cause higher latency, and with higher clock speeds this is more sensitive than ever.

So back to the subject you both were mentioning: why is Skylake-S more efficient with half the L2 cache of Zen? It comes down to how the cache is used. The front-end/prefetcher operates on an instruction window, does OoOE, predicts branches, etc. While Skylake has a slightly larger instruction window than Zen (224 vs. 192), Zen has other advantages like a larger micro-op cache (2048 vs. 1536), more L2 cache and more execution ports. So on paper Zen looks fairly strong, but that still doesn't answer the question.

When it comes to prefetching, you might think more is better, right? Wrong. Each cache line you write to L2 kicks something else out, so if you cache "useless" stuff, it might evict more useful stuff, and you'll end up hurting performance. So the most important thing of all is the prediction algorithm, and while it may not be visible in the tech specs, it's more important than the size of the L2 etc. This is one of the reasons why I keep saying good benchmarks are what matters, not buying CPUs or GPUs based on a "limited" understanding of what the specs even mean.


It also comes down to types of workloads.
Many server workloads are typically async, which means they will scale nearly perfectly across any core count, so all that really matters then is the balance between performance and total efficiency.
Synchronized multithreaded workloads on the other hand tend to get diminishing returns with increasing core count, so balancing core count and core speed is more important for typical end-user workloads.
I'm loving reading your replies. I know I said thank you before, but I'll say it again, because right now I'm like a kid in a candy store with this, haha. About Zen: I heard Skylake has a much better branch predictor (also, what exactly does the BP do, and does it also rely on the cache?), so Zen may be wider, but Skylake is currently smarter? Also on this subject: from Fritzchens Fritz's Flickr I grabbed the Zen+ die shot, and using something I read I cropped a core and labeled some of the bits. I also tried to highlight the execution engine logic, I mean the bits that do the number calculations: ALU, FPU, etc. I may be wrong here, but I notice the amount of die space in the CPU core dedicated to actually working with the numbers is much, much smaller than the bits dedicated to making it work with the numbers effectively. What do you think?

ZenPlusCoreAnnotated2.jpg


Please correct me if I'm wrong, thanks ^^

I hear Zen 1 is front-end limited, in that the EUs are not being fully realized in performance due to scheduling and the like. And that Zen 2 brings a completely redesigned front end. Maybe this will make Zen finally smarter than Skylake. :)

Also, if the 7800X is 4 GHz all-core, then yeah, being 10% lower is going to explain some of the performance delta. But in some games it is a whopping 30-50% slower. Just seems weird to me when the 8700K has no such penalty. It must be something to do with the SKL-X architecture.

Zenpluscore.jpg

All credit for these AWESOME die shots goes to Fritzchens Fritz, here
 
Last edited:
Joined
Jun 10, 2014
Messages
2,995 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
… so Zen may be wider, but Skylake is currently smarter?
For non-AVX workloads, yes!
Skylake has 4 shared execution ports hooked up to ALUs, FPUs etc.
Zen (to my understanding) has 4 for INT and 2 for floats; the two for floats are fused into one when doing AVX, but work as two separate 128-bit FPUs otherwise.

Also on this subject: from Fritzchens Fritz's Flickr I grabbed the Zen+ die shot, and using something I read I cropped a core and labeled some of the bits. I also tried to highlight the execution engine logic, I mean the bits that do the number calculations: ALU, FPU, etc. I may be wrong here, but I notice the amount of die space in the CPU core dedicated to actually working with the numbers is much, much smaller than the bits dedicated to making it work with the numbers effectively. What do you think?
Yes, your observation is correct.
The infrastructure to feed the execution engine is much bigger than the execution engine itself.
And if you removed the AVX part of the execution engine, it would be much reduced. Single FPUs, and especially single ALUs, are surprisingly tiny.

And that Zen2 brings a completely redesigned front end. maybe this will make Zen finally smarter than skylake. :)
:)
I've said it many times, the key to matching or exceeding Intel's performance lies in the front-end.
AMD still has a sizable gap to close, but my guess is that Zen 2 will be much closer to Skylake.
Since I don't have any inside information, and the final products don't exist yet, I will refrain from making precise guesses about performance, since I know it will be pointless. The truth will be revealed in benchmarks.

But that being said, I wouldn't be too surprised if Zen 2 is faster than Skylake in some workloads, but still slower in others. I'm talking per core, of course.

Also, if the 7800X is 4 GHz all-core, then yeah, being 10% lower is going to explain some of the performance delta. But in some games it is a whopping 30-50% slower. Just seems weird to me when the 8700K has no such penalty. It must be something to do with the SKL-X architecture.
Could be, but it's hard to conclude without more details.
BTW; are you sure it was a proper comparison? Some motherboards auto-overclock etc.
 
Joined
Dec 27, 2013
Messages
887 (0.22/day)
Location
somewhere
According to WikiChip, Zen has four FP pipes, 4x 128-bit: two for multiplication and two for addition. So the overall vector width is the same as Skylake client (512-bit), but with higher granularity. I am not 100% sure of this, but it seems a pretty reliable source. This also means, surely, Zen has a better FPU for SMT due to the granularity.

According to AMD, Zen 2 is 4x float throughput per socket vs Zen 1: Rome vs Naples, so that's 64 vs 32 cores. 2x of that is from the doubling of cores and 2x from the doubling of the FPU, right?

So my theory is Zen 2 has 4x 256-bit FPU pipes; this potentially means Zen 2 can do AVX-512 ops the same way Zen 1 can do AVX2.
:D
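The arithmetic behind that "4x per socket" claim can be sanity-checked; the figures below are the ones assumed in the post (core counts and per-pipe widths), not confirmed specs:

```python
# Hedged arithmetic behind the "4x floats per socket" claim (assumed figures):
naples_cores, rome_cores = 32, 64
zen1_fp_width, zen2_fp_width = 128, 256   # assumed per-pipe width, in bits

core_factor = rome_cores // naples_cores          # 2x from doubled core count
width_factor = zen2_fp_width // zen1_fp_width     # 2x from wider FP pipes
print(core_factor * width_factor)                 # 4x total
```

So a 2x core count times a 2x FPU width is enough to reach 4x per socket, without needing four 256-bit pipes per core.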

Edit: fixed my typo of 1024-bit. Apparently I can't add XD
 
Last edited:
Joined
Jun 10, 2014
Messages
2,995 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
According to WikiChip, Zen has four FP pipes, 4x 128-bit: two for multiplication and two for addition. So the overall vector width is the same as Skylake client (512-bit), but with higher granularity. I am not 100% sure of this, but it seems a pretty reliable source. This also means, surely, Zen has a better FPU for SMT due to the granularity.

According to AMD, Zen 2 is 4x float throughput per socket vs Zen 1: Rome vs Naples, so that's 64 vs 32 cores. 2x of that is from the doubling of cores and 2x from the doubling of the FPU, right?

So my theory is Zen 2 has 4x 256-bit FPU pipes; this potentially means Zen 2 can do AVX-512 ops the same way Zen 1 can do AVX2.
I was counting complete sets of ADD + MUL, but sure, by your terms Zen technically has "four" 128-bit units.

Zen 2 will have 2 complete 256-bit sets. I haven't seen any info so far if they can be fused to one 512-bit unit or not.
 
Joined
Dec 27, 2013
Messages
887 (0.22/day)
Location
somewhere
I was counting complete sets of ADD + MUL, but sure, by your terms Zen technically has "four" 128-bit units.

Zen 2 will have 2 complete 256-bit sets. I haven't seen any info so far if they can be fused to one 512-bit unit or not.
Ah right, sorry, I misunderstood. Btw, is the separation of the MUL and ADD units better than combined ones? I mean, surely AMD had a reason for it. I always assumed this approach is better for SMT when running float-heavy code, as with the 4 pipes it can be shared between the 2 threads in the core better?
 
Joined
Mar 23, 2016
Messages
4,844 (1.52/day)
Processor Core i7-13700
Motherboard MSI Z790 Gaming Plus WiFi
Cooling Cooler Master RGB something
Memory Corsair DDR5-6000 small OC to 6200
Video Card(s) XFX Speedster SWFT309 AMD Radeon RX 6700 XT CORE Gaming
Storage 970 EVO NVMe M.2 500GB,,WD850N 2TB
Display(s) Samsung 28” 4K monitor
Case Phantek Eclipse P400S
Audio Device(s) EVGA NU Audio
Power Supply EVGA 850 BQ
Mouse Logitech G502 Hero
Keyboard Logitech G G413 Silver
Software Windows 11 Professional v23H2
I hear Zen 1 is front-end limited, in that the EUs are not being fully realized in performance due to scheduling and the like. And that Zen 2 brings a completely redesigned front end. Maybe this will make Zen finally smarter than Skylake. :)
It was covered here on TPU back in November:
The front-end of "Zen" and "Zen+" cores are believed to be refinements of previous-generation architectures such as "Excavator." Zen 2 gets a brand-new front-end that's better optimized to distribute and collect workloads between the various on-die components of the core.
https://www.techpowerup.com/249450/amd-zen-2-ipc-29-percent-higher-than-zen
 
Joined
Jun 10, 2014
Messages
2,995 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
Ah right, sorry, I misunderstood. Btw, is the separation of the MUL and ADD units better than combined ones? I mean, surely AMD had a reason for it. I always assumed this approach is better for SMT when running float-heavy code, as with the 4 pipes it can be shared between the 2 threads in the core better?
It depends on how many execution ports are hooked up to them.
Some AVX instructions, like FMA, require both MUL and ADD at the same time, either on the same execution port or on multiple ports fused.

I wouldn't mix threads (SMT) into this; the threads are not executing at the same time, the core is switching between them.
 
Joined
Dec 27, 2013
Messages
887 (0.22/day)
Location
somewhere
It depends on how many execution ports are hooked up to them.
Some AVX instructions, like FMA, require both MUL and ADD at the same time, either on the same execution port or on multiple ports fused.

I wouldn't mix threads (SMT) into this; the threads are not executing at the same time, the core is switching between them.
I thought SMT uses thread-level parallelism to allow resources in the core to be used concurrently by two threads to increase utilisation? :s Surely at some point, for example, an integer operation from thread 1 is being executed at the same time as, say, a float operation from thread 2? This is where my understanding really is quite low, getting into things like instruction-level parallelism and such.

IDK, but surely these threads can use the FP engine concurrently some of the time, if for example thread 1 needs only a 128-bit ADD and thread 2 needs a 128-bit MUL?

Edit: sorry if I'm being dumb, I'm learning all the time ^^

edit 2: fixed typo
 
Last edited:
Joined
Jun 10, 2014
Messages
2,995 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
I thought SMT uses thread-level parallelism to allow resources in the core to be used concurrently by two threads to increase utilisation?
It's fine to ask. It depends on the implementation. To my understanding, Intel's implementation is mostly about utilizing idle cycles (caused by cache misses, flushes from branch mispredictions, etc.).

I forgot to answer this one:
also what exactly does BP do, and does it also rely on the cache?
I don't know the exact details of the algorithm, but I have a general understanding of how it works, based on reading documentation from Intel. The branch predictor basically keeps a list of recent conditionals and some kind of statistics on how often they are true or false. This list is not stored in L1; there is a separate specialized bank of memory for it.

Let's say the CPU iterates a loop, and at a specific address it runs into a conditional. Every time it meets the same conditional from the list, it uses the statistics to guess true or false, and every time it's done executing, it feeds the updated statistics back.

It's worth mentioning that this list is not large, and probably just contains the most recent conditionals. So it's not like it contains everything in a program, and it's not stored to benefit the next time you run the program; it's more like a "short-term memory" of the last few sections of code.
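That "guess from statistics, then feed the outcome back" loop can be sketched with the classic textbook 2-bit saturating counter scheme. This is a common teaching model, not Intel's or AMD's actual algorithm, and the branch address below is made up:

```python
class TwoBitPredictor:
    """Toy 2-bit saturating counter per branch address.

    Counter 0..3: >=2 predicts "taken". Real predictors are far more
    sophisticated (global history, multiple tables, etc.)."""
    def __init__(self):
        self.table = {}  # branch address -> counter state

    def predict(self, addr):
        return self.table.get(addr, 2) >= 2  # default: weakly taken

    def update(self, addr, taken):
        c = self.table.get(addr, 2)
        self.table[addr] = min(c + 1, 3) if taken else max(c - 1, 0)

# A loop branch: taken 99 times, then not taken once at loop exit.
bp = TwoBitPredictor()
correct = 0
outcomes = [True] * 99 + [False]
for taken in outcomes:
    if bp.predict(0x401000) == taken:
        correct += 1
    bp.update(0x401000, taken)  # feed the real outcome back
print(correct, "/", len(outcomes))
```

For loop-like branches the statistics converge almost immediately, which is exactly the "repeated conditionals" case described above; only the final exit is mispredicted.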
 
Joined
Jun 3, 2010
Messages
2,540 (0.48/day)
Cache size versus efficiency is an issue. That is why Intel had less L2 relative to L3, for efficiency; recently they too opted for faster lower-level caches.
There was a reference I cannot recall right now, but if you can allocate more data in a higher-level cache, you save on fetches, which ends up making Ryzen quite a bit more efficient with its slower but larger and more efficient SRAM cells.
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,171 (2.80/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
It's fine to ask. It depends on the implementation. To my understanding, Intel's implementation is mostly about utilizing idle cycles (caused by cache misses, flushes from branch mispredictions, etc.).

I forgot to answer this one:

I don't know the exact details of the algorithm, but I have a general understanding of how it works, based on reading documentation from Intel. The branch predictor basically keeps a list of recent conditionals and some kind of statistics on how often they are true or false. This list is not stored in L1; there is a separate specialized bank of memory for it.

Let's say the CPU iterates a loop, and at a specific address it runs into a conditional. Every time it meets the same conditional from the list, it uses the statistics to guess true or false, and every time it's done executing, it feeds the updated statistics back.

It's worth mentioning that this list is not large, and probably just contains the most recent conditionals. So it's not like it contains everything in a program, and it's not stored to benefit the next time you run the program; it's more like a "short-term memory" of the last few sections of code.
Just to add a little background on this, I want to answer the question that wasn't asked: why do we need branch prediction (not just how it works)? Branch prediction is important because otherwise the pipeline in a superscalar CPU would stall every time a conditional was encountered, because you don't know what the next instruction will be if the conditional hasn't been evaluated yet. This is the same reason why speculative execution exists, but it takes a different approach to keeping the pipeline filled, by executing both code paths (as much as possible) in order to mitigate the impact of a stall should the branch be predicted incorrectly. Both of these techniques are designed to prevent or mitigate pipeline stalls, which are more costly the longer the pipeline is.

Edit: The thing about branch prediction also is that the data that determines the condition may have already been calculated, so that nothing still in the pipeline is required to determine the branch; in this case, the CPU can accurately say, "I already know what the result of this is going to be, even though the instruction hasn't executed yet." This is far harder to solve when the last instruction alters the data used for the condition.
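The "more costly the longer the pipeline is" point is easy to put in numbers with a back-of-envelope cost model; the pipeline depth and accuracies below are illustrative, not measured figures for any real CPU:

```python
def avg_penalty_per_branch(accuracy, flush_penalty):
    """Expected extra cycles per branch: correct predictions cost ~0,
    mispredictions flush the pipeline (illustrative model only)."""
    return (1.0 - accuracy) * flush_penalty

# e.g. an assumed ~15-stage pipeline, at 95% vs 99% prediction accuracy:
print(round(avg_penalty_per_branch(0.95, 15), 2))  # 0.75
print(round(avg_penalty_per_branch(0.99, 15), 2))  # 0.15
```

With branches making up a large share of typical code, shaving accuracy from 95% to 99% cuts the average branch cost by 5x, which is why predictor quality matters more than most headline specs.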
 
Last edited:
Joined
Jun 3, 2010
Messages
2,540 (0.48/day)
Just to add a little background on this, I want to answer the question that wasn't asked: why do we need branch prediction (not just how it works)? Branch prediction is important because otherwise the pipeline in a superscalar CPU would stall every time a conditional was encountered, because you don't know what the next instruction will be if the conditional hasn't been evaluated yet. This is the same reason why speculative execution exists, but it takes a different approach to keeping the pipeline filled, by executing both code paths (as much as possible) in order to mitigate the impact of a stall should the branch be predicted incorrectly. Both of these techniques are designed to prevent or mitigate pipeline stalls, which are more costly the longer the pipeline is.

Edit: The thing about branch prediction also is that the data that determines the condition may have already been calculated, so that nothing still in the pipeline is required to determine the branch; in this case, the CPU can accurately say, "I already know what the result of this is going to be, even though the instruction hasn't executed yet." This is far harder to solve when the last instruction alters the data used for the condition.
What I've noticed is that the opposite is the case with GPUs. That makes me believe the caches far outweigh the execution resources in the power budget of CPUs. I still wish I understood GPU caching and pipeline scalarization in general.
 
Last edited:
Joined
Jun 10, 2014
Messages
2,995 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
Branch prediction is important because otherwise the pipeline in a superscalar CPU would stall every time a conditional was encountered because you don't know what the next instruction will be if the conditional hasn't been evaluated yet. This is the same reason why speculative execution exists, but takes a different approach to keep the pipeline filled, by executing both code paths (as much as possible,) in order to mitigate the impact of a stall should the branch be predicted incorrectly. Both of these techniques are designed to prevent or mitigate pipeline stalls which are more costly the longer the pipeline is.
You are mixing the terms a bit here. Both predictive execution and "eager execution" (executing both branches of a conditional) are types of speculative execution. Each strategy has its advantages and disadvantages. Most notable is that executing both branches creates an exponential problem and doesn't scale well across multiple conditionals. Predictive execution works fairly well with repeated conditionals (with the same outcome) in a loop, but everywhere else it has a 50/50 chance of success per branch. It's worth noting that both AMD and Intel currently rely on predictive execution.
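The exponential problem with eager execution is just path counting: with several unresolved conditionals in flight at once, every extra branch doubles the number of speculative paths the core would have to track, while a predictor always follows one.

```python
# Paths an eager-execution scheme must track with n unresolved branches
# in flight, vs a predictor that follows a single predicted path.
def eager_paths(pending_branches):
    return 2 ** pending_branches

for n in (1, 2, 5, 10):
    print(n, eager_paths(n))  # 2, 4, 32, 1024
```

Even a modest window of 10 in-flight branches would mean over a thousand simultaneous paths, which is why eager execution is only practical for very shallow speculation.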
 
Joined
Jan 8, 2017
Messages
9,504 (3.27/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
I thought SMT uses thread-level parallelism to allow resources in the core to be used concurrently by two threads to increase utilisation?

Resources are used concurrently anyway, because all modern CPUs are superscalar and out of order; the deal with SMT is somewhat more complex. A core may have something like 3 ALUs and 2 FPUs, for example; depending on which instructions are issued from one thread, different execution units are in use and some remain idle. This happens because dependencies exist between instructions, and simply because you may have a sequence of 1000 instructions where not a single one needs a floating-point calculation.

Having more than one thread from which you can reorder instructions means more opportunities to use the available execution units, hence the usefulness of SMT.
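A toy issue model makes this concrete. The unit counts are the hypothetical ones from the example above (3 ALUs, 2 FPUs), and the workloads are deliberately extreme (one all-INT, one all-FP) to show the idle-unit effect:

```python
# Toy greedy issue model: each cycle, up to int_units INT ops and
# fp_units FP ops can issue. Dependencies are ignored -- this only
# illustrates how a one-sided instruction mix leaves units idle.
def cycles_needed(int_ops, fp_ops, int_units=3, fp_units=2):
    cycles = 0
    while int_ops > 0 or fp_ops > 0:
        int_ops = max(0, int_ops - int_units)  # INT units drain their queue
        fp_ops = max(0, fp_ops - fp_units)     # FP units drain theirs
        cycles += 1
    return cycles

# Two workloads run back-to-back on a single thread: the FP units sit
# idle during the first, the ALUs during the second...
serial = cycles_needed(300, 0) + cycles_needed(0, 200)
# ...vs co-scheduled as two SMT threads sharing the same units.
smt = cycles_needed(300, 200)
print(serial, smt)
```

In this idealized mix the co-scheduled threads finish in half the cycles, purely by filling units the single thread left idle; real SMT gains are far smaller because threads also compete for caches and shared front-end bandwidth.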

What I've noticed is, the opposite is the case with gpus. That makes me believe the caches far outweigh the execution resource power budget in cpus. I still wish I understood gpu caching and pipeline scalarization in general.

Modern GPU cores are based on SIMT architectures: Single Instruction, Multiple Threads. This means you simply can't have fine-grained control over which thread does what and which doesn't; the way GPUs handle conditional instructions is that both paths are executed and a mask is applied to filter out the threads that took the wrong branch. This is wasteful, but that's why GPUs are designed to have up to 64 threads per core.

As far as caches are concerned, their effectiveness isn't as relevant, because most of the latency caused by cache misses is hidden by other instructions that are already scheduled to be executed.
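The execute-both-paths-then-mask behaviour can be sketched in a few lines. This is a conceptual model of SIMT divergence (here computing an absolute value across "lanes"), not any vendor's actual hardware:

```python
# Toy SIMT execution of `v = -v if v < 0 else v`: all lanes step through
# BOTH sides of the branch, and a per-lane mask decides which result commits.
def simt_abs(values):
    cond = [v < 0 for v in values]       # per-lane predicate
    then_res = [-v for v in values]      # "then" path, executed by all lanes
    else_res = [v for v in values]       # "else" path, executed by all lanes
    # The mask merges the two paths per lane; no lane ever skips work.
    return [t if c else e for c, t, e in zip(cond, then_res, else_res)]

print(simt_abs([-3, 5, -1, 0]))  # [3, 5, 1, 0]
```

Every lane pays for both paths regardless of its predicate, which is exactly the waste described above, and why divergent branches are expensive on GPUs.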
 
Last edited:
Joined
Dec 27, 2013
Messages
887 (0.22/day)
Location
somewhere
Btw, I used to write scripts for Fallout 3 and New Vegas. So I ask, is this a "conditional":

Code:
scn myscript

short var1

begin onActivate
    if var1 == 1 ; is this the conditional check?
        ;do this operation if var1 returns a value of 1
    elseif var1 == 2
        ;do this operation if var1 returns a value of 2
    else
        ; do this if var1 is not those 2
    endif

end

Btw, it's been YEARS since I did scripts for this engine, so I may have made a mistake, but I think that's how it goes. Even then, you get the idea XD

Actually, I think I would have done it this way:

Code:
scn myscript

short var1

begin onActivate
    if var1 == 0 ; is this the conditional check?
        enable ;turn my light on
        set var1 to 1
    endif
    if var1 == 1
        disable ;turn my light off
        set var1 to 0
    endif

end

Rip. I could have used the getEnabled check. LOL, ok, now I want to install F3 again and make a mod.........
 
Joined
Jan 8, 2017
Messages
9,504 (3.27/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
Yeah that would be an example of instruction branching.
 
Joined
Jun 10, 2014
Messages
2,995 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
An if statement is a type of conditional, which is a logical branch in control flow. If statements are the most common type, but there are also others, such as ternary operators, switch, etc. "Conditional" is the generic term.
 