Friday, November 15th 2024
Intel Plans to Copy AMD's 3D V-Cache Tech in 2025, Just Not for Desktops
Intel is coming around to the idea of large last-level caches on its processors. Florian Maislinger, a tech communications manager for Intel, revealed in an interview with Der8auer and Bens Hardware that the company is working on augmenting its processors with large shared L3 caches; however, it will begin doing so only with its server processors. The company is working on a new server/workstation processor for 2025 with cache tiles that augment the shared L3 cache, so that it excels in the kind of workloads AMD's EPYC "Genoa-X" processors and upcoming "Turin-X" processors excel at: technical computing. On "Genoa-X" processors, each of the up to 12 "Zen 4" CCDs comes with stacked 3D V-Cache, which is found to have a profound impact on performance in applications that are cache-sensitive, such as the Ansys suite, OpenFOAM, etc.
The interview reveals that the server processor with a large last-level cache should come out in 2025; however, there is no such effort on the horizon for the company's client processors, such as the Core Ultra "Arrow Lake-S," at least not in 2025. The company's recently launched "Arrow Lake-S" desktop processors do not provide a generational gaming performance uplift over the 14th Gen Core "Raptor Lake Refresh." Intel claims, however, to have identified certain correctable reasons for gaming performance falling below expectations, and hopes to release updates for the processor (possibly in the form of new microcode, or something at the OS-vendor level). This, the company claims, should improve the gaming performance of "Arrow Lake-S."
Sources:
Der8auer (YouTube), VideoCardz, HardwareLuxx.de
77 Comments on Intel Plans to Copy AMD's 3D V-Cache Tech in 2025, Just Not for Desktops
Next time I want a patch for an application I'll write some random factory in China, too.
TSMC's 3D Stacked SoIC Packaging Making Quick Progress, Eyeing Ultra-Dense 3μm Pitch In 2027
And you have this deck from TSMC back in 2021 regarding 3d stacking: Advanced Technology Leadership
Intel will have an issue though: For all but its highest-billing most demanding customers, adding extra cache will 'extend' the usable life of the platform.
I wholly expect hardware-level platform locking, and a non-existent 2nd hand market (in years to come).
As heavy workloads move more and more towards SIMD (e.g. AVX-512), the amount of data streaming through memory->L2->L3 is greater than ever, and the chances of a hit in L3 data cache are getting slimmer and slimmer. (Which should be obvious, as the workload needs to be cache optimized, for both instruction and data, otherwise the pipeline would stall.) The number of data cache lines greatly outnumbers instruction cache lines, which is why AMD needed so much of it in order to make a tiny difference.
While instruction cache lines are comparatively "few" in number and not bottlenecked by memory bandwidth, the cache hierarchy for data cache lines behaves like a "streaming buffer": a continuous stream of data flowing from memory->L2->L3, with all the data being overwritten every few thousand clock cycles, so the bottleneck here would not be L3 bandwidth, but rather memory bandwidth.
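For anyone who wants to put numbers on the "streaming buffer" picture, here is a minimal Python sketch that estimates how many clock cycles it takes to overwrite a whole cache at a steady fill rate. The helper name `cycles_to_overwrite` and every cache size and fill rate below are illustrative assumptions, not figures from the post or from any specific CPU.

```python
KIB = 1024
MIB = 1024 * KIB


def cycles_to_overwrite(cache_bytes: int, fill_bytes_per_cycle: float) -> float:
    """Cycles needed to replace every byte of a cache at a steady fill rate.

    Hypothetical helper: assumes a perfectly streaming workload with no reuse.
    """
    if fill_bytes_per_cycle <= 0:
        raise ValueError("fill rate must be positive")
    return cache_bytes / fill_bytes_per_cycle


# Illustrative (assumed) cache sizes and fill rates, not measured values.
examples = {
    "48 KiB at 64 B/cycle": cycles_to_overwrite(48 * KIB, 64),
    "2 MiB at 32 B/cycle": cycles_to_overwrite(2 * MIB, 32),
    "32 MiB at 16 B/cycle": cycles_to_overwrite(32 * MIB, 16),
}

for label, cycles in examples.items():
    print(f"{label}: {cycles:,.0f} cycles")
```

The turnover time scales linearly with cache size and inversely with fill rate, so the result depends entirely on which cache level and which sustained bandwidth you plug in.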
It's no accident that CPUs over the past decade or so have continuously increased the bandwidth of both memory and caches, especially for heavy AVX workloads, even prioritizing bandwidth over latency. Meanwhile, cache sizes (L1I, L1D, L2, L3) remained fairly stable until the arrival of 3D V-Cache (except for L3 growing proportionally to core count); otherwise you might have expected a 1GB L2 cache by now. And this "discrepancy" is due to misconceptions about how caches work; as said, the caches are an extremely efficient streaming buffer that keeps the execution ports fed (with staggering amounts of data flowing through there), not a hierarchy of data based on "importance". :)
Nice attempt at a straw man argument there, but you are in fact just grasping at straws.
Intel's taking a similar approach, but will call it something else.
Oh well.
L1 1ns - 4 cycles
L2 3ns - 14 cycles
L3 10ns - 50 cycles
L4/eDRAM 36ns - 140 cycles
DRAM 60-100ns - MANY cycles
Guess where X3D stands
Now the L4 has what? 50-100GB/sec bandwidth?
Just for comparison the first gen X3D can hit 600GB/sec with 47 cycles latency.
So it has 6x bandwidth and 3x faster access times....which is the same as most L3 caches.
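The ratios above can be checked with quick arithmetic. This is a minimal Python sketch using only the figures quoted in the post; the 4 GHz clock is an assumption inferred from the "1ns - 4 cycles" line, and the function name `cycles_to_ns` is just for illustration.

```python
# Figures quoted in the post above; nothing here is measured.
L4_BANDWIDTH_GBPS = (50, 100)   # quoted range for L4/eDRAM
X3D_BANDWIDTH_GBPS = 600        # quoted for first-gen X3D
L4_LATENCY_CYCLES = 140
X3D_LATENCY_CYCLES = 47

# Assumption: "L1 1ns - 4 cycles" implies roughly a 4 GHz clock.
ASSUMED_CLOCK_GHZ = 4.0


def cycles_to_ns(cycles: float, clock_ghz: float = ASSUMED_CLOCK_GHZ) -> float:
    """Convert a latency in clock cycles to nanoseconds at the given clock."""
    return cycles / clock_ghz


# Bandwidth advantage against the top and bottom of the quoted L4 range.
bandwidth_ratio_low = X3D_BANDWIDTH_GBPS / L4_BANDWIDTH_GBPS[1]
bandwidth_ratio_high = X3D_BANDWIDTH_GBPS / L4_BANDWIDTH_GBPS[0]

# Latency advantage in cycles.
latency_ratio = L4_LATENCY_CYCLES / X3D_LATENCY_CYCLES

print(f"bandwidth: {bandwidth_ratio_low:.0f}x to {bandwidth_ratio_high:.0f}x")
print(f"latency:   {latency_ratio:.2f}x fewer cycles")
print(f"X3D: {cycles_to_ns(X3D_LATENCY_CYCLES):.2f} ns, "
      f"L4: {cycles_to_ns(L4_LATENCY_CYCLES):.2f} ns")
```

Against the upper end of the quoted L4 range this gives the 6x bandwidth figure, and 140 / 47 gives the roughly 3x latency figure.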
Just fyi simple CPU instructions usually last 1-4 cycles and more complex ones like AVX might be up to 20-60-100 cycles
I think the reason L1/L2 caches haven't grown is that they're part of the cores: doubling their size means greater area and bigger dies, which means higher latencies. Only recently, due to EUV, has density improved enough (die shrinks used to provide 2-3x density) that we saw some improvement.
In fact both L1 and L2 have increased in the last few gens after 20 years of staying between 256-512KB (not counting halo products like the FX or the shared L2....but different FX) all without increasing the latencies.
L3 is just easier to increase or move onto its own stacked die; there are even rumours that AMD plans to have the next Zen arch with the L3 cache moved completely onto a stacked die.
Your PC has something wrong, maybe slow RAM?
On my second PC with a 9900K and a 4090 there is no difference in 4K gaming vs. my main system, a 7800X3D with the same GPU.
Even at 1440p there are no big differences.
But if I use a GPU like the 4060, then I will see a difference right away, not because of the CPU but because of the slow GPU.
AMD has good CPUs, but in the real world the GPU is much more important.
Both Intel and AMD, even with older CPUs, can do gaming just fine.
People are just hyped about the extra % they see in 1080p + 4090 benchmarks.
It was engineered by TSMC, not AMD.
TM says it all. TSMC invention licensed to AMD i guess for their use. I don't think it was originally for memory stacking was it? AMD just used it that way.
Can't wait to see how Intel does it, surely they can't copy, unless they get a secret license from TSMC to use it
Most core AVX operations are within 1-5 cycles on recent architectures. Haswell and Skylake did a lot to improve AVX throughput, but there have been several improvements since then too. E.g. add operations are now down from 4 to 2 cycles on Alder Lake and Sapphire Rapids. Shift operations are down to a single cycle. This is as fast as single integer operations. And FYI, all floating point operations go through the vector units, whether it's single operation, SSE or AVX, the latency will be the same. ;)
TSMC's 3D technology was used to manufacture this idea and they decided to further improve the technology.
Don't mix up the general 3D manufacturing process with that tile of extra cache in X3D CPUs.
Yes, the fabrication technology was researched by TSMC and they are the ones doing it. But guess what, this is normal, as they are the ones making those chips for AMD. AMD is not a fab.
But AMD is the only one using that technology right now, because this isn't just a box you tick when you order some wafers from TSMC. It's not "I would take X chips with more cache". You still have to design a chip that will be able to communicate with the cache chips, send power, etc.
The physical portion of 3D V-Cache is a TSMC technology. This is expected, as AMD is fabless.
The logical portion of 3D V-Cache is an AMD technology. This is expected, as TSMC does not design chips.
In the end, it's a collaboration between both companies.
Also, the added chip is indeed L3. There is no separate lookup for that chip when checking whether the data is in the L3 cache. The whole 96 MB is looked up at the same time, and there is no penalty for accessing data in the 3D V-Cache chip.
The below is from my second link.
If you want TSMC's official link you can find it here: https://3dfabric.tsmc.com/english/dedicatedFoundry/technology/3DFabric.htm
In fact, here is TSMC's press release introducing the packaging, Introducing TSMC 3DFabric: TSMC’s Family of 3D Silicon Stacking, Advanced Packaging Technologies and Services - Taiwan Semiconductor Manufacturing Company Limited