
Intel Plans to Copy AMD's 3D V-Cache Tech in 2025, Just Not for Desktops

Joined
Sep 17, 2014
Messages
22,564 (6.03/day)
Location
The Washing Machine
System Name Tiny the White Yeti
Processor 7800X3D
Motherboard MSI MAG Mortar b650m wifi
Cooling CPU: Thermalright Peerless Assassin / Case: Phanteks T30-120 x3
Memory 32GB Corsair Vengeance 30CL6000
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Lexar NM790 4TB + Samsung 850 EVO 1TB + Samsung 980 1TB + Crucial BX100 250GB
Display(s) Gigabyte G34QWC (3440x1440)
Case Lian Li A3 mATX White
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse Steelseries Aerox 5
Keyboard Lenovo Thinkpad Trackpoint II
VR HMD HD 420 - Green Edition ;)
Software W11 IoT Enterprise LTSC
Benchmark Scores Over 9000
It's not AMD's 3D vcache, it's TSMC's
So it's not Intel's CPU either anymore, then. Neat!

Next time I want a patch for an application, I'll write to some random factory in China, too.
 
Joined
Apr 13, 2022
Messages
1,185 (1.22/day)
What a shame! Intel's desktop lineup could really use such a boost.
Desktop is by far the least important lineup. Server and mobile are what matter. Desktop is so far behind either of them that it's laughable. They are getting crushed in server but doing OK in mobile.
 
Joined
May 31, 2005
Messages
285 (0.04/day)
There was a Broadwell chip with 60MB L3 cache. They aren't new to big L3. Sapphire Rapids has around 110MB L3 and also optionally a huge L4. More cache is just the natural progression for all these companies because the problems to solve are the same as ever.
 
Joined
Nov 4, 2005
Messages
11,994 (1.72/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs, 24TB Enterprise drives
Display(s) 55" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
It is not. TSMC owns the 3D cache packaging. It is not an AMD design. AMD simply took advantage of a service that TSMC offered (3D cache) and tried it out on their processors.

TSMC's 3D Stacked SoIC Packaging Making Quick Progress, Eyeing Ultra-Dense 3μm Pitch In 2027

And you have this deck from TSMC back in 2021 regarding 3d stacking: Advanced Technology Leadership
It was based on AMD's interposer technology for the first HBM stacks in 2015, which Intel and Nvidia also copied.

 
Joined
Mar 18, 2023
Messages
911 (1.43/day)
System Name Never trust a socket with less than 2000 pins
Give us HEDT CPUs with the cache and ECC memory and I'll forget about the desktop. Deal?
 
Joined
Apr 18, 2019
Messages
2,387 (1.16/day)
Location
Olympia, WA
System Name Sleepy Painter
Processor AMD Ryzen 5 3600
Motherboard Asus TuF Gaming X570-PLUS/WIFI
Cooling FSP Windale 6 - Passive
Memory 2x16GB F4-3600C16-16GVKC @ 16-19-21-36-58-1T
Video Card(s) MSI RX580 8GB
Storage 2x Samsung PM963 960GB nVME RAID0, Crucial BX500 1TB SATA, WD Blue 3D 2TB SATA
Display(s) Microboard 32" Curved 1080P 144hz VA w/ Freesync
Case NZXT Gamma Classic Black
Audio Device(s) Asus Xonar D1
Power Supply Rosewill 1KW on 240V@60hz
Mouse Logitech MX518 Legend
Keyboard Red Dragon K552
Software Windows 10 Enterprise 2019 LTSC 1809 17763.1757
Been sayin' that X3D is good for more than gaming...

Intel will have an issue though: For all but its highest-billing most demanding customers, adding extra cache will 'extend' the usable life of the platform.
I wholly expect hardware-level platform locking, and a non-existent 2nd hand market (in years to come).
 
Joined
Jun 10, 2014
Messages
2,987 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
Even well-optimized games and workloads can benefit if the highly utilized code can be contained in the cache, as it is higher bandwidth and lower latency than waiting to go to system RAM. Even Factorio, which is an extremely well-optimized game, massively benefits from this, as do many other workloads.
You don't grasp the difference between L2 and L3 caches. L3 only contains data recently discarded by L2, so it's cache lines that have either been very recently used or, more likely, prefetched and then never used at all. The most data- and compute-intensive workloads see no benefit beyond a decent L3 cache, because the program is what we call cache-optimized, which is a requirement for any performant piece of software. For any such heavy workload, the chance of an L3 hit on a data cache line is extremely low, except for the few times cores are synced. This means the few hits that you actually get are likely instruction cache lines, and the rest is just meaningless garbage streaming through the L3. Sensitivity to L3 cache is mainly known as an indicator of bloat in software, and the solution is to reduce said bloat and make the code more computationally dense.

As heavy workloads move more and more towards SIMD (e.g. AVX-512), the amount of data streaming through memory->L2->L3 is greater than ever, and the chances of an L3 hit on data are getting slimmer and slimmer. (Which should be obvious, as the workload needs to be cache-optimized for both instructions and data, otherwise the pipeline would stall.) Data cache lines greatly outnumber instruction cache lines, which is why AMD needed so much cache in order to make a tiny difference.

While instruction cache lines are comparatively "few" in number and not bottlenecked by memory bandwidth, the cache hierarchy for data cache lines behaves like a "streaming buffer": a continuous stream of data flowing from memory->L2->L3, with all the data being overwritten every few thousand clock cycles, so the bottleneck here would not be L3 bandwidth, but rather memory bandwidth.

It's no accident that CPUs over the past decade or so have continuously increased the bandwidth of both memory and caches, especially for heavy AVX workloads, even prioritizing bandwidth over latency. Meanwhile, the cache sizes (L1I, L1D, L2, L3) have remained comparatively stable until the arrival of 3D V-Cache (except for L3 growing proportionally to core count); otherwise you might have expected a 1GB L2 cache by now. And this "discrepancy" is due to misconceptions about how caches work; as I said, the caches are an extremely efficient streaming buffer to keep the execution ports fed (with staggering amounts of data flowing through there), not a hierarchy of data based on "importance". :)
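
A toy model can make the streaming argument above concrete. This is only a minimal sketch, assuming a fully associative LRU cache (real L3 caches are set-associative and use other replacement policies); the function names and the sizes used are illustrative and are not measurements of any CPU.

```python
from collections import OrderedDict


def lru_hit_rate(accesses, capacity_lines):
    """Replay cache-line addresses through a fully associative LRU cache
    and return the fraction of accesses that hit."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for line in accesses:
        total += 1
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / total if total else 0.0


def streaming(n_lines):
    """Pure streaming pattern: every cache line is touched exactly once."""
    return list(range(n_lines))


def looping(working_set_lines, passes):
    """Reuse pattern: the same working set is walked several times."""
    return [i for _ in range(passes) for i in range(working_set_lines)]


if __name__ == "__main__":
    for capacity in (64, 128, 512):
        print(capacity,
              lru_hit_rate(streaming(1000), capacity),
              lru_hit_rate(looping(100, 10), capacity))
```

In this model the streaming pattern never hits, no matter how large the cache is, which is the "streaming buffer" case described above. The looping pattern gets no hits while the working set is larger than the cache, then jumps to a high hit rate once it fits, which is the case where extra capacity would pay off.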

You may as well say computers don't need more than 64k of RAM and any applications that do are poorly optimized.
Nice attempt at a straw man argument there, but you are in fact just grasping at straws.
 
Joined
Aug 12, 2010
Messages
134 (0.03/day)
Location
Brazil
Processor Ryzen 7 7800X3D
Motherboard ASRock B650M PG Riptide
Cooling Wraith Max + 2x Noctua Redux NF-P12
Memory 2x16GB ADATA XPG Lancer Blade DDR5-6000 CL30
Video Card(s) Powercolor RX 7800 XT Fighter OC
Storage ADATA Legend 970 2TB PCIe 5.0
Display(s) Dell 32" S3222DGM - 1440P 165Hz + P2422H
Case HYTE Y40
Audio Device(s) Microsoft Xbox TLL-00008
Power Supply Cooler Master MWE 750 V2
Mouse Alienware AW320M
Keyboard Alienware AW510K
Software Windows 11 Pro
It's not AMD's 3D vcache, it's TSMC's
It was engineered by AMD and manufactured by TSMC.

Intel's taking a similar approach, but will call it something else.
 
Joined
Nov 13, 2007
Messages
10,808 (1.73/day)
Location
Austin Texas
System Name stress-less
Processor 9800X3D @ 5.42GHZ
Motherboard MSI PRO B650M-A Wifi
Cooling Thermalright Phantom Spirit EVO
Memory 64GB DDR5 6400 1:1 CL30-36-36-76 FCLK 2200
Video Card(s) RTX 4090 FE
Storage 2TB WD SN850, 4TB WD SN850X
Display(s) Alienware 32" 4k 240hz OLED
Case Jonsbo Z20
Audio Device(s) Yes
Power Supply Corsair SF750
Mouse DeathadderV2 X Hyperspeed
Keyboard 65% HE Keyboard
Software Windows 11
Benchmark Scores They're pretty good, nothing crazy.
AMD has clearly better efficiency, but it's not due to the large L3 cache.
But it's hard to find something more deserving of the title "waste of sand" than throwing a bunch of L3 cache on a die, as it's only a tiny subset of very poorly optimized code which significantly benefits from it, namely certain outliers in applications and games running at very unrealistically low GPU load. It would be much better to have a CPU with 5% more computational power, especially down the road, as future games are likely to become more demanding so the bottleneck will be computational performance, not "artificial" ones running games at hundreds of frames per second.
For CPUs to advance, they should stop focusing on gimmicks and make actual architectural advancements instead. Large L3 caches are a waste of precious development resources as well as production capacity.

Unless your architectural advancements are bottlenecked by memory bandwidth and latency. Then that waste of sand turns into out-of-stock products that everyone who runs games wants...
 
Joined
Apr 30, 2020
Messages
995 (0.59/day)
System Name S.L.I + RTX research rig
Processor Ryzen 7 5800X 3D.
Motherboard MSI MEG ACE X570
Cooling Corsair H150i Cappellx
Memory Corsair Vengeance pro RGB 3200mhz 32Gbs
Video Card(s) 2x Dell RTX 2080 Ti in S.L.I
Storage Western digital Sata 6.0 SDD 500gb + fanxiang S660 4TB PCIe 4.0 NVMe M.2
Display(s) HP X24i
Case Corsair 7000D Airflow
Power Supply EVGA G+1600watts
Mouse Corsair Scimitar
Keyboard Cosair K55 Pro RGB
I must be the only person who wants to see AMD try Foveros for dual-CCD CPUs...
Oh well.
 
Joined
Jul 21, 2016
Messages
102 (0.03/day)
Caches are usually defined by cycle latencies, not by size or preference.

L1 1ns - 4 cycles
L2 3ns - 14 cycles
L3 10ns - 50 cycles
L4/eDRAM 36ns - 140 cycles
DRAM 60-100ns - MANY cycles

Guess where X3D stands
Now the L4 has what? 50-100GB/sec bandwidth?
Just for comparison the first gen X3D can hit 600GB/sec with 47 cycles latency.
So it has 6x bandwidth and 3x faster access times....which is the same as most L3 caches.
Just fyi simple CPU instructions usually last 1-4 cycles and more complex ones like AVX might be up to 20-60-100 cycles
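
The cycle counts and nanosecond figures above are tied together by the core clock, and the "6x" and "3x" comparisons are simple ratios. A minimal sketch of that arithmetic follows; the 4.0 GHz clock is an assumption chosen for illustration (the post does not state one), and the helper names are hypothetical.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in core cycles to nanoseconds.
    At f GHz one cycle lasts 1/f ns."""
    return cycles / clock_ghz


def speedup(better, worse):
    """How many times larger one figure is than another."""
    return better / worse


# Cycle counts taken from the list above; the clock speed is assumed.
ASSUMED_CLOCK_GHZ = 4.0
LATENCY_CYCLES = {"L1": 4, "L2": 14, "L3": 50, "L4/eDRAM": 140}

if __name__ == "__main__":
    for level, cycles in LATENCY_CYCLES.items():
        print(f"{level}: {cycles} cycles = "
              f"{cycles_to_ns(cycles, ASSUMED_CLOCK_GHZ):.1f} ns")
    # 600 GB/s (X3D) against the upper 100 GB/s estimate for L4
    print("bandwidth ratio:", speedup(600, 100))
    # 140 cycles (L4) against 47 cycles (X3D)
    print("latency ratio:", round(speedup(140, 47), 2))
```

At the assumed clock the results land close to the nanosecond values quoted in the list, and the ratios match the roughly 6x bandwidth and 3x latency advantage claimed for X3D over an L4.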

I think the reason L1/L2 caches haven't been growing is that they're part of the cores: doubling their size means more area and bigger dies, which means higher latencies. Only recently has density improved enough, thanks to EUV, that we saw some improvement (die shrinks used to provide 2-3x density).

In fact, both L1 and L2 have increased in the last few gens after 20 years of staying between 256-512KB (not counting halo products like the FX, or the shared L2... but that's a different FX), all without increasing the latencies.

L3 is just easier to increase or move onto its own stacked die. There are even rumours that AMD plans for the next Zen arch to have the L3 cache moved completely onto a stacked die.
 
Joined
Oct 22, 2014
Messages
14,142 (3.82/day)
Location
Sunshine Coast
System Name H7 Flow 2024
Processor AMD 5800X3D
Motherboard Asus X570 Tough Gaming
Cooling Custom liquid
Memory 32 GB DDR4
Video Card(s) Intel ARC A750
Storage Crucial P5 Plus 2TB.
Display(s) AOC 24" Freesync 1m.s. 75Hz
Mouse Lenovo
Keyboard Eweadn Mechanical
Software W11 Pro 64 bit
Copying could open the door for litigation.
 
Joined
Sep 19, 2014
Messages
60 (0.02/day)
I don't understand those people honestly. Benchmarks don't show periodic stutters which you can get in some games for example, and those are fully eliminated for me. Plus, you do see the better lows and general performance in benchmarks. If CPUs didn't make a difference we'd all have 4090s paired with ancient processors.

I have my 7800X3D at 40-60W providing a much better, much more consistent experience with the same GPU and screen than my 9900k that was eating 150W.
The CPU is only important now because it's AMD, right?

Your PC has something wrong with it, maybe slow RAM?
On my second PC with a 9900K and a 4090 there is no difference in 4K gaming vs. my main system, a 7800X3D with the same GPU.
Even at 1440p there are no big differences.

But if I use a GPU like a 4060, then I will see a difference ASAP, not because of the CPU but because of the slow GPU.

AMD has good CPUs, but in the real world the GPU is much more important.
Both Intel and AMD, even with older CPUs, can do gaming just fine.

People are just hyped about the extra % they see in benchmarks at 1080p with a 4090.

It was engineered by AMD and manufactured by TSMC.

Intel's taking a similar approach, but will call it something else.
It was engineered by TSMC, not AMD.
 
Joined
Oct 24, 2022
Messages
225 (0.29/day)
It looks like (another secret) agreement between Intel and AMD, dividing up which of the two gets which market.
 
Joined
May 3, 2019
Messages
2,120 (1.03/day)
System Name BigRed
Processor I7 12700k
Motherboard Asus Rog Strix z690-A WiFi D4
Cooling Noctua D15S chromax black/MX6
Memory TEAM GROUP 32GB DDR4 4000C16 B die
Video Card(s) MSI RTX 3080 Gaming Trio X 10GB
Storage M.2 drives WD SN850X 1TB 4x4 BOOT/WD SN850X 4TB 4x4 STEAM/USB3 4TB OTHER
Display(s) Dell s3422dwg 34" 3440x1440p 144hz ultrawide
Case Corsair 7000D
Audio Device(s) Logitech Z5450/KEF uniQ speakers/Bowers and Wilkins P7 Headphones
Power Supply Corsair RM850x 80% gold
Mouse Logitech G604 lightspeed wireless
Keyboard Logitech G915 TKL lightspeed wireless
Software Windows 10 Pro X64
Benchmark Scores Who cares
The CPU is only important now because it's AMD, right?

Your PC has something wrong with it, maybe slow RAM?
On my second PC with a 9900K and a 4090 there is no difference in 4K gaming vs. my main system, a 7800X3D with the same GPU.
Even at 1440p there are no big differences.

But if I use a GPU like a 4060, then I will see a difference ASAP, not because of the CPU but because of the slow GPU.

AMD has good CPUs, but in the real world the GPU is much more important.
Both Intel and AMD, even with older CPUs, can do gaming just fine.

People are just hyped about the extra % they see in benchmarks at 1080p with a 4090.


It was engineered by TSMC, not AMD.

3d.jpg

The TM says it all. It's a TSMC invention, licensed to AMD for their use, I guess. I don't think it was originally meant for memory stacking, was it? AMD just used it that way.


Can't wait to see how Intel does it. Surely they can't copy it, unless they get a secret license from TSMC to use it.
 
Joined
Jun 10, 2014
Messages
2,987 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
Just fyi simple CPU instructions usually last 1-4 cycles and more complex ones like AVX might be up to 20-60-100 cycles
That's just plainly wrong.
Most core AVX operations are within 1-5 cycles on recent architectures. Haswell and Skylake did a lot to improve AVX throughput, but there have been several improvements since then too. E.g. add operations are now down from 4 to 2 cycles on Alder Lake and Sapphire Rapids. Shift operations are down to a single cycle. This is as fast as single integer operations. And FYI, all floating point operations go through the vector units; whether it's a single operation, SSE or AVX, the latency will be the same. ;)
 
Joined
Jul 24, 2024
Messages
263 (1.85/day)
System Name AM4_TimeKiller
Processor AMD Ryzen 5 5600X @ all-core 4.7 GHz
Motherboard ASUS ROG Strix B550-E Gaming
Cooling Arctic Freezer II 420 rev.7 (push-pull)
Memory G.Skill TridentZ RGB, 2x16 GB DDR4, B-Die, 3800 MHz @ CL14-15-14-29-43 1T, 53.2 ns
Video Card(s) ASRock Radeon RX 7800 XT Phantom Gaming
Storage Samsung 990 PRO 1 TB, Kingston KC3000 1 TB, Kingston KC3000 2 TB
Case Corsair 7000D Airflow
Audio Device(s) Creative Sound Blaster X-Fi Titanium
Power Supply Seasonic Prime TX-850
Mouse Logitech wireless mouse
Keyboard Logitech wireless keyboard
Putting a large cache tile on top of the CPU cores was the idea of one person on the AMD team.

TSMC's 3D technology was used to manufacture this idea, and they decided to further improve the technology.

Don't mix up the general 3D manufacturing process with that tile of extra cache in X3D CPUs.
 
Joined
May 23, 2016
Messages
4 (0.00/day)
Location
Italy
Processor Intel Core i5-9600K
Motherboard Asus TUF Z390-Plus Gaming (Wi-Fi)
Memory 2x8GB Corsair Vengeance LPX CMK16GX4M2A2666C16
Video Card(s) Palit GeForce GTX 1060 Super JetStream 6GB
Storage Samsung SSD 970 PRO 512GB, Samsung SSD 870 EVO 500GB
Case Thermaltake CORE X31
Power Supply Seasonic SS-620GB
Mouse Logitech G403
Keyboard Logitech G213 Prodigy
Software Windows 10 Professional
I do not agree with that. Intel already had such a processor with extra "cache": the i7-5775C.


Again, the CPU includes 6MB of L3 cache and 128MB of eDRAM.


It's up for discussion. I see the 7800X3D cache as a 4th-level one, like the eDRAM cache of the i7-5775C.
I always thought the eDRAM in those Intel processors was for the iGPU...
 
Joined
Oct 12, 2005
Messages
709 (0.10/day)
View attachment 371926
The TM says it all. It's a TSMC invention, licensed to AMD for their use, I guess. I don't think it was originally meant for memory stacking, was it? AMD just used it that way.


Can't wait to see how Intel does it. Surely they can't copy it, unless they get a secret license from TSMC to use it.
Well, the chips are made by TSMC, so really, it's not AMD Ryzen but TSMC Ryzen, right?


Yes, the fabrication technology was researched by TSMC and they are the ones doing it. But guess what, this is normal, as they are the ones making those chips for AMD. AMD is not a fab.


But AMD is the only one using that technology right now, because this isn't just a box you tick when you order some wafers from TSMC. It's not "I would take X chips with more cache". You still have to design a chip that will be able to communicate with the cache chips, send them power, etc.

The physical portion of 3D V-Cache is a TSMC technology. This is expected, as AMD is fabless.
The logical portion of 3D V-Cache is an AMD technology. This is expected, as TSMC does not design chips.

In the end, it's a collaboration between both companies.

Also, the added chip is indeed L3. There is no separate lookup for that chip when there is a check for whether the data is in the L3 cache. The whole 96 MB is looked at at the same time, and there is no penalty for accessing data in the 3D V-Cache chip.
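
To put the 96 MB figure in perspective, here is a rough sketch of the capacity arithmetic. It assumes 64-byte cache lines (standard on x86) and a split of 32 MiB on-die L3 plus 64 MiB stacked, which is not stated in the post; the 40 MiB working set and the helper names are hypothetical, chosen only for illustration.

```python
MIB = 1024 * 1024
LINE_BYTES = 64  # assumed x86 cache-line size


def lines_in_cache(size_mib):
    """Number of cache lines a cache of the given size can hold."""
    return size_mib * MIB // LINE_BYTES


def fits(working_set_bytes, cache_mib):
    """True if a working set fits entirely within the cache."""
    return working_set_bytes <= cache_mib * MIB


if __name__ == "__main__":
    base_l3_mib = 32      # assumed on-die L3
    stacked_l3_mib = 64   # assumed stacked cache die
    total_mib = base_l3_mib + stacked_l3_mib

    print("lines in base L3:", lines_in_cache(base_l3_mib))
    print("lines in full L3:", lines_in_cache(total_mib))

    working_set = 40 * MIB  # hypothetical hot data set
    print("fits in base L3:", fits(working_set, base_l3_mib))
    print("fits in full L3:", fits(working_set, total_mib))
```

Because the stacked die is addressed as part of one L3, a working set between the two sizes goes from spilling to DRAM to fitting entirely in cache, with no separate lookup step to model.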
 