
Intel Plans to Copy AMD's 3D V-Cache Tech in 2025, Just Not for Desktops

Joined
Nov 4, 2023
Messages
101 (0.25/day)
It was based on AMD interposer technology for the first HBM stacks in 2015. That Intel also copied, and Nvidia.

View attachment 371864
Yes, AMD designed a chip around the packaging technology, but the 3D packaging technology is TSMC's. It is not AMD's.

The below is from my second link.

If you want TSMC's official link you can find it here: https://3dfabric.tsmc.com/english/dedicatedFoundry/technology/3DFabric.htm

In fact, here is TSMC's press release introducing the packaging: Introducing TSMC 3DFabric: TSMC’s Family of 3D Silicon Stacking, Advanced Packaging Technologies and Services - Taiwan Semiconductor Manufacturing Company Limited

1731773625175.png
 
Joined
Jul 21, 2016
Messages
102 (0.03/day)
That's just plainly wrong.
Most core AVX operations are within 1-5 cycles on recent architectures. Haswell and Skylake did a lot to improve AVX throughput, but there have been several improvements since then too. E.g. add operations are now down from 4 to 2 cycles on Alder Lake and Sapphire Rapids. Shift operations are down to a single cycle. This is as fast as single integer operations. And FYI, all floating point operations go through the vector units, whether it's single operation, SSE or AVX, the latency will be the same. ;)
Oh, I guess I confused them with the x87 ones that take forever. I was mostly talking about the complex AVX instructions, not just add/multiply, because I am pretty sure there are a few that take 20-40 cycles.

Yes, AMD designed a chip around the packaging technology, but the 3D packaging technology is TSMC's. It is not AMD's.

The below is from my second link.

If you want TSMC's official link you can find it here: https://3dfabric.tsmc.com/english/dedicatedFoundry/technology/3DFabric.htm

In fact, here is TSMC's press release introducing the packaging: Introducing TSMC 3DFabric: TSMC’s Family of 3D Silicon Stacking, Advanced Packaging Technologies and Services - Taiwan Semiconductor Manufacturing Company Limited

View attachment 371976
Wasn't Micron, with HBM, the first one to release 3D stacking (4 stacks, IIRC)?

In a product released by AMD in 2015, packaged on an interposer?
That's the same as what Intel later called "Foveros", just with an active vs. passive interposer.
 
Joined
May 10, 2023
Messages
304 (0.52/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
You don't grasp the difference between L2 and L3 caches. L3 only contains data recently discarded by L2
L3 victim cache*
Bit pedantic given how almost all (if not all) CPUs use L3 as a victim cache, but I think it's important to explain that's what causes the behaviour you mentioned w.r.t. the L2<->L3 relationship.
 
Joined
Jul 31, 2024
Messages
412 (3.07/day)
Your PC has something wrong, maybe slow RAM?
On my second PC with a 9900K and a 4090 there is no difference in 4K gaming vs. my main system, a 7800X3D with the same GPU.
Even at 1440p there are no big differences.

That depends on your software.

And whether it's visible to a human, which is a big topic of discussion: yes vs. no.
I do not see those 45 FPS on my "FreeSync" ASUS PA278QV monitor with a Radeon 7800 XT in Windows 11 Pro.

--

For a few months in 2023, around the time I sold my Ryzen 5800X and my B550 mainboard, I had a Ryzen 3 3100. For daily usage this cheap CPU, bought second-hand for 30 € and sold for 30 €, was totally fine. I did not see any difference in GNU Gentoo Linux. Software is always compiling in the background while I use the PC. It does not really matter if it takes a few more minutes in the background; the Linux kernel handles the load quite well.

--

I see the issue more with badly designed compilers. There should be more optimisations for a given processor. I think only a few packages use the AVX-512 instruction set of my Ryzen 7600X in GNU Gentoo Linux, whether while compiling or while executing the software.
 

Space Lynx

Astronaut
Joined
Oct 17, 2014
Messages
17,392 (4.69/day)
Location
Kepler-186f
Processor 7800X3D -25 all core
Motherboard B650 Steel Legend
Cooling Frost Commander 140
Video Card(s) Merc 310 7900 XT @3100 core -.75v
Display(s) Agon 27" QD-OLED Glossy 240hz 1440p
Case NZXT H710 (Red/Black)
Audio Device(s) Asgard 2, Modi 3, HD58X
Power Supply Corsair RM850x Gold
It was based on AMD interposer technology for the first HBM stacks in 2015. That Intel also copied, and Nvidia.

View attachment 371864

I have never understood the point of RnD in a company, as you mention here, AMD invested in something, but then Intel and Nvidia just copied it, win win for them and saving on RnD costs, but if AMD tries to copy CUDA programming language, they will die... I will never understand that world. Sounds shady as hell though if you ask me.
 
Joined
Jul 15, 2006
Messages
1,307 (0.19/day)
Location
Noir York
Processor AMD Ryzen 7 5700G
Motherboard Gigabyte B450M S2H
Cooling Scythe Kotetsu Mark II
Memory 2 x 16GB SK Hynix CJR OEM DDR4-3200 @ 4000 20-22-20-48
Video Card(s) Colorful RTX 2060 SUPER 8GB GDDR6
Storage 250GB WD BLACK SN750 M.2 + 4TB WD Red Plus + 4TB WD Purple
Display(s) AOpen 27HC5R 27" 1080p 165Hz curved VA
Case AIGO Darkflash C285
Audio Device(s) Creative SoundBlaster Z + Kurtzweil KS-40A bookshelf / Sennheiser HD555
Power Supply Great Wall GW-EPS1000DA 1kW
Mouse Razer Deathadder Essential
Keyboard Cougar Attack2 Cherry MX Black
Software Windows 10 Pro x64 22H2
Joined
Jun 8, 2022
Messages
388 (0.42/day)
Location
Ohio, USA
System Name Trackstar
Processor AMD Ryzen 7 5800X3D -30 All Core CO (on Corsair XC5 block)
Motherboard Gigabyte B550 AORUS Elite V2 Rev 1.0 (F17 BIOS)
Cooling Corsair XD5 pump / Corsair XR5 1x 360mm (front) + 1x 420mm (top) rads
Memory 32GB G.Skill DDR4-3600 CL14 1:1 (F4-3600C14Q-32GVKA kit)
Video Card(s) ASRock RX 6950XT OC Formula (on Bykski A-AR6900XTOCF-X block)
Storage WD_BLACK SN850X 2TB w/HS (FW ver. 620361WD)
Display(s) Dell S3222DGM 32" 1440p/165Hz FreeSync
Case Fractal Design Meshify S2
Audio Device(s) Realtek ALC1200 Integrated Audio
Power Supply Super Flower Leadex Platinum SE 1200W on Liebert GXT4-1500RT120 UPS
Mouse Corsair Nightsword RGB
Keyboard Corsair K60 RGB PRO
VR HMD N/A
Software Windows 11 Pro 23H2 (Build 22631.3958)
Benchmark Scores https://www.3dmark.com/sw/1131940 https://www.3dmark.com/fs/29315810
I always thought the eDRAM in those Intel processors was for the iGPU...
It typically is but if you disable the iGPU in the BIOS the CPU cores get exclusive access to the eDRAM. I have heard it does shave off another ~2MB L3 cache in order to store the tags necessary to address the 128MB slice though. Funnily enough I've recently been messing around with an old i7-5775C.
 
Joined
Nov 4, 2005
Messages
11,994 (1.72/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs, 24TB Enterprise drives
Display(s) 55" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
I have never understood the point of RnD in a company, as you mention here, AMD invested in something, but then Intel and Nvidia just copied it, win win for them and saving on RnD costs, but if AMD tries to copy CUDA programming language, they will die... I will never understand that world. Sounds shady as hell though if you ask me.
Much like Intel pays AMD for x64, I would wager their engineering costs are covered by their royalties. For how many years did AMD bleed money? They were propped up, at least to some degree, by their competitors through fees.
 
Joined
Jun 14, 2020
Messages
3,526 (2.15/day)
System Name Mean machine
Processor 12900k
Motherboard MSI Unify X
Cooling Noctua U12A
Memory 7600c34
Video Card(s) 4090 Gamerock oc
Storage 980 pro 2tb
Display(s) Samsung crg90
Case Fractal Torent
Audio Device(s) Hifiman Arya / a30 - d30 pro stack
Power Supply Be quiet dark power pro 1200
Mouse Viper ultimate
Keyboard Blackwidow 65%
What a shame! Intel's desktop lineup could really use such a boost.
To be fair, they don't. I checked TPU's latest review: a 13600K can deliver, on average, 2 to 3 times more frames at 720p than the 4090 can do at 4K. It depends on the game of course, but if you include all the games tested in the 9800X3D review, we need much, much faster GPUs for the CPUs to play any important role. There are only 2 games where a CPU faster than the 13600K would matter, but in both of those the framerate was at 120 and above (BG3 and Starfield).
 
Joined
Mar 18, 2023
Messages
911 (1.43/day)
System Name Never trust a socket with less than 2000 pins
I have never understood the point of RnD in a company, as you mention here, AMD invested in something, but then Intel and Nvidia just copied it, win win for them and saving on RnD costs, but if AMD tries to copy CUDA programming language, they will die... I will never understand that world. Sounds shady as hell though if you ask me.

There is a lot of software written in CUDA. And the quality of the tools for CUDA ensures that this trend won't stop anytime soon.

AMD doesn't copy CUDA, they just want to be able to execute CUDA code.
 
Joined
Jul 5, 2013
Messages
28,101 (6.72/day)
To be fair, they don't.
Sure they do. Why wouldn't they? Are you kidding? The reason AMD currently has the gaming throne is directly because of the Ryzen CPUs with stacked 3D cache. Without it, well, where are the NON-3D cache AMD CPUs ranked? Right, there's the answer. If Intel were to employ something similar and get it right, their CPU line up would return rather handily to the top spot.
 

wolf

Better Than Native
Joined
May 7, 2007
Messages
8,202 (1.28/day)
System Name MightyX
Processor Ryzen 9800X3D
Motherboard Gigabyte X650I AX
Cooling Scythe Fuma 2
Memory 32GB DDR5 6000 CL30
Video Card(s) Asus TUF RTX3080 Deshrouded
Storage WD Black SN850X 2TB
Display(s) LG 42C2 4K OLED
Case Coolermaster NR200P
Audio Device(s) LG SN5Y / Focal Clear
Power Supply Corsair SF750 Platinum
Mouse Corsair Dark Core RBG Pro SE
Keyboard Glorious GMMK Compact w/pudding
VR HMD Meta Quest 3
Software case populated with Artic P12's
Benchmark Scores 4k120 OLED Gsync bliss
On my second PC with a 9900K and a 4090 there is no difference in 4K gaming vs. my main system, a 7800X3D with the same GPU.
This could be for a few reasons, like the games you play, or the FPS you're comfy with or targeting.

The 7800X3D is undeniably a much faster gaming CPU and will give you a significantly higher FPS ceiling (and better 1%/0.1% lows) vs. a 9900K @ 5 GHz. There absolutely can be situations where both provide more than the FPS you target in certain games, or where you are GPU limited in certain games, but that does nothing to tell you the true gaming performance of a CPU.

Excellent video covering it here. This literally has absolutely nothing to do with the brand of the CPU.
 
Joined
Dec 14, 2016
Messages
32 (0.01/day)
As someone who got the i7-5775C five years ago and the Ryzen 5800X3D two and a half years ago, I still can't understand why there is even a conversation about the big cache. It reminds me of back in the day: is a 2-core CPU better than a higher-clocked single-core CPU? Does Hyper-Threading actually do something? Why should I use a 64-bit CPU? I mean, the benchmarks are clear and there is nothing to argue about with them.

The i7-5775C was a good CPU, end of story. If someone doesn't know how to properly configure it or doesn't want to bother, just get yourself an iMac this year, don't bother, and get the next iMac next year.

 
Joined
Jul 5, 2013
Messages
28,101 (6.72/day)
The 7800X3D is undeniably a much faster gaming CPU and will give you a significantly higher FPS ceiling (and better 1%/0.1% lows) vs. a 9900K @ 5 GHz.
But that's only for non-4K resolutions. At 4K, a 4090 is going to render the same general results with ANY CPU that doesn't bottleneck it, and that's a big damn list. The 7800X3D only counts for resolutions under 4K. The benchmarks here at TPU and elsewhere bear that out.
 

wolf

Better Than Native
Joined
May 7, 2007
Messages
8,202 (1.28/day)
System Name MightyX
Processor Ryzen 9800X3D
Motherboard Gigabyte X650I AX
Cooling Scythe Fuma 2
Memory 32GB DDR5 6000 CL30
Video Card(s) Asus TUF RTX3080 Deshrouded
Storage WD Black SN850X 2TB
Display(s) LG 42C2 4K OLED
Case Coolermaster NR200P
Audio Device(s) LG SN5Y / Focal Clear
Power Supply Corsair SF750 Platinum
Mouse Corsair Dark Core RBG Pro SE
Keyboard Glorious GMMK Compact w/pudding
VR HMD Meta Quest 3
Software case populated with Artic P12's
Benchmark Scores 4k120 OLED Gsync bliss
But that's only for non-4K resolutions. At that res, a 4090 is going to render the same general results with ANY CPU that doesn't bottleneck it, and that's a big damn list. The 7800X3D only counts for resolutions under 4K. The benchmarks here at TPU and elsewhere bear that out.
The video makes some great points, however. Say you're GPU limited at 70 FPS on all ultra settings with a 4090 in a given game, and the 9900K can give you 80 FPS; then sure, you're all good. But what if you want 120 FPS and are willing to lower settings, use upscaling, etc. to get there? The 9900K will make your FPS ceiling 80 FPS, and no amount of lowering settings, reducing resolution or using upscaling will improve that. It heavily depends on the game and the user's preferences, but it absolutely applies to 4K too if you want to game at higher FPS. For someone happy with 30/40/60 it matters a fair bit less for sure, but in many games I'm absolutely willing to lower visuals to get the balance of visuals and FPS to my taste, and a CPU absolutely can matter there.
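To put numbers on the ceiling argument above, here is a minimal sketch. It assumes the delivered frame rate is simply the lower of what the CPU and the GPU can each produce. The 70 and 80 FPS figures come from the example above; the other values and all names are made up for illustration.

```python
def delivered_fps(cpu_fps: float, gpu_fps: float) -> float:
    """Simplified model: the slower of CPU and GPU sets the frame rate."""
    return min(cpu_fps, gpu_fps)


CPU_CEILING_9900K = 80.0    # from the example above
GPU_ULTRA_4K = 70.0         # from the example above
GPU_LOWERED = 130.0         # assumed: GPU-side FPS after lowering settings/upscaling
CPU_CEILING_FASTER = 150.0  # assumed: a hypothetical faster CPU

# Ultra settings: GPU limited, the CPU does not matter yet.
print(delivered_fps(CPU_CEILING_9900K, GPU_ULTRA_4K))
# Lowered settings: the slower CPU is now the cap.
print(delivered_fps(CPU_CEILING_9900K, GPU_LOWERED))
# Same lowered settings with the hypothetical faster CPU: GPU limited again.
print(delivered_fps(CPU_CEILING_FASTER, GPU_LOWERED))
```

Real games are messier than a plain min(), since frame times vary scene by scene, but the cap behaves the same way.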

I also recall W1zzard's 4090 vs 53 games test on a 5800X vs 5800X3D had some games showing differences at 4K. There is very much a large dose of 'it depends' on this one, but I do find the math/science of it undeniable; it's more so a case of whether that matters to the individual or not.
 
Joined
Jul 30, 2019
Messages
3,315 (1.69/day)
System Name Still not a thread ripper but pretty good.
Processor Ryzen 9 7950x, Thermal Grizzly AM5 Offset Mounting Kit, Thermal Grizzly Extreme Paste
Motherboard ASRock B650 LiveMixer (BIOS/UEFI version P3.08, AGESA 1.2.0.2)
Cooling EK-Quantum Velocity, EK-Quantum Reflection PC-O11, D5 PWM, EK-CoolStream PE 360, XSPC TX360
Memory Micron DDR5-5600 ECC Unbuffered Memory (2 sticks, 64GB, MTC20C2085S1EC56BD1) + JONSBO NF-1
Video Card(s) XFX Radeon RX 5700 & EK-Quantum Vector Radeon RX 5700 +XT & Backplate
Storage Samsung 4TB 980 PRO, 2 x Optane 905p 1.5TB (striped), AMD Radeon RAMDisk
Display(s) 2 x 4K LG 27UL600-W (and HUANUO Dual Monitor Mount)
Case Lian Li PC-O11 Dynamic Black (original model)
Audio Device(s) Corsair Commander Pro for Fans, RGB, & Temp Sensors (x4)
Power Supply Corsair RM750x
Mouse Logitech M575
Keyboard Corsair Strafe RGB MK.2
Software Windows 10 Professional (64bit)
Benchmark Scores RIP Ryzen 9 5950x, ASRock X570 Taichi (v1.06), 128GB Micron DDR4-3200 ECC UDIMM (18ASF4G72AZ-3G2F1)
Joined
Mar 23, 2005
Messages
4,089 (0.57/day)
Location
Ancient Greece, Acropolis (Time Lord)
System Name RiseZEN Gaming PC
Processor AMD Ryzen 7 5800X @ Auto
Motherboard Asus ROG Strix X570-E Gaming ATX Motherboard
Cooling Corsair H115i Elite Capellix AIO, 280mm Radiator, Dual RGB 140mm ML Series PWM Fans
Memory G.Skill TridentZ 64GB (4 x 16GB) DDR4 3200
Video Card(s) ASUS DUAL RX 6700 XT DUAL-RX6700XT-12G
Storage Corsair Force MP500 480GB M.2 & MP510 480GB M.2 - 2 x WD_BLACK 1TB SN850X NVMe 1TB
Display(s) ASUS ROG Strix 34” XG349C 144Hz 1440p + Asus ROG 27" MG278Q 144Hz WQHD 1440p
Case Corsair Obsidian Series 450D Gaming Case
Audio Device(s) SteelSeries 5Hv2 w/ Sound Blaster Z SE
Power Supply Corsair RM750x Power Supply
Mouse Razer Death-Adder + Viper 8K HZ Ambidextrous Gaming Mouse - Ergonomic Left Hand Edition
Keyboard Logitech G910 Orion Spectrum RGB Gaming Keyboard
Software Windows 11 Pro - 64-Bit Edition
Benchmark Scores I'm the Doctor, Doctor Who. The Definition of Gaming is PC Gaming...
Intel's been copying AMD's tech for decades, nothing new here people. :laugh:

Sure they do. Why wouldn't they? Are you kidding? The reason AMD currently has the gaming throne is directly because of the Ryzen CPUs with stacked 3D cache. Without it, well, where are the NON-3D cache AMD CPUs ranked? Right, there's the answer. If Intel were to employ something similar and get it right, their CPU line up would return rather handily to the top spot.
Not everybody is looking for the X3D CPUs, though they do provide the best gaming.
The top 15 best sellers on Amazon include 2 Zen CPUs with X3D; the rest are all regular Ryzens and 2 Intel CPUs.
 
Joined
May 3, 2019
Messages
2,119 (1.03/day)
System Name BigRed
Processor I7 12700k
Motherboard Asus Rog Strix z690-A WiFi D4
Cooling Noctua D15S chromax black/MX6
Memory TEAM GROUP 32GB DDR4 4000C16 B die
Video Card(s) MSI RTX 3080 Gaming Trio X 10GB
Storage M.2 drives WD SN850X 1TB 4x4 BOOT/WD SN850X 4TB 4x4 STEAM/USB3 4TB OTHER
Display(s) Dell s3422dwg 34" 3440x1440p 144hz ultrawide
Case Corsair 7000D
Audio Device(s) Logitech Z5450/KEF uniQ speakers/Bowers and Wilkins P7 Headphones
Power Supply Corsair RM850x 80% gold
Mouse Logitech G604 lightspeed wireless
Keyboard Logitech G915 TKL lightspeed wireless
Software Windows 10 Pro X64
Benchmark Scores Who cares
Intel's been copying AMD's tech for decades, nothing new here people. :laugh:


Not everybody is looking for the X3D CPUs, though they do provide the best gaming.
The top 15 best sellers on Amazon include 2 Zen CPUs with X3D; the rest are all regular Ryzens and 2 Intel CPUs.

Really, so why did Intel thrash AMD for over 10 years then?
 
Joined
Jun 14, 2020
Messages
3,526 (2.15/day)
System Name Mean machine
Processor 12900k
Motherboard MSI Unify X
Cooling Noctua U12A
Memory 7600c34
Video Card(s) 4090 Gamerock oc
Storage 980 pro 2tb
Display(s) Samsung crg90
Case Fractal Torent
Audio Device(s) Hifiman Arya / a30 - d30 pro stack
Power Supply Be quiet dark power pro 1200
Mouse Viper ultimate
Keyboard Blackwidow 65%
Really, so why did Intel thrash AMD for over 10 years then?
They didn't manage to reverse engineer Bulldozer, so they couldn't copy it, that's why.
 
Joined
Jul 24, 2024
Messages
263 (1.87/day)
System Name AM4_TimeKiller
Processor AMD Ryzen 5 5600X @ all-core 4.7 GHz
Motherboard ASUS ROG Strix B550-E Gaming
Cooling Arctic Freezer II 420 rev.7 (push-pull)
Memory G.Skill TridentZ RGB, 2x16 GB DDR4, B-Die, 3800 MHz @ CL14-15-14-29-43 1T, 53.2 ns
Video Card(s) ASRock Radeon RX 7800 XT Phantom Gaming
Storage Samsung 990 PRO 1 TB, Kingston KC3000 1 TB, Kingston KC3000 2 TB
Case Corsair 7000D Airflow
Audio Device(s) Creative Sound Blaster X-Fi Titanium
Power Supply Seasonic Prime TX-850
Mouse Logitech wireless mouse
Keyboard Logitech wireless keyboard
The claim that X3D cache improves performance only in games is busted with Zen 5.
You can see improvements in any workload that relies heavily on memory, e.g. rendering.
 
Joined
May 3, 2019
Messages
2,119 (1.03/day)
System Name BigRed
Processor I7 12700k
Motherboard Asus Rog Strix z690-A WiFi D4
Cooling Noctua D15S chromax black/MX6
Memory TEAM GROUP 32GB DDR4 4000C16 B die
Video Card(s) MSI RTX 3080 Gaming Trio X 10GB
Storage M.2 drives WD SN850X 1TB 4x4 BOOT/WD SN850X 4TB 4x4 STEAM/USB3 4TB OTHER
Display(s) Dell s3422dwg 34" 3440x1440p 144hz ultrawide
Case Corsair 7000D
Audio Device(s) Logitech Z5450/KEF uniQ speakers/Bowers and Wilkins P7 Headphones
Power Supply Corsair RM850x 80% gold
Mouse Logitech G604 lightspeed wireless
Keyboard Logitech G915 TKL lightspeed wireless
Software Windows 10 Pro X64
Benchmark Scores Who cares
They didn't manage to reverse engineer Bulldozer, so they couldn't copy it, that's why.

Bulldozer wasn't worth reverse engineering

Anyway, this is OT, so let's just stop this, OK?
 
Joined
Sep 20, 2007
Messages
212 (0.03/day)
Location
SANTIAGO - CHILE
System Name pote
Processor Amd 7800x3d
Motherboard Gigabyte b650m aorus elite ax
Cooling Thermalright peerless assasin 120 se
Memory 32gb lexar 6000 cl32
Video Card(s) AMD XFX 580 8GB
Storage Lexar 2tb nvme gen4
Display(s) asus ips 24inch 1080p, LG OLED B9 4K120
Case Antec CRAP
Audio Device(s) REALTEK CRAP
Power Supply corsair 550w vx550
Mouse microsoft PROINTELLIMOUSE
Keyboard microsoft wireless
Software win11 PRO x64
We need moar cache to be the new moar cores and moar ghz
 
Joined
May 10, 2023
Messages
304 (0.52/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
The claim that X3D cache improves performance only in games is busted with Zen 5.
You can see improvements in any workload that relies heavily on memory, e.g. rendering.
Was it? AFAIK performance is pretty much the same with or without the extra cache for most of those. The difference between a 9800X3D and a 9700X in those tasks was mostly due to the V-Cache model having a higher power limit and clocking a bit higher because of that.
Rendering is one of the cases where the extra cache makes no difference, as far as I've seen.

Only cases the extra cache is really worth it is for CFD/HPC stuff, and some other specific database workloads.
 
Joined
Jun 10, 2014
Messages
2,987 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
Oh, I guess I confused them with the x87 ones that take forever. I was mostly talking about the complex AVX instructions, not just add/multiply, because I am pretty sure there are a few that take 20-40 cycles.
There are many instructions that are very slow, although they are usually a tiny fraction of the workload, if at all.
Interestingly enough, Ice Lake/Rocket Lake brought the legacy FMUL down from 5 to 4 cycles, as well as integer division (IDIV) from 97 cycles to 18 cycles.

For comparison, Intel's current CPUs have 4 cycles for multiplication, 11 cycles for division of fp32 using AVX, and 5 cycles for integer multiplication using AVX. (official spec)
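As a rough illustration of what those latencies mean, here is a small sketch that estimates a lower bound on cycles for a chain of operations where each one needs the previous result. The latency values are the ones quoted above; the dictionary keys and function name are made up for this example. It deliberately ignores throughput, so independent operations that can overlap in the pipeline will finish much faster than this.

```python
# Latencies in cycles as quoted in the post above (not measured here).
LATENCY_CYCLES = {
    "fp_mul": 4,         # multiplication
    "fp32_div_avx": 11,  # fp32 division using AVX
    "int_mul_avx": 5,    # integer multiplication using AVX
}


def dependent_chain_cycles(op: str, count: int) -> int:
    """Lower-bound cycles for `count` ops that each wait on the previous result."""
    return LATENCY_CYCLES[op] * count


# 1000 back-to-back dependent operations of each kind.
for name in LATENCY_CYCLES:
    print(name, dependent_chain_cycles(name, 1000))
```

So even in the fully serial worst case, a thousand dependent fp32 divisions cost on the order of 11,000 cycles, which is a few microseconds on a multi-GHz core.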

As for "worst case" performers among legacy x87 instructions: examples are FSQRT (square root) at 14-21 cycles, sin/cos/tan at ~50-160 cycles, and the most complex, FBSTP, at 264, though this one is probably not very useful today. FDIV is 14-16 cycles (so slightly slower than its AVX counterpart). For comparison, in Zen 4, legacy x87 instructions seem to have lower latency overall than on Intel. All of these figures are from agner.org and are benchmarked, so take them with a grain of salt, but they are probably good approximations.

Many think "legacy" instructions are holding back the performance of modern x86 CPUs, but that's not true. Since the mid 90s, they've all translated the x86 ISA to their own specific micro-operations, and this is also how they support x87/MMX/SSE/AVX through the same execution ports; the legacy instructions are translated to micro-ops anyway. This allows them to design the CPUs to be as efficient as possible with the new features, yet support the old ones. If the older ones happen to have worse latency, it's usually not an issue, as applications that rely on those are probably very old. One thing of note is that x87 instructions round differently than normal IEEE 754 fp32/fp64 does.

L3 victim cache*
Bit pedantic given how almost all (if not all) CPUs use L3 as a victim cache, but I think it's important to explain that's what causes the behaviour you mentioned w.r.t. the L2<->L3 relationship.
It's not pedantic at all, you missed the point.
The prefetcher only feeds the L2, not the L3, so anything in L3 must first be prefetched into L2 then eventually evicted to L3, where it remains for a short window before being evicted there too. Adding a lot of extra L3 only means the "garbage dump" will be larger, while adding just a tiny bit more of L2 would allow the prefetcher to work differently. In other words; a larger L3 doesn't mean you can prefetch a lot more, it just means that the data you've already prefetched anyways stays a little longer.
Secondly, as I said, the stream of data flowing through L3 is all coming from memory->L2, so the overall bandwidth here is limited by memory, even though the tiny bit you read back will have higher burst speed.
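The L2 -> L3 flow described above can be sketched with a toy model. This is only an illustration under stated assumptions: both levels are fully associative LRU, the L3 is strictly exclusive, there is no prefetcher, and the class and method names are invented for this example. No real CPU's replacement policy is this simple.

```python
from collections import OrderedDict


class VictimHierarchy:
    """Toy exclusive L2 + victim L3, both LRU, tracking line addresses only."""

    def __init__(self, l2_lines: int, l3_lines: int) -> None:
        self.l2: OrderedDict[int, bool] = OrderedDict()
        self.l3: OrderedDict[int, bool] = OrderedDict()
        self.l2_lines = l2_lines
        self.l3_lines = l3_lines

    def access(self, addr: int) -> str:
        """Return where the line was found: 'L2', 'L3' or 'memory'."""
        if addr in self.l2:
            self.l2.move_to_end(addr)  # refresh LRU position
            return "L2"
        if addr in self.l3:
            del self.l3[addr]          # exclusive: the line moves back to L2
            source = "L3"
        else:
            source = "memory"
        self._fill_l2(addr)
        return source

    def _fill_l2(self, addr: int) -> None:
        self.l2[addr] = True
        if len(self.l2) > self.l2_lines:
            victim, _ = self.l2.popitem(last=False)  # oldest L2 line
            self.l3[victim] = True                   # L3 only receives L2 victims
            if len(self.l3) > self.l3_lines:
                self.l3.popitem(last=False)          # oldest victim falls out of L3


h = VictimHierarchy(l2_lines=2, l3_lines=4)
print([h.access(a) for a in (0, 1, 2, 0)])
```

In this model nothing ever lands in L3 straight from memory; making `l3_lines` larger only lets evicted lines linger longer before they are dropped.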

Software that will be more demanding in the coming years will be more computationally intensive, so over the long term the faster CPUs will be favored over those with more L3 cache. Those that are very L3 sensitive will remain outliers.
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,171 (2.80/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
In other words; a larger L3 doesn't mean you can prefetch a lot more, it just means that the data you've already prefetched anyways stays a little longer.
First, excellent explanation as a whole. Second, I think the quote above is a key point and likely the biggest reason why a larger L3 cache shows improvement in some workloads. Keeping data resident in cache longer only helps hit rates, assuming latency remains constant. I'm sure there is a tipping point: in a relatively tight loop that touches slightly too much data, lines get evicted before the loop starts from the beginning again, and a little more cache might get over that hurdle. An example of this might be a tight rendering loop in a game.
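A minimal sketch of that tipping point, assuming a single fully associative LRU cache and a loop that walks its working set in the same order on every pass (the function name and the sizes are made up for illustration):

```python
from collections import OrderedDict


def loop_hit_rate(cache_lines: int, working_set: int, passes: int = 10) -> float:
    """Hit rate of an LRU cache for a loop that sweeps `working_set` lines per pass."""
    cache: OrderedDict[int, bool] = OrderedDict()
    hits = 0
    total = 0
    for _ in range(passes):
        for addr in range(working_set):
            total += 1
            if addr in cache:
                hits += 1
                cache.move_to_end(addr)        # refresh LRU position
            else:
                cache[addr] = True
                if len(cache) > cache_lines:
                    cache.popitem(last=False)  # evict least recently used line
    return hits / total


# Working set one line larger than the cache vs. a cache that just fits it.
print(loop_hit_rate(cache_lines=100, working_set=101))
print(loop_hit_rate(cache_lines=101, working_set=101))
```

With plain LRU, the loop that is one line too big evicts each line just before it is needed again, so the hit rate collapses, while one extra line of capacity turns every pass after the first into hits. Real caches are set associative and use smarter replacement, so the cliff is softer in practice.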
 