
Intel Adds More L3 Cache to Its Tiger Lake CPUs

Joined
Jan 3, 2015
Messages
3,034 (0.83/day)
System Name The beast and the little runt.
Processor Ryzen 5 5600X - Ryzen 9 5950X
Motherboard ASUS ROG STRIX B550-I GAMING - ASUS ROG Crosshair VIII Dark Hero X570
Cooling Noctua NH-L9x65 SE-AM4a - NH-D15 chromax.black with IPPC Industrial 3000 RPM 120/140 MM fans.
Memory G.SKILL TRIDENT Z ROYAL GOLD/SILVER 32 GB (2 x 16 GB and 4 x 8 GB) 3600 MHz CL14-15-15-35 1.45 volts
Video Card(s) GIGABYTE RTX 4060 OC LOW PROFILE - GIGABYTE RTX 4090 GAMING OC
Storage Samsung 980 PRO 1 TB + 2 TB - Samsung 870 EVO 4 TB - 2 x WD RED PRO 16 GB + WD ULTRASTAR 22 TB
Display(s) Asus 27" TUF VG27AQL1A and a Dell 24" for dual setup
Case Phanteks Enthoo 719/LUXE 2 BLACK
Audio Device(s) Onboard on both boards
Power Supply Phanteks Revolt X 1200W
Mouse Logitech G903 Lightspeed Wireless Gaming Mouse
Keyboard Logitech G910 Orion Spectrum
Software WINDOWS 10 PRO 64 BITS on both systems
Benchmark Scores See more about my 2 in 1 system here: kortlink.dk/2ca4x
You are confused. The Core 2 line has NO L3 cache! The 12 MB of cache was entirely L2 cache.

There is still a sizeable market for quad-core CPUs. Not everyone needs 8 cores. Unless you are playing certain games, even 6 cores have little tangible benefit outside of the creative market. Until that market decides it suddenly needs more power, quad cores will still sell well, and will still be Intel's consumer bread and butter.

Oh darn, you are right about the Core 2 Quad having only L2 cache.

But I disagree about quad-cores for gaming, especially without HT/SMT. Games these days need at least 8 threads or cores to run properly. Many people report stutter in new games on quad-cores, certainly if the CPU only has 4 threads as well. More and more games are getting optimized for 6 cores, and some games even benefit from 12 threads.

I would never recommend or buy a quad-core CPU for gaming today, certainly not if you have a powerful GPU and/or want as many fps as possible.

 
Joined
Nov 13, 2007
Messages
10,895 (1.74/day)
Location
Austin Texas
System Name stress-less
Processor 9800X3D @ 5.42GHZ
Motherboard MSI PRO B650M-A Wifi
Cooling Thermalright Phantom Spirit EVO
Memory 64GB DDR5 6400 1:1 CL30-36-36-76 FCLK 2200
Video Card(s) RTX 4090 FE
Storage 2TB WD SN850, 4TB WD SN850X
Display(s) Alienware 32" 4k 240hz OLED
Case Jonsbo Z20
Audio Device(s) Yes
Power Supply Corsair SF750
Mouse DeathadderV2 X Hyperspeed
Keyboard 65% HE Keyboard
Software Windows 11
Benchmark Scores They're pretty good, nothing crazy.
I mean if AMD can't compete single core, they have no reason to push themselves faster do they? Even Ryzen 3000 falls behind in single in a lot of applications, so my guess is Intel knows Ryzen 4000 might be a 5-8% IPC gain over 3000 (if AMD is lucky), and they probably have already done the math knowing they can beat that or tie it on 10nm 4.5ghz. /shrug

Competition is the only way to make bigger gains, and Intel still has none for single core (not to mention you can't really OC ryzen 3000 and Intel chips OC like a beast even on big air heatsinks like Noctua) further widening the gap.

Yeah - it's one of the reasons I went with the 8700K for now vs the 3600. I think the 4000 series, if they can get clocks up, will be better than current Intel at stock. So if Intel has a hard time beating their old single-core performance at 5 GHz, they will be in a tough spot.
 
Joined
May 5, 2016
Messages
98 (0.03/day)
Not only are the silicon costs higher, but there is also a power penalty for using more on-die cache memory.

Having said that I think the world is ready to move beyond 2MB/core for mobile devices.
 
Joined
Nov 13, 2007
Messages
10,895 (1.74/day)
Location
Austin Texas
System Name stress-less
Processor 9800X3D @ 5.42GHZ
Motherboard MSI PRO B650M-A Wifi
Cooling Thermalright Phantom Spirit EVO
Memory 64GB DDR5 6400 1:1 CL30-36-36-76 FCLK 2200
Video Card(s) RTX 4090 FE
Storage 2TB WD SN850, 4TB WD SN850X
Display(s) Alienware 32" 4k 240hz OLED
Case Jonsbo Z20
Audio Device(s) Yes
Power Supply Corsair SF750
Mouse DeathadderV2 X Hyperspeed
Keyboard 65% HE Keyboard
Software Windows 11
Benchmark Scores They're pretty good, nothing crazy.
Whatever happened to the Broadwell design with the L4 cache? That seemed to be useful in gaming and certain other applications.
 
Joined
Nov 4, 2005
Messages
12,048 (1.72/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs, 24TB Enterprise drives
Display(s) 55" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
my 8350 runs 5GHz on Air, not exotic either.
On what process node? Higher latency (deep in-order pipelines) is masked by higher frequency, but the pipeline stalls when it has to be flushed, which happens when out-of-order or speculative execution and branch prediction fail, meaning less IPC.

Architecture designed for a process node is best, which is why trading cache for TDP is a good value on smaller boxes that don't run at as high of frequency.

Intel can't get the frequency they want, so are trading it for more TDP in cache to increase IPC.
 

eidairaman1

The Exiled Airman
Joined
Jul 2, 2007
Messages
43,045 (6.72/day)
Location
Republic of Texas (True Patriot)
System Name PCGOD
Processor AMD FX 8350@ 5.0GHz
Motherboard Asus TUF 990FX Sabertooth R2 2901 Bios
Cooling Scythe Ashura, 2×BitFenix 230mm Spectre Pro LED (Blue,Green), 2x BitFenix 140mm Spectre Pro LED
Memory 16 GB Gskill Ripjaws X 2133 (2400 OC, 10-10-12-20-20, 1T, 1.65V)
Video Card(s) AMD Radeon 290 Sapphire Vapor-X
Storage Samsung 840 Pro 256GB, WD Velociraptor 1TB
Display(s) NEC Multisync LCD 1700V (Display Port Adapter)
Case AeroCool Xpredator Evil Blue Edition
Audio Device(s) Creative Labs Sound Blaster ZxR
Power Supply Seasonic 1250 XM2 Series (XP3)
Mouse Roccat Kone XTD
Keyboard Roccat Ryos MK Pro
Software Windows 7 Pro 64
On what process node? Higher latency (deep in-order pipelines) is masked by higher frequency, but the pipeline stalls when it has to be flushed, which happens when out-of-order or speculative execution and branch prediction fail, meaning less IPC.

Architecture designed for a process node is best, which is why trading cache for TDP is a good value on smaller boxes that don't run at as high of frequency.

Intel can't get the frequency they want, so are trading it for more TDP in cache to increase IPC.

Fx8350.
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,173 (2.78/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
You know, you need more cache to help try and cancel out the cost of memory latency inherent in MCM designs. Speculation is dangerous though.
 
Joined
Sep 10, 2019
Messages
94 (0.05/day)
I mean if AMD can't compete single core, they have no reason to push themselves faster do they? Even Ryzen 3000 falls behind in single in a lot of applications, so my guess is Intel knows Ryzen 4000 might be a 5-8% IPC gain over 3000 (if AMD is lucky), and they probably have already done the math knowing they can beat that or tie it on 10nm 4.5ghz. /shrug

Competition is the only way to make bigger gains, and Intel still has none for single core (not to mention you can't really OC ryzen 3000 and Intel chips OC like a beast even on big air heatsinks like Noctua) further widening the gap.

Intel can’t compete with AMD in IPC. They’ve lost that battle. The reason you sometimes see higher Single Thread results is because those intel cpus run at higher frequencies and the software running is probably intel optimized.

I can’t believe it either!
 
Joined
Jun 10, 2014
Messages
3,010 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
Pushing L3 cache? To me it looks like a tweak to make it look a little bit better.
Increasing the L3 cache will only give marginal gains (except for edge cases), so I suspect this is partially a marketing decision.

L3 cache is a "spillover" cache, which basically means it's data "discarded" by the memory controller because it didn't fit L2 any more. While the L3 has the advantage of being accessible across CPU cores, the Skylake family (excluding Skylake-X/-SP) does this interesting thing where the L3 cache is inclusive, meaning L3 cache contains a duplicate of the core's L2 just in case another core wants to access it (which is mostly wasted), meaning Skylake family chips effectively have much less L3 cache than you might think.
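As a rough illustration of the inclusive-cache point above, the capacity that an inclusive L3 does not spend duplicating L2 can be estimated with simple subtraction. This is only a back-of-the-envelope sketch: the helper name is made up for illustration, L1 is ignored, and the example figures (8 MiB L3, four cores, 256 KiB L2 per core) are assumed values typical of a quad-core Skylake desktop part.

```python
def effective_inclusive_l3_kib(l3_kib: int, cores: int, l2_kib_per_core: int) -> int:
    """Estimate the L3 capacity not used to duplicate L2 contents (hypothetical helper)."""
    duplicated = cores * l2_kib_per_core  # an inclusive L3 mirrors every core's L2
    return max(l3_kib - duplicated, 0)


# Assumed example: 8 MiB L3, 4 cores, 256 KiB L2 each.
unique_kib = effective_inclusive_l3_kib(l3_kib=8 * 1024, cores=4, l2_kib_per_core=256)
print(unique_kib)  # 7168 KiB, i.e. 7 MiB of the nominal 8 MiB
```

The gap grows with larger L2 caches or more cores per slice of L3, which is the effect the paragraph above describes.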

Speaking of usefulness, each cache line in L2 is obviously many times more useful than each cache line in L3. L2 cache is where the data is prefetched into, while L3 is data recently discarded from L2. More L2 cache seems like an obvious benefit, but L2 is more "costly" for certain reasons, not only because it needs more transistors per capacity, but also because it's more closely connected to the pipeline and the front-end, and is very timing sensitive. This is why it's relatively easy to throw extra L3 into an existing design, while changing L2 requires a redesign.

Sooner or later more (or smarter) L2 cache will be needed to be able to feed the multiple execution ports and SIMD units in the cores. I would love to see CPU designs with way more L2 cache, like 1 MB or even 2 MB, but even with node shrinks it will get challenging to go way beyond that. I would argue that it may be time to split L2 and possibly even L3 into separate instruction and data caches. This would allow more flexible placement on the die, plus with the "shared" L3 cache it's only the instruction cache that is really shared in practice.

Whatever happened to the Broadwell design with the L4 cache? That seemed to be useful in gaming and certain other applications.
I believe it was mostly used by the integrated graphics.
The problem with L4 is generally the same as the problem with L3, just worse; it's a spillover cache, which means it's only useful when it contains discarded cache from the last few thousand clock cycles. The cache discards the least recently used data in each cache bank, there is no prioritization beyond that, which means that you may need extreme amounts of L4 cache to make a significant difference across different workloads.
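The least-recently-used behaviour described above can be sketched with a tiny simulator. It is a simplified model, assuming a fully associative cache with pure LRU replacement (real caches are set-associative and use approximations of LRU), and the function name and trace sizes are chosen just for illustration. It shows why a cache slightly smaller than the working set can end up with no hits at all.

```python
from collections import OrderedDict


def lru_hit_rate(capacity: int, trace: list[int]) -> float:
    """Replay a trace of cache-line addresses through a fully associative LRU cache."""
    cache = OrderedDict()
    hits = 0
    for line in trace:
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used line
            cache[line] = None
    return hits / len(trace) if trace else 0.0


# A working set of 5 lines looped through a 4-line cache: LRU thrashes.
too_big = [i % 5 for i in range(500)]
# A working set of 4 lines fits, so only the first pass misses.
fits = [i % 4 for i in range(400)]

print(lru_hit_rate(4, too_big))  # 0.0
print(lru_hit_rate(4, fits))     # 0.99
```

In this toy model the hit rate jumps from zero to nearly one as soon as the working set fits, which is one way to read the point that a spillover cache may need to be very large before it helps across different workloads.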

If L4 data and instruction caches were separate though (read my paragraph above), I would imagine that just a few MB of it could be useful, as data flows through at the rate of GBs per ms, while instructions will usually jump back and forth within "relatively few" MB.

Intel can’t compete with AMD in IPC. They’ve lost that battle. The reason you sometimes see higher Single Thread results is because those intel cpus run at higher frequencies and the software running is probably intel optimized.
Nope, Intel still have the lead in IPC, while AMD manages better multicore clock scaling and individual boosting, plus they have the extra burst boost speed of XFR on top of regular boost.
Software isn't "Intel optimized". This BS needs to end now. They use the same ISA, we don't have access to use their microoperations, so there is no real way to optimize for it, even if we wanted.
 
Joined
Mar 16, 2017
Messages
2,184 (0.76/day)
Location
Tanagra
System Name Budget Box
Processor Xeon E5-2667v2
Motherboard ASUS P9X79 Pro
Cooling Some cheap tower cooler, I dunno
Memory 32GB 1866-DDR3 ECC
Video Card(s) XFX RX 5600XT
Storage WD NVME 1GB
Display(s) ASUS Pro Art 27"
Case Antec P7 Neo
More cache could act as a thermal buffer on these smaller nodes. The chips can only get so small before you run out of surface area to dissipate heat. Cache could be a fairly simple way to add size without adding more of the heat-producing transistors. It’s also easy to segment off to increase yields. Oh, and you can get a small IPC uplift as well.
 
Joined
Nov 4, 2005
Messages
12,048 (1.72/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs, 24TB Enterprise drives
Display(s) 55" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
More cache could act as a thermal buffer on these smaller nodes. The chips can only get so small before you run out of surface area to dissipate heat. Cache could be a fairly simple way to add size without adding more of the heat-producing transistors. It’s also easy to segment off to increase yields. Oh, and you can get a small IPC uplift as well.


Cache is the primary power consumer, as it requires constant refreshes to keep the data valid. After prediction and caches, it's up to the code quality to determine how efficient it is, which is why latency matters, and why AMD Bulldozer ran at high frequency but couldn't keep up with Intel.
 
Joined
Aug 17, 2017
Messages
274 (0.10/day)
Finally close to release, THIS is the CPU I have been waiting for, for many years... I hope I live long enough to be able to buy two (one in a desktop and one in a laptop).
 
Joined
Sep 10, 2019
Messages
94 (0.05/day)
Nope, Intel still have the lead in IPC, while AMD manages better multicore clock scaling and individual boosting, plus they have the extra burst boost speed of XFR on top of regular boost.

Have a look here:

And Here: https://www.anandtech.com/show/14605/the-and-ryzen-3700x-3900x-review-raising-the-bar


Software isn't "Intel optimized". This BS needs to end now. They use the same ISA, we don't have access to use their microoperations, so there is no real way to optimize for it, even if we wanted.

Have a look here: https://software.intel.com/en-us/ipp

And here: https://www.amazon.com/Optimizing-Applications-Multi-Core-Processors-Performance/dp/1934053015/ref=sr_1_1?keywords=Optimizing+Applications+for+Multi-Core+Processors,+Using+the+Intel+Integrated+Performance+Primitives&qid=1568800003&s=gateway&sr=8-1

I'm seriously considering switching from my trusty 4770K to a 3950X for my main system. Hell has frozen over...
 
Joined
Aug 13, 2009
Messages
3,254 (0.58/day)
Location
Czech republic
Processor Ryzen 5800X
Motherboard Asus TUF-Gaming B550-Plus
Cooling Noctua NH-U14S
Memory 32GB G.Skill Trident Z Neo F4-3600C16D-32GTZNC
Video Card(s) AMD Radeon RX 6600
Storage HP EX950 512GB + Samsung 970 PRO 1TB
Display(s) HP Z Display Z24i G2
Case Fractal Design Define R6 Black
Audio Device(s) Creative Sound Blaster AE-5
Power Supply Seasonic PRIME Ultra 650W Gold
Mouse Roccat Kone AIMO Remastered
Software Windows 10 x64
So Tiger Lake is a successor to Ice Lake CPUs that don't exist either.
Yes, I get it.
Dafuq Intel.
 
Joined
Apr 16, 2019
Messages
632 (0.30/day)
I mean if AMD can't compete single core, they have no reason to push themselves faster do they? Even Ryzen 3000 falls behind in single in a lot of applications, so my guess is Intel knows Ryzen 4000 might be a 5-8% IPC gain over 3000 (if AMD is lucky), and they probably have already done the math knowing they can beat that or tie it on 10nm 4.5ghz. /shrug

Competition is the only way to make bigger gains, and Intel still has none for single core (not to mention you can't really OC ryzen 3000 and Intel chips OC like a beast even on big air heatsinks like Noctua) further widening the gap.
Precisely and at least when you also OC Intel's chips, they lead in all single thread heavy applications, full stop. Despite what hordes of ardent AMD fan(boy)s all over the internet would have you believe, the red team is still the one doing catch-up.
 

nurion

New Member
Joined
Jun 30, 2019
Messages
4 (0.00/day)
Holy sh*t, Intel must be really desperate to sacrifice that much silicon real estate to more cache in a bid to catch up with AMD.

Larger silicon die will seriously cut in to Intel's profits, at this point Intel is desperate when they realized that 10nm is not going to save them from AMD's 7nm+ EUV.

Only thing Intel can do now is continue lying and using inaccurate data in the press to try holding back AMD from cutting in to the big market share they have in notebooks but rest assured that AMD is coming for that too in a big way next year.

I don't think we are ever going to see 10nm desktop parts with more than 4 cores. Intel, despite having so many fabs, is going to Samsung for help because the 10nm node isn't delivering the expected core counts, yields, frequencies and voltages, besides the technical difficulties of manufacturing. So I think Intel will go for 7nm in mid-2020 to 2021.
 
Joined
Jun 10, 2014
Messages
3,010 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
Have a look here: https://software.intel.com/en-us/ipp

And here: https://www.amazon.com/Optimizing-A...ce+Primitives&qid=1568800003&s=gateway&sr=8-1

I'm seriously considering switching from my trusty 4770K to a 3950X for my main system. Hell has frozen over...
Just linking to some random Intel libraries…, yeah, you don't quite get how software development works.
As I said, (normal) software isn't "Intel optimized". To "optimize" for something specific, we would need unique instructions differentiating it from the competition, and make multiple compiled versions of the software, but then it will no longer be the same software. As I said Intel and AMD generally have the same ISA, with the exception of new instructions that one or the other adds, and then the other responds by adding support later. So if you wanted to "optimize for Intel", you would have to look for instructions that AMD don't support (yet), and build the software around that using assembly code or intrinsics, not high-level stuff. But if these new instructions are useful, then AMD usually adds support shortly after, and then your code is no longer "Intel optimized".
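The "multiple compiled versions" idea above is usually implemented as runtime dispatch on CPU feature flags, not on the vendor name. The following is a minimal sketch of that pattern under stated assumptions: the kernel functions are placeholders that compute the same result, `select_kernel` and `KERNELS` are hypothetical names, and the feature set is passed in by the caller instead of being queried from the CPU.

```python
from typing import Callable, Iterable


def sum_generic(values: Iterable[float]) -> float:
    """Baseline path that would run on any x86-64 CPU."""
    return float(sum(values))


def sum_wide_vector(values: Iterable[float]) -> float:
    """Stand-in for a build using a newer vector extension (same result here)."""
    return float(sum(values))


# Ordered from most to least specialised; the flag name is illustrative.
KERNELS: list[tuple[str, Callable[[Iterable[float]], float]]] = [
    ("avx512f", sum_wide_vector),
]


def select_kernel(cpu_flags: set[str]) -> Callable[[Iterable[float]], float]:
    """Pick the first kernel whose required feature flag the CPU reports."""
    for flag, kernel in KERNELS:
        if flag in cpu_flags:
            return kernel
    return sum_generic


kernel = select_kernel({"sse2", "avx2"})
print(kernel.__name__, kernel([1.0, 2.0, 3.0]))
```

Because the check is on the feature and not on who made the chip, any CPU that later reports the same flag takes the fast path too, which matches the argument that such code stops being "Intel optimized" once the other vendor adds the instructions.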

In reality, optimizing code is not about optimizing it for Intel or AMD, and even if so, it would be for features of microarchitectures, not "Intel" or "AMD". The reason why a piece of code performs differently on e.g. Skylake, Ice Lake, Zen 1 or Zen 2 is different resource bottlenecks. Intel and AMD keep changing/improving the various resources in their CPUs, like prefetching, branch prediction, caches, execution port configuration, ALUs, FPUs, vector units, AGUs, etc. Even if I intentionally or unintentionally optimize my code so it happens right now to scale better on Zen 2 than Skylake, Ice Lake or the next one is likely to change that resource balance and tilt it the other way. When we write software, we can't target the CPU's micro-operations, so we can't truly optimize for a specific microarchitecture. When we have "optimal" code where one algorithm scales better on Skylake and another scales better on Zen 2, it doesn't mean there is something wrong with either; it just means their workload happens to be better balanced for those respective CPUs, like balancing integer and floating point operations, branching, SIMD, etc.

Since the ISA for x86 CPUs is the same, and we can't target any of the underlying microarchitectures, optimization is by design generic. Optimizing code is about removing redundancies, bloat, abstractions and branching, about using SIMD, and often most importantly about cache optimization. Optimizations like this will always benefit all modern x86 microarchitectures, and while the relative gain may vary, a good optimization will work for all of them, including future unknown microarchitectures.
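To make the cache optimization point concrete, here is a small sketch that counts simulated cache misses for two traversal orders of the same 2D array. It rests on assumptions chosen for illustration: 64-byte cache lines, 8-byte elements, a row-major memory layout, and a tiny fully associative LRU cache of 16 lines. None of these model a specific CPU.

```python
from collections import OrderedDict

LINE_BYTES = 64  # assumed cache-line size
ELEM_BYTES = 8   # assumed element size (e.g. a double)


def count_misses(addresses, cache_lines: int) -> int:
    """Count misses for a byte-address trace in a fully associative LRU cache."""
    cache = OrderedDict()
    misses = 0
    for addr in addresses:
        line = addr // LINE_BYTES
        if line in cache:
            cache.move_to_end(line)
        else:
            misses += 1
            if len(cache) >= cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
            cache[line] = None
    return misses


def row_order(rows: int, cols: int) -> list[int]:
    """Addresses visited when the inner loop walks along a row (contiguous)."""
    return [(r * cols + c) * ELEM_BYTES for r in range(rows) for c in range(cols)]


def column_order(rows: int, cols: int) -> list[int]:
    """Addresses visited when the inner loop walks down a column (strided)."""
    return [(r * cols + c) * ELEM_BYTES for c in range(cols) for r in range(rows)]


ROWS, COLS, CACHE_LINES = 64, 64, 16
print(count_misses(row_order(ROWS, COLS), CACHE_LINES))     # 512
print(count_misses(column_order(ROWS, COLS), CACHE_LINES))  # 4096
```

The same 4096 elements are read in both cases; only the order changes. Nothing here depends on the vendor, which fits the claim that this kind of optimization is generic.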

So no; software like games, photo editors, video editors, CADs, web browsers, office applications, development tools, etc. are not "Intel optimized".
 
Joined
Feb 11, 2009
Messages
5,606 (0.96/day)
System Name Cyberline
Processor Intel Core i7 2600k -> 12600k
Motherboard Asus P8P67 LE Rev 3.0 -> Gigabyte Z690 Auros Elite DDR4
Cooling Tuniq Tower 120 -> Custom Watercoolingloop
Memory Corsair (4x2) 8gb 1600mhz -> Crucial (8x2) 16gb 3600mhz
Video Card(s) AMD RX480 -> RX7800XT
Storage Samsung 750 Evo 250gb SSD + WD 1tb x 2 + WD 2tb -> 2tb MVMe SSD
Display(s) Philips 32inch LPF5605H (television) -> Dell S3220DGF
Case antec 600 -> Thermaltake Tenor HTCP case
Audio Device(s) Focusrite 2i4 (USB)
Power Supply Seasonic 620watt 80+ Platinum
Mouse Elecom EX-G
Keyboard Rapoo V700
Software Windows 10 Pro 64bit
Precisely and at least when you also OC Intel's chips, they lead in all single thread heavy applications, full stop. Despite what hordes of ardent AMD fan(boy)s all over the internet would have you believe, the red team is still the one doing catch-up.

Are you sure you are not living in a bubble?
 
Joined
Aug 13, 2009
Messages
3,254 (0.58/day)
Location
Czech republic
Processor Ryzen 5800X
Motherboard Asus TUF-Gaming B550-Plus
Cooling Noctua NH-U14S
Memory 32GB G.Skill Trident Z Neo F4-3600C16D-32GTZNC
Video Card(s) AMD Radeon RX 6600
Storage HP EX950 512GB + Samsung 970 PRO 1TB
Display(s) HP Z Display Z24i G2
Case Fractal Design Define R6 Black
Audio Device(s) Creative Sound Blaster AE-5
Power Supply Seasonic PRIME Ultra 650W Gold
Mouse Roccat Kone AIMO Remastered
Software Windows 10 x64
Just linking to some random Intel libraries…, yeah, you don't quite get how software development works.
As I said, (normal) software isn't "Intel optimized". To "optimize" for something specific, we would need unique instructions differentiating it from the competition, and make multiple compiled versions of the software, but then it will no longer be the same software. As I said Intel and AMD generally have the same ISA, with the exception of new instructions that one or the other adds, and then the other responds by adding support later. So if you wanted to "optimize for Intel", you would have to look for instructions that AMD don't support (yet), and build the software around that using assembly code or intrinsics, not high-level stuff. But if these new instructions are useful, then AMD usually adds support shortly after, and then your code is no longer "Intel optimized".

In reality, optimizing code is not about optimizing it for Intel or AMD, and even if so, it would be for features of microarchitectures, not "Intel" or "AMD". The reason why a piece of code performs differently on e.g. Skylake, Ice Lake, Zen 1 or Zen 2 is different resource bottlenecks. Intel and AMD keep changing/improving the various resources in their CPUs, like prefetching, branch prediction, caches, execution port configuration, ALUs, FPUs, vector units, AGUs, etc. Even if I intentionally or unintentionally optimize my code so it happens right now to scale better on Zen 2 than Skylake, Ice Lake or the next one is likely to change that resource balance and tilt it the other way. When we write software, we can't target the CPU's micro-operations, so we can't truly optimize for a specific microarchitecture. When we have "optimal" code where one algorithm scales better on Skylake and another scales better on Zen 2, it doesn't mean there is something wrong with either; it just means their workload happens to be better balanced for those respective CPUs, like balancing integer and floating point operations, branching, SIMD, etc.

Since the ISA for x86 CPUs is the same, and we can't target any of the underlying microarchitectures, optimization is by design generic. Optimizing code is about removing redundancies, bloat, abstractions and branching, about using SIMD, and often most importantly about cache optimization. Optimizations like this will always benefit all modern x86 microarchitectures, and while the relative gain may vary, a good optimization will work for all of them, including future unknown microarchitectures.

So no; software like games, photo editors, video editors, CADs, web browsers, office applications, development tools, etc. are not "Intel optimized".
So why does some software/games run better on Intel for example?
 
Joined
Jun 10, 2014
Messages
3,010 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
So why does some software/games run better on Intel for example?
As I said, Intel and AMD keep changing/improving the balance of resources in their CPUs, like prefetching, branch prediction, caches, execution port configuration, ALUs, FPUs, vector units, AGUs, etc. As of right now, Zen/Zen 2 have ALUs and FPUs/vector units spread across more execution ports, while Skylake has its units spread across fewer, more flexible execution ports. Depending on how the instructions are shuffled, AMD can reach a higher max throughput for single (non-vector) operations if they are just the right mix of int and float, while Skylake's more flexible design can reach a better average performance across a wider range of workloads, but its maximum throughput for non-vector operations is lower. This is why we see Zen (2) pull ahead by a good margin in a few benchmarks, while Intel does very well on average across more workloads. Intel is of course still helped by a better front-end (especially in games).
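The trade-off between more dedicated ports and fewer flexible ports can be sketched with a toy throughput model. This is deliberately simplistic: the port counts below are invented to make the trade-off visible and are not the real Zen 2 or Skylake configurations, every operation is assumed to take one cycle, and dependencies, the front-end and memory are ignored.

```python
from math import ceil


def cycles_split_ports(int_ops: int, fp_ops: int, int_ports: int = 3, fp_ports: int = 3) -> int:
    """Dedicated integer and floating-point ports: each kind is limited by its own ports."""
    return max(ceil(int_ops / int_ports), ceil(fp_ops / fp_ports))


def cycles_shared_ports(int_ops: int, fp_ops: int, ports: int = 4) -> int:
    """Flexible ports that accept either kind of operation: only the total matters."""
    return ceil((int_ops + fp_ops) / ports)


# Hypothetical instruction mixes: balanced, integer only, and mostly integer.
for int_ops, fp_ops in [(300, 300), (600, 0), (500, 100)]:
    split = cycles_split_ports(int_ops, fp_ops)
    shared = cycles_shared_ports(int_ops, fp_ops)
    print(f"int={int_ops} fp={fp_ops} split={split} shared={shared}")
```

With these made-up numbers the split design finishes the balanced mix in fewer cycles, while the shared design takes the same number of cycles whatever the mix is. That is the shape of the argument above, not a measurement of either CPU.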

Most x86 microarchitectures have since the early 90s been using custom "RISC-like" micro-operations. These native architecture-specific instructions are not available to us software developers, nor would it be feasible to use them, as any code using such instructions would be locked to a specific microarchitecture, and the assembly code would have to be tied specifically to the precise ALU, FPU and register configuration of the superscalar design. There is no direct way to control the micro-operations on the CPU, so we are left with the x86 ISA, which is shared between them. Even if we wanted to, we can't truly optimize for a specific one; we can only change the algorithms/logic and benchmark them to see what performs best.

Very little software these days even uses assembly to optimize code. Most applications you use are written in C++ or even higher-level languages, and any low-level optimization (even generic x86) is very rare in such applications. In fact, most software today is poorly written, rushed, highly abstracted crap, and it's more common that code bases are not performance optimized at all.
Even if it was technically possible, most coders are too lazy to conspire to "optimize" for Intel and sabotage AMD.
 
Joined
Aug 13, 2009
Messages
3,254 (0.58/day)
Location
Czech republic
Processor Ryzen 5800X
Motherboard Asus TUF-Gaming B550-Plus
Cooling Noctua NH-U14S
Memory 32GB G.Skill Trident Z Neo F4-3600C16D-32GTZNC
Video Card(s) AMD Radeon RX 6600
Storage HP EX950 512GB + Samsung 970 PRO 1TB
Display(s) HP Z Display Z24i G2
Case Fractal Design Define R6 Black
Audio Device(s) Creative Sound Blaster AE-5
Power Supply Seasonic PRIME Ultra 650W Gold
Mouse Roccat Kone AIMO Remastered
Software Windows 10 x64
Hm, so laziness is the reason/one of the reasons why we need faster and faster computers to run basically the same stuff? I'm looking at you, Windows.
 

eidairaman1

The Exiled Airman
Joined
Jul 2, 2007
Messages
43,045 (6.72/day)
Location
Republic of Texas (True Patriot)
System Name PCGOD
Processor AMD FX 8350@ 5.0GHz
Motherboard Asus TUF 990FX Sabertooth R2 2901 Bios
Cooling Scythe Ashura, 2×BitFenix 230mm Spectre Pro LED (Blue,Green), 2x BitFenix 140mm Spectre Pro LED
Memory 16 GB Gskill Ripjaws X 2133 (2400 OC, 10-10-12-20-20, 1T, 1.65V)
Video Card(s) AMD Radeon 290 Sapphire Vapor-X
Storage Samsung 840 Pro 256GB, WD Velociraptor 1TB
Display(s) NEC Multisync LCD 1700V (Display Port Adapter)
Case AeroCool Xpredator Evil Blue Edition
Audio Device(s) Creative Labs Sound Blaster ZxR
Power Supply Seasonic 1250 XM2 Series (XP3)
Mouse Roccat Kone XTD
Keyboard Roccat Ryos MK Pro
Software Windows 7 Pro 64