Editorial x86 Lacks Innovation, Arm is Catching up. Enough to Replace the Giant?

notb · Apr 9, 2020

trparky said:
Yet AMD is showing that they can do more work with more efficiency. Goes to show you that Intel badly needs a new architecture, the current one is showing that it's not up to the task.

No. AMD's lead comes only from the node advantage.
Zen+ efficiency was on par with 8-9th gen Core. Zen2 efficiency seems roughly on par with Ice Lake (not a big niche to compare...). Same x86 realm, compatible instruction set and 1/0 bits that need to be switched. Smaller transistors mean less energy. There's really no magic inside.

Intel already has a few more modern architectures. They're just not in the product line you're interested in.

Frick · Apr 9, 2020

R-T-B said:
For extremely old software, there is emulation as an option. For more modern games, it's a little more tricky.

Tricky when it's supposed to control hardware. Like industrial control systems and the like. Possible, but no one is interested in taking the cost, which frankly makes sense.

londiste · Apr 9, 2020

Accurate emulation can be very expensive in terms of performance:

Accuracy takes power: one man’s 3GHz quest to build a perfect SNES emulator

How can it take 3GHz to emulate a Super Nintendo? The man behind a major SNES …

arstechnica.com

efikkan · Apr 9, 2020

notb said:
It's not about compiling for ARM being hard or not. It's about the cost of recompiling everything we need. And more importantly: of supporting and optimizing for both architectures.

Most toolchains, standard libraries etc. already have excellent support for ARM. Most applications today don't have x86 optimizations at all, so porting them is easy, but making them perform well might be harder if the ISA is inherently slower. ARM is a very customized ISA, and if you want better performance you have to leverage the custom accelerated features of the specific chip. Even if you just target Apple's chips, in two to three years' time they will replace some of those features and you will have to rewrite your core application. Now imagine you want to support all the major ARM chips on the market (current and 2-3 years back): you have to rewrite the software over and over again. The other option is to stick to the basic ARM ISA and accept the lacking performance that results. Having an ISA that is very stable and backwards compatible like x86 is a huge advantage for software.

notb said:
If we now ask companies to make the same software for both x86 and ARM, they'll just ask more for it. And software is already way more expensive than hardware (at least in the enterprise segment).

Developing software is very expensive, especially maintaining it over time.
All the more reason not to prioritize a dozen or so different ARM variations.

notb said:
There's still a lot that can be done to improve x86 efficiency and flexibility. The heterogeneous architecture idea (e.g. Intel Lakefield) is the most obvious - and for some reason: very underestimated on this forum...
If ARM didn't have big.LITTLE, we probably wouldn't be having this discussion. ARM SoCs would suck just like they did 10 years ago.

Sure, but this is microarchitecture, not ISA.

Aquinus · Apr 9, 2020

londiste said:
Intel needs a new manufacturing process. They have architecture(s).

To some degree. Dies can only still get so big without yields going to crap. The MCM approach really is the way to solve this problem and we're seeing it a little bit from Intel on that front with their really high core count Xeons, but we're talking about progress that AMD had already made years ago. Even if they get their CPUs on a smaller process, there is still a barrier to how many cores they can throw on a die and that's going to continue to hamstring Intel with both consumer chips and server chips because they're at an extreme disadvantage when it comes to producing CPUs with more than 8 cores.
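
To put rough numbers on the yield argument, here is a minimal sketch using the simple Poisson defect model, where the fraction of defect-free dies is exp(-area x defect density). The defect density and die areas below are illustrative assumptions, not figures for any real Intel or TSMC process, and the model ignores packaging cost and the salvaging of partially defective dies.

```python
import math

def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Expected fraction of defect-free dies under a Poisson defect model."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-area_cm2 * defects_per_cm2)

# Hypothetical inputs, chosen only for illustration.
DEFECT_DENSITY = 0.2      # defects per cm^2 (assumed)
MONOLITHIC_AREA = 600.0   # mm^2, one large die (assumed)
CHIPLET_AREA = 75.0       # mm^2, one of eight small dies (assumed)

monolithic_yield = poisson_yield(MONOLITHIC_AREA, DEFECT_DENSITY)
chiplet_yield = poisson_yield(CHIPLET_AREA, DEFECT_DENSITY)

print(f"monolithic die yield: {monolithic_yield:.1%}")
print(f"single chiplet yield: {chiplet_yield:.1%}")
```

Under these assumptions a single defect scraps a whole 600 mm2 die but only one 75 mm2 chiplet, which is the basic economic case for MCM.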

notb · Apr 9, 2020

Aquinus said:
To some degree. Dies can only still get so big without yields going to crap. The MCM approach really is the way to solve this problem and we're seeing it a little bit from Intel on that front with their really high core count Xeons, but we're talking about progress that AMD had already made years ago. Even if they get their CPUs on a smaller process, there is still a barrier to how many cores they can throw on a die and that's going to continue to hamstring Intel with both consumer chips and server chips because they're at an extreme disadvantage when it comes to producing CPUs with more than 8 cores.

People here are too eager to call Intel's lineup unattractive just because it can't make the same core density as AMD/TSMC.
If Intel isn't able to make chips larger than say 150mm2 profitable, they just won't fight in these segments.
But below, with the right pricing, Intel will easily keep the market share leadership.

High-core server chips are a problem and this is something Intel will have to address one way or another.
Everything else is either doable one node behind or not that significant volume-wise. Intel makes many other products - they'll utilize their fab capacity anyway.

And yeah, there's obviously a market share threshold that will trigger Intel to go harder on AMD - e.g. taking over TSMC node supply. At this point they don't see this as necessary.

londiste · Apr 9, 2020

Aquinus said:
To some degree. Dies can only still get so big without yields going to crap. The MCM approach really is the way to solve this problem and we're seeing it a little bit from Intel on that front with their really high core count Xeons, but we're talking about progress that AMD had already made years ago. Even if they get their CPUs on a smaller process, there is still a barrier to how many cores they can throw on a die and that's going to continue to hamstring Intel with both consumer chips and server chips because they're at an extreme disadvantage when it comes to producing CPUs with more than 8 cores.

MCM on Zen/Zen+ level would not require architectural change from Intel. Their problem from last year compared to Zen/Zen+ is primarily in the lack of QPI links on their smaller dies for optimal topology with many dies. Xeon 9200 is literally two 28-core dies on one package. Xeons are meant for up to 8-socket configurations and there is a topological reason why Zen/Zen+ EPYCs were up to 2-socket.

IO Die is kind of both new and old at the same time. Architecturally speaking this is good old pure chipset + bus configuration. Packaging and bus organization is new and different. No doubt Intel will follow AMD's lead on this one.

None of this is on topic though. Managing core count and core-to-core traffic takes a different approach on manycore ARM and other CPUs as well. A mesh (not Intel's, but the generic mesh principle) does seem to be an up-and-coming one there, primarily due to the number of cores.

R0H1T · Apr 9, 2020

They certainly would need something to change, you can't just slap "IF" like links onto any *cove & call it a day! Suspect that's why Keller is there & we'll probably see the first fruits of his work in Sapphire Rapids, likely the same MCM approach? And by the time Intel do go MCM, AMD will have DDR5 & what IF v4 or v5 by then & before anyone says CXL ~ yeah no that won't work right off the bat, if at all.

londiste · Apr 9, 2020

R0H1T said:
They certainly would need something to change, you can't just slap "IF" like links onto any *cove & call it a day!

Intel has QPI, which is used to connect CPUs in a multi-CPU system. Its parameters are very similar to AMD's inter-die IF links. They would not even need to slap these on; it is already there. They might need a couple of additional links though.

Edit:
Sorry, it is called UPI now. Same thing.

R0H1T · Apr 9, 2020

UPI is inter die connect so not quite the same as IF, from what I remember anyway. There are other differences as well, I'll see if I can find something better on UPI.

londiste · Apr 9, 2020

R0H1T said:
UPI is inter die connect so not quite the same as IF, from what I remember anyway. There are other differences as well, I'll see if I can find something better on UPI.

UPI is primarily used for inter-socket connection. Using it for inter-die connections is trivial (see Xeon 9200). It should also be flexible enough to make it faster and/or wider if physical connection permits (say, EMIB).
I do not understand exactly what you mean by interconnect for inter-die connect not being quite the same as IF. IF is used throughout Zen precisely for inter-die connections - from one die to another in some topology in Zen/Zen+ and from CCD to IOD on Zen2.

R0H1T · Apr 9, 2020

londiste said:
I do not understand exactly what you mean by interconnect for inter-die connect not being quite the same as IF. IF is used throughout Zen precisely for inter-die connections - from one die to another in some topology in Zen/Zen+ and from CCD to IOD on Zen2.

IF is also present on the die itself, this is a Zen layout & IIRC it's still the same way Zen2 operates.

The Ryzen Die

londiste · Apr 9, 2020

R0H1T said:
IF is also present on the die itself, this is a Zen layout & IIRC it's still the same way Zen2 operates.

There is SDF, think of it as an IF switch. This is what CCXs connect to. Traffic inside CCX does not use IF. That one basically goes through L3$.

R0H1T · Apr 9, 2020

IF includes both SCF & SDF, unless you want to elaborate on that?

The Infinity Fabric consists of two separate communication planes - Infinity Scalable Data Fabric (SDF) and the Infinity Scalable Control Fabric (SCF).

Infinity Fabric (IF) - AMD - WikiChip

Infinity Fabric (IF) is a proprietary system interconnect architecture that facilitates data and control transmission across all linked components. This architecture is utilized by AMD's recent microarchitectures for both CPU (i.e., Zen) and graphics (e.g., Vega), and any other additional...

en.wikichip.org

trparky · Apr 9, 2020

notb said:
No. AMD's lead comes only from the node advantage.
Zen+ efficiency was on par with 8-9th gen Core. Zen2 efficiency seems roughly on par with Ice Lake (not a big niche to compare...). Same x86 realm, compatible instruction set and 1/0 bits that need to be switched. Smaller transistors mean less energy. There's really no magic inside.

Now this may be based upon rumors, but it's said that Zen 3 will actually surpass Intel in terms of IPC. This will allow AMD, in some workloads, to do more work with less clock speed, which of course leads to less power usage and less heat. It's not the GHz that matters, it's what you can do with those GHz that really counts. Higher IPC contributes to better efficiency, which is where Intel really hasn't shown any progress as of late. Sure, they've been giving us a few percent lately, but the leaps that AMD has been making make Intel look like they're standing still.
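
As a back-of-the-envelope sketch of the "what you can do with those GHz" point: single-thread throughput is roughly IPC multiplied by clock. The IPC and clock values below are made up for illustration and do not describe any actual Zen or Core part.

```python
def relative_throughput(ipc: float, clock_ghz: float) -> float:
    """Billions of instructions per second: IPC multiplied by clock in GHz."""
    return ipc * clock_ghz

# Hypothetical chips: one clocks higher, the other does more per cycle.
high_clock = relative_throughput(ipc=1.00, clock_ghz=5.0)
high_ipc = relative_throughput(ipc=1.15, clock_ghz=4.4)

print(f"high-clock chip: {high_clock:.2f}")
print(f"high-IPC chip:   {high_ipc:.2f}")
```

With these assumed figures, a 15% IPC advantage is enough to edge out a chip clocked 600 MHz higher.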

bug · Apr 9, 2020

trparky said:
Now this may be based upon rumors, but it's said that Zen 3 will actually surpass Intel in terms of IPC.

Considering Zen2 is already trading blows with current Intel offerings (as far as IPC is concerned), it would be really surprising if Zen3 didn't surpass Intel.

trparky said:
This will allow AMD, in some workloads, to do more work with less clock speed, which of course leads to less power usage and less heat.

Lower clocks help, a lot at times, but they don't tell the whole story. Die size matters, architecture itself matters, voltage matters (though I don't see voltage going up this round).

R0H1T · Apr 9, 2020

Zen 2 is already ahead of Skylake or the last iterative "Whiskey" lake or what ever it was, even I don't remember at this point :laugh:

Not Icelake though or Sunny Cove to be precise, even if it's only relegated to laptops.

trparky · Apr 9, 2020

bug said:
Lower clocks help, a lot at times, but they don't tell the whole story. Die size matters, architecture itself matters, voltage matters (though I don't see voltage going up this round).

It does seem funny that as you ramp up the clock speeds in the Intel camp, the voltages required to get those higher clocks increase in ways that aren't exactly nice. Sure, you might win the silicon lottery and get a chip that overclocks nicely while not having to pump quite so much voltage into it to achieve stability, but there's a reason why they call it the silicon lottery: most people don't win it. So more often than not, you have to push some hideously high voltages to get those high clock speeds in the Intel camp, which leads to decreased overall efficiency.

Never mind the fact that as you push past 5 GHz on Intel it appears that the performance gains drop off a cliff. Sure, you achieved 5.2 GHz... but does it really matter? Has it helped you? Nope. All it has given you is bragging rights at that point, no real performance gains while doing nothing but suck power.
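
A rough sketch of why those last few hundred MHz are so expensive: dynamic power scales approximately with capacitance x voltage squared x frequency. The voltages and clocks below are assumed operating points for illustration, not measurements of any real chip, and leakage power is ignored.

```python
def dynamic_power(voltage: float, clock_ghz: float, capacitance: float = 1.0) -> float:
    """Relative dynamic power using the approximation P ~ C * V^2 * f."""
    return capacitance * voltage ** 2 * clock_ghz

# Hypothetical operating points (assumed, not measured).
stock = dynamic_power(voltage=1.20, clock_ghz=4.7)
overclock = dynamic_power(voltage=1.40, clock_ghz=5.2)

clock_gain = 5.2 / 4.7 - 1.0
power_gain = overclock / stock - 1.0

print(f"clock gain: {clock_gain:.1%}")
print(f"dynamic power gain: {power_gain:.1%}")
```

With these made-up numbers, about 11% more clock costs about 50% more dynamic power, before leakage is even counted.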

bug said:
Considering Zen2 is already trading blows with current Intel offerings (as far as IPC is concerned), it would be really surprising if Zen3 didn't surpass Intel.

Exactly. AMD doesn't appear to be taking its foot off the gas pedal, so look forward to even more IPC increases as AMD moves forward. We may very well see an AMD chip that's clocked significantly lower (while using less power) than an Intel chip yet is trouncing it.

r9 · Apr 12, 2020

lexluthermiester said:
No they're not. They're a great idea.

Idea, yes; execution, not so much.
Only the native apps run decently (and there aren't that many); everything else runs like crap.

ARF · Apr 12, 2020

lexluthermiester said:
No they're not. They're a great idea.

There is no problem with ARM in a laptop form factor - actually imagine a laptop with the TDP of a smartphone and still running its screen perfectly smooth at resolutions like 3200 x 1440, and smooth apps, too.

notb · Apr 12, 2020

ARF said:
There is no problem with ARM in a laptop form factor - actually imagine a laptop with the TDP of a smartphone

You have a laptop with 2500U (15W). Do you run any kind of tasks that use most of that SoC?
Maybe some of them can be replicated on your smartphone. Try it.
If it's interactive - you'll see if it's as smooth and responsive as on your laptop.
If it's more about processing, time it.

ARM can be a viable alternative for x86 in *some* scenarios and *some* people. And I'll be the first to admit it because I'm very much in that group (I'm mostly cloud-based).
But your suggestion that it's a solution for everyone is absurd.
Seriously, spend some time learning how computers work and how they are used.

ARF said:
still running its screen perfectly smooth at resolutions like 3200 x 1440, and smooth apps, too.

There's something seriously wrong with your PCs if they can't run a smooth video output. :eek:

lexluthermiester · Apr 13, 2020

notb said:
You have a laptop with 2500U (15W). Do you run any kind of tasks that use most of that SoC?
Maybe some of them can be replicated on your smartphone. Try it.

I have and screw that crap. Call me a screen snob if you wish but I just can not do general computing on a 5.5" phone screen.

Aquinus · Apr 13, 2020

lexluthermiester said:
I have and screw that crap. Call me a screen snob if you wish but I just can not do general computing on a 5.5" phone screen.

I agree. I have an iPhone 11 Pro Max with a 6.5" screen and even if I connected it to a monitor, I'm pretty sure I wouldn't want to use it for my entire job. There are some things that make sense, like management-related apps such as JIRA, or communication tools like email, Slack, or Zoom. It is, however, grossly insufficient to dev upon because I need the JVM, VIM, Node, and a Postgres server. At the end of the day, it's not unreasonable for me to be using 11-14GB of memory on my tower or laptop, and that's not going to change if I were capable of doing it on my phone, yet my phone has only 4GB.

On the other hand, I have a laptop with an i7 8550u in it and its single core performance more or less matches the performance of my overclocked 3930k and also has 16GB of ram, not to mention the capability to drive two 4k displays in addition to the built-in 1080p display. My phone isn't doing that.

londiste · Apr 13, 2020

You are not the target market for an ARM notebook. While a large amount of RAM is not something that depends on architecture, you are more than likely to run into performance issues on a "phone-class TDP" ARM laptop. You are a producer or creator; you create content (in its wider meaning).

The target market kind of overlaps with the people who are OK doing things on a phone, except maybe wanting a proper screen and keyboard. Consumer is a pretty accurate term here.

For a consumer, ARM is enough. So is an x86 CPU from the last decade. The primary place where the lowest end of such CPUs runs into trouble for consumers is media playback and recording. This is usually resolved with ASIC blocks in the SoC - things like encoders/decoders and image processing in phones, or similar blocks on x86 CPUs, usually integrated into the iGPU. As long as the iGPU or SoC can decode a video stream from YouTube and Netflix, the average user can make do with ARM (or Atom) as the CPU.

ARF · Apr 13, 2020

londiste said:
You are not the target market for an ARM notebook. While a large amount of RAM is not something that depends on architecture, you are more than likely to run into performance issues on a "phone-class TDP" ARM laptop. You are a producer or creator; you create content (in its wider meaning).

The target market kind of overlaps with the people who are OK doing things on a phone, except maybe wanting a proper screen and keyboard. Consumer is a pretty accurate term here.

For a consumer, ARM is enough. So is an x86 CPU from the last decade. The primary place where the lowest end of such CPUs runs into trouble for consumers is media playback and recording. This is usually resolved with ASIC blocks in the SoC - things like encoders/decoders and image processing in phones, or similar blocks on x86 CPUs, usually integrated into the iGPU. As long as the iGPU or SoC can decode a video stream from YouTube and Netflix, the average user can make do with ARM (or Atom) as the CPU.

Yes, we are talking about consumer apps; he shifts to off-topic workstations...
