Intel Officially Sinks the Itanic, Future of IA-64 Architecture Uncertain

Joined
Apr 9, 2018
Messages
781 (0.32/day)
I recall reading about Itanium in the early days; it seems like ever since its inception it has been a myth at the best of times.
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.44/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
I don't recall them having any Itanium version even close to efficient enough to consider that.
It could have been, but Intel gave up on it a long time ago. It uses a lot fewer transistors to execute a task than x86 does. It left most optimizations to the software and compiler rather than the processor itself.
 
Joined
Apr 19, 2013
Messages
296 (0.07/day)
System Name Darkside
Processor R7 3700X
Motherboard Aorus Elite X570
Cooling Deepcool Gammaxx l240
Memory Thermaltake Toughram DDR4 3600MHz CL18
Video Card(s) Gigabyte RX Vega 64 Gaming OC
Storage ADATA & WD 500GB NVME PCIe 3.0, many WD Black 1-3TB HD
Display(s) Samsung C27JG5x
Case Thermaltake Level 20 XL
Audio Device(s) iFi xDSD / micro iTube2 / micro iCAN SE
Power Supply EVGA 750W G2
Mouse Corsair M65
Keyboard Corsair K70 LUX RGB
Benchmark Scores Not sure, don't care
Wow, I had completely forgotten about Itanic, much like the industry! :p
 
Joined
Nov 25, 2012
Messages
247 (0.06/day)
Suck Eggs HP.

You bought Compaq and killed Alpha :p

Now your Itanic has sunk.
 
Joined
Sep 15, 2007
Messages
3,946 (0.63/day)
Location
Police/Nanny State of America
Processor OCed 5800X3D
Motherboard Asucks C6H
Cooling Air
Memory 32GB
Video Card(s) OCed 6800XT
Storage NVMees
Display(s) 32" Dull curved 1440
Case Freebie glass idk
Audio Device(s) Sennheiser
Power Supply Don't even remember
Suck Eggs HP.

You bought Compaq and killed Alpha :p

Now your Itanic has sunk.

HP spent billions trying to keep itanic afloat with intel...HP is dumb with a long history of blowing cash lol
Their execs were all having too many drug parties at intel, apparently. You'd have to be higher than a weather balloon to invest in itanium.
 
Joined
Mar 23, 2016
Messages
4,844 (1.52/day)
Processor Core i7-13700
Motherboard MSI Z790 Gaming Plus WiFi
Cooling Cooler Master RGB something
Memory Corsair DDR5-6000 small OC to 6200
Video Card(s) XFX Speedster SWFT309 AMD Radeon RX 6700 XT CORE Gaming
Storage 970 EVO NVMe M.2 500GB,,WD850N 2TB
Display(s) Samsung 28” 4K monitor
Case Phantek Eclipse P400S
Audio Device(s) EVGA NU Audio
Power Supply EVGA 850 BQ
Mouse Logitech G502 Hero
Keyboard Logitech G G413 Silver
Software Windows 11 Professional v23H2

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.44/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
HP spent billions trying to keep itanic afloat with intel...HP is dumb with a long history of blowing cash lol
Their execs were all having too many drug parties at intel, apparently. You'd have to be higher than a weather balloon to invest in itanium.
Before AMD64 rolled out, IA-64 made a lot of sense as the future of computing. It still does in some regards but people would rather have backwards compatibility in processors than an instruction set for the 21st century.
 
Joined
Sep 15, 2007
Messages
3,946 (0.63/day)
Location
Police/Nanny State of America
Processor OCed 5800X3D
Motherboard Asucks C6H
Cooling Air
Memory 32GB
Video Card(s) OCed 6800XT
Storage NVMees
Display(s) 32" Dull curved 1440
Case Freebie glass idk
Audio Device(s) Sennheiser
Power Supply Don't even remember
Before AMD64 rolled out, IA-64 made a lot of sense as the future of computing. It still does in some regards but people would rather have backwards compatibility in processors than an instruction set for the 21st century.

Low performance = failure no matter what.
 
Joined
Jun 10, 2014
Messages
2,987 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
Itanium only made sense on paper. By the time it entered the market in 2001, it was already dead in the water. It was crushed by the x86 designs of the day (Pentium III, Pentium 4 and Athlon), and in the datacenter market it was crushed by RISC processors like POWER and SPARC. Itanium had been in development since the late 80s, and the design choices were largely made in a bubble without too much realism.
 
Joined
Jul 10, 2010
Messages
1,234 (0.23/day)
Location
USA, Arizona
System Name SolarwindMobile
Processor AMD FX-9800P RADEON R7, 12 COMPUTE CORES 4C+8G
Motherboard Acer Wasp_BR
Cooling It's Copper.
Memory 2 x 8GB SK Hynix/HMA41GS6AFR8N-TF
Video Card(s) ATI/AMD Radeon R7 Series (Bristol Ridge FP4) [ACER]
Storage TOSHIBA MQ01ABD100 1TB + KINGSTON RBU-SNS8152S3128GG2 128 GB
Display(s) ViewSonic XG2401 SERIES
Case Acer Aspire E5-553G
Audio Device(s) Realtek ALC255
Power Supply PANASONIC AS16A5K
Mouse SteelSeries Rival
Keyboard Ducky Channel Shine 3
Software Windows 10 Home 64-bit (Version 1607, Build 14393.969)
Then, in a couple months Itanic revives as IA-128 and has partial compatibility with RISC-V.
 
Joined
Nov 25, 2012
Messages
247 (0.06/day)
HP spent billions trying to keep itanic afloat with intel...HP is dumb with a long history of blowing cash lol
Their execs were all having too many drug parties at intel, apparently. You'd have to be higher than a weather balloon to invest in itanium.

HP
Killed PA-RISC
And DEC Alpha
to bring out the Itanic with Intel..

Even Microsoft saw a sinking ship and pulled out years ago.

Can't remember how many billions the Itanic cost.
And what % of the server and even workstation market they were meant to get.
Even before AMD64 they were behind target, then AMD64 came out and it was pretty much game over.

Actually the intellectual property of Alpha was bought by Intel.

Either way, Intel and HP created the Itanic ....
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.44/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
Low performance = failure no matter what.
Only benchmark I could find:

Xeon 20.1474609375
Itanium 8.8193359375
Opteron 11.516927083333333333333333333333
Itanium 13.1015625

That's per core. Itanium 2 is nothing to scoff at.

8-Core Itanium Poulson: 3.1 billion transistors
8-Core Xeon Nehalem-EX: 2.3 billion transistors

Interesting article about Poulson (newest Itanium architecture): https://www.realworldtech.com/poulson/

Itanium had 20% of the TOP 500 super computers back in 2004. IA-64 gained traction because x86 lacked memory addressing space. x86-64 reversed that pattern because of backwards compatibility/not having to find Itanium software developers.

12 instructions per clock, 8 cores, and 16 threads at the end of 2012. It was a monster.
 
Joined
Jun 10, 2014
Messages
2,987 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
Then, in a couple months Itanic revives as IA-128 and has partial compatibility with RISC-V.
There is no reason for making a 128-bit ISA, at least not yet anyway. Current x86 architectures have partial support for up to 512-bit through AVX, which for the time being is a much more flexible and smart way of getting good performance without adding massive complexity to the design. I see no reason why the entire core should be extended to 128-bit, at least not for the next decade.
 
Joined
Feb 3, 2017
Messages
3,806 (1.32/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
64-bit calculations were largely a secondary concern. The move to 64-bit was largely dictated by memory, more specifically by the 32-bit address space becoming too small. 32 bits allows for an address space of 4 GB, and especially coupled with memory-mapping for devices like GPUs, which want a large part of that address space, it just ran out faster than expected. Workarounds such as PAE proved insufficient to address the inherent limitation.

Yes, the actual address space support today is more like 40-bit (2^40 = 1 TiB) or 52-bit (2^52 = 4 PiB = 4096 TiB) for physical and 48-bit for virtual (2^48 = 256 TiB), not the full 64-bit, but moving that up is a fairly minor change in terms of architecture, and it'll take a while until we exhaust the 64-bit address space (2^64 = 16 EiB, about 16.8 million TiB).
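
For anyone who wants to check the arithmetic, here is a quick sketch that prints the size of a byte-addressed space for the widths mentioned above. The function name and the choice of binary units (TiB, PiB, EiB) are mine, purely for illustration.

```python
# Size of a byte-addressed space for a given address width, in binary units.
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]

def address_space(bits: int) -> str:
    """Return the size of a `bits`-wide byte-addressed space as a readable string."""
    size = 1 << bits          # number of addressable bytes
    unit = 0
    # Divide by 1024 until the value is small or we run out of unit names.
    while size >= 1024 and unit < len(UNITS) - 1:
        size //= 1024
        unit += 1
    return f"{size} {UNITS[unit]}"

for bits in (32, 40, 48, 52, 64):
    print(f"{bits}-bit: {address_space(bits)}")
```

That gives 4 GiB, 1 TiB, 256 TiB, 4 PiB and 16 EiB respectively. In decimal units the same sizes come out somewhat larger (2^48 bytes is roughly 281 TB, for example), which is where slightly different figures tend to come from.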

64-bit needs the data path, ALUs (integer, which is used for address calculations), registers, address and data buses to be 64-bit which doubled almost everything in a CPU or a CPU core compared to 32-bit CPUs. Doubling all that again to 128-bit does not sound like something CPUs would benefit from - today and in general use. For an example on that, see what happened to Intel's FP units in terms of size, power and heat when they doubled their size from 128-bit to 256-bit for AVX2 in Haswell.
 
Joined
Jun 10, 2014
Messages
2,987 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
64-bit calculations were largely a secondary concern. The move to 64-bit was largely dictated by memory, more specifically by the 32-bit address space becoming too small.
You are confusing register width with address width.
64-bit computing has nothing to do with 64-bit address width.

Physical Address Extension (PAE) to address beyond 4 GB was supported since Pentium Pro (1995).
 
Joined
Feb 3, 2017
Messages
3,806 (1.32/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
Addresses have a tendency to go through integer units for various purposes, address generation for example. Operations to do on addresses are pretty much integer operations so in modern x86 processors integer units practically double for addressing. They are not directly related but eventually they collide. Or did I get this completely wrong?

PAE is a workaround. It is often not enough and has downsides, not least of which is enabling support for it on every level.
 
Joined
Jun 10, 2014
Messages
2,987 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
Addresses have a tendency to go through integer units for various purposes, address generation for example. They are not directly related but eventually they collide.

PAE is a workaround. It is often not enough and has downsides, not least of which is enabling support for it on every level.
To have register width lower than address width requires more operations, but is not uncommon. Nearly all the early computers did this: the Intel 8086 had 16-bit registers and 20-bit addresses, and the 80286 had 16-bit registers and 24-bit addresses. The MOS 6502 was an 8-bit CPU with 16-bit addressing, used (itself or as a close variant) in the Commodore 64, Atari 2600, Apple II, NES and many more.
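
As a concrete case of "more operations", the 8086 in real mode builds its 20-bit address from two 16-bit values: a segment shifted left by four bits plus an offset. A minimal sketch (the function name is made up for this post, and the mask models the wrap-around past 1 MiB on the original 8086):

```python
def physical_address_8086(segment: int, offset: int) -> int:
    """Combine a 16-bit segment and 16-bit offset into a 20-bit physical address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # segment * 16 + offset, truncated to the 20 address lines of the 8086
    return ((segment << 4) + offset) & 0xFFFFF

print(hex(physical_address_8086(0x1234, 0x5678)))
```

So every memory access implicitly involves two registers and an add, which is the kind of overhead a flat address space with wide enough registers avoids.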

PAE was supported on Windows, MacOS (x86), Linux and all the major BSDs. 32-bit Windows 8 and 10 actually require PAE in order to run, so it's used much more than you think.

The reason why PAE is unknown to most people is that they switched to a 64-bit OS and hardware long before they hit the 4 GB limit.
 
Joined
Feb 3, 2017
Messages
3,806 (1.32/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
To have register width lower than address width requires more operations, but is not uncommon. Nearly all the early computers did this: the Intel 8086 had 16-bit registers and 20-bit addresses, and the 80286 had 16-bit registers and 24-bit addresses. The MOS 6502 was an 8-bit CPU with 16-bit addressing, used (itself or as a close variant) in the Commodore 64, Atari 2600, Apple II, NES and many more.
With drive for efficiency and simultaneously widening the compute the different address width (at least to the larger side) seem to be uncommon in current architectures, no?

I remember PAE very well. It needed support from motherboard, BIOS, operating system and depending on circumstances, application. That was a lot of fun :)
Are you sure about 32-bit Windows 8 and 10 requiring PAE? They do support it and can benefit from it but I remember trying to turn on PAE manually on Windows 8 (and failing due to stupid hardware).
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,171 (2.80/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
A lot of that came from the fact that the compilers had to do all the hard work. Little do people know but in your common x86-64 chip there's a lot of optimization of the CPU instructions going on behind the scenes at the silicon level before even one instruction is executed. There was none of that happening with Itanium, all of that had to be done at the compiler level which they generally weren't able to do.
Actually not. Itanium features explicit parallel instructions, and compilers are limited to working with just a few instructions within a scope, there is no way a compiler could be able to properly structure the code and memory to leverage this. It's kind of similar to SIMD(like AVX), the compiler can vectorize small patterns of a few instructions, but can never restructure larger code, so if you want proper utilization of SIMD you need to use intrinsics which are basically mapped directly to assembly. No compiler will ever be able to do this automatically.
Actually, it did. It's not easy to write a compiler to take advantage of 128 general-purpose registers in an effective way for every workload imaginable. x86 only had 8 and x86-64 bumped that to 16. Theoretically it could be really fast, but it can only be as fast as the compiler and how it determines what data goes where. The nice thing about having a bunch of general-purpose registers is that you don't need to load and store data to and from memory as often, and accessing registers is faster than accessing cache. However, there are consequences to evicting data from registers incorrectly, which is more and more likely as you have to manage a larger number of registers. The reality is that you can only look so far ahead, so I suspect that a lot of the time those registers are getting loaded and stored far more often than they should be, mainly because you have to figure out ahead of time whether some data is going to be used soon or way later, and the cost of getting that wrong is significant.

Just saying. IA-64 is good on paper, but performance completely relies on the implementation of both the software and the compiler, and the compiler is actually responsible for a lot more than one for x86-64 is.
 
Joined
Jun 10, 2014
Messages
2,987 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
With drive for efficiency and simultaneously widening the compute the different address width (at least to the larger side) seem to be uncommon in current architectures, no?
I didn't quite get that one.

Having to do multiple operations to access memory is of course a disadvantage, but not a huge one. I remember most recompiled software got like a 5-10% improvement, due to easier memory access and faster integer math combined.

I remember PAE very well. It needed support from motherboard, BIOS, operating system and depending on circumstances, application. That was a lot of fun :)

Are you sure about 32-bit Windows 8 and 10 requiring PAE? They do support it and can benefit from it but I remember trying to turn on PAE manually on Windows 8 (and failing due to stupid hardware).
I haven't run any Windows in 32-bit since XP, but from what I've read, the NX bit requires it, and NX is enabled on all modern operating systems for security reasons.

Nevertheless, I was one of the early adopters of 64-bit OSes, not because of memory, but because I wanted that extra 5-10% performance. Linux did have an extra advantage here, since the entire software libraries were made available in 64-bit almost immediately. And it was a larger uplift than many might think. Most 32-bit software (even on Windows) was compiled for the i386 ISA; yes, that means 80386-compatible features only. Some heavier applications were of course compiled for later ISA versions, but most software was not. Linux software also usually assumed SSE2 support along with "AMD64", so the difference could be quite substantial in edge cases.

Actually, it did. It's not easy to write a compiler to take advantage of 128 general-purpose registers in an effective way for every workload imaginable.
The problem from the compiler side is that the code, regardless of language, needs to be structured in a way that the compiler can basically saturate these resources.

If you write even C/C++ code without considerations, not even the best compiler imaginable can restructure the overall code for it to be efficient for Itanium.
This is basically the same problem we have with writing for AVX, and the reason why all efficient AVX code uses intrinsics, which is "almost" assembly.

x86 only had 8 and x86-64 bumped that to 16. Theoretically it could be really fast, but it can only be as fast as the compiler and how it determines what data goes where. The nice thing about having a bunch of general-purpose registers is that you don't need to load and store data to and from memory as often, and accessing registers is faster than accessing cache<snip>
In theory, having many registers is beneficial. At machine code level, x86 code does a lot of moving around between registers (which usually is completely wasted cycles, ARM does a lot more…). So having more registers (even if only on the ISA level) can eliminate operations and therefore be beneficial, I have no issues so far.

But keep in mind that ISA and microarchitecture are two different things. x86 on the ISA level is not superscalar, while every modern microarchitecture is. What registers a CPU has available on an execution port is entirely dependent on its architecture, and it varies. And this is the sort of thing that a CPU is actually able to optimize within the instruction window.

Having too many general purpose registers on the microarchitecture level will get challenging, because it complicates the pipeline and is likely to introduce latency.

So to sum up, I'm all for having more registers on the ISA level, but on the microarchitecture level it should be entirely up to the designer. Current x86 designs have 4(Skylake)/4+2?(Zen) execution ports for ALUs, etc. and vector units. As this increases in the future, I would expect improvements on the ISA level could help simplify the work for the front-end in the CPUs.
 

bug

Joined
May 22, 2015
Messages
13,836 (3.96/day)
Processor Intel i5-12600k
Motherboard Asus H670 TUF
Cooling Arctic Freezer 34
Memory 2x16GB DDR4 3600 G.Skill Ripjaws V
Video Card(s) EVGA GTX 1060 SC
Storage 500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 2TB Crucial MX500
Display(s) Dell U3219Q + HP ZR24w
Case Raijintek Thetis
Audio Device(s) Audioquest Dragonfly Red :D
Power Supply Seasonic 620W M12
Mouse Logitech G502 Proteus Core
Keyboard G.Skill KM780R
Software Arch Linux + Win10
The thing is, there were some good ideas powering Itanium. But it has been doomed for years.

For example, you know what happens with x86/x86-64 when it tries to speed up code execution? It tries to predict whether a code path will be chosen and executes it ahead of time using idle resources. The problem is that if the prediction turns out to be wrong, the pipeline has to be flushed and new instructions brought in. You know what Itanium does/did? It doesn't try to predict anything; it will execute both branches of a conditional statement and pick whichever is needed when the time comes.
Intel wasn't nuts in coming up with Itanium. It's just that everybody chose x86-64 instead.
 
Joined
Jun 10, 2014
Messages
2,987 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
For example, you know what happens with x86/x86-64 when it tries to speed up code execution? It tries to predict whether a code path will be chosen and executes it ahead of time using idle resources. The problem is that if the prediction turns out to be wrong, the pipeline has to be flushed and new instructions brought in. You know what Itanium does/did? It doesn't try to predict anything; it will execute both branches of a conditional statement and pick whichever is needed when the time comes.
It doesn't really matter if you try to execute both branches of a conditional, or you do speculative execution. Either way you're pretty much screwed after three or more conditionals coming within a few instructions, as the problem grows exponentially.
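
To put rough numbers on that, here is a toy sketch, not a model of any real core. It assumes the conditionals are independent and uses a made-up per-branch prediction accuracy of 95%; both function names are just for illustration.

```python
def eager_paths(conditionals: int) -> int:
    """Paths in flight if both arms of every pending conditional are executed."""
    return 2 ** conditionals

def all_predictions_correct(accuracy: float, conditionals: int) -> float:
    """Chance that a predictor with this per-branch accuracy gets every branch right,
    assuming the branches are independent (a simplifying assumption)."""
    return accuracy ** conditionals

for n in range(1, 6):
    print(n, eager_paths(n), round(all_predictions_correct(0.95, n), 3))
```

Executing everything doubles the work with each extra conditional, while speculation keeps the work constant but the chance of a clean run drops with each one. Neither scales well once conditionals pile up inside the same window.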

One of the fundamental problems for CPUs is that the CPU has less context than the author of the code does. Say, for example, you write a function where the value of a variable is determined by one or more conditionals, but the remaining control flow is unchanged. By the time the code is turned into machine code, this information is usually lost; all the CPU sees is calculations, conditionals, access patterns etc. within just a tiny instruction window. There are a few cases where a compiler can optimize certain code into operations like conditional move, which eliminates what I call "false branching", but usually compiler optimizations like this require the code to be in a very specific form to be detected correctly, unless the coder uses intrinsics. This is an area where x86 could improve a lot, with of course some changes in compilers and coding practices to go along with it.
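
A small sketch of what I mean by "false branching": both functions below pick one of two already computed values, but only the first expresses it as control flow. The second is the data-dependent form that a conditional move implements in hardware. This is Python purely for readability, so it demonstrates the shape of the code and not any speedup; the function names are made up.

```python
def select_branchy(cond: bool, a: int, b: int) -> int:
    """Pick a or b with a real branch: control flow depends on cond."""
    if cond:
        return a
    return b

def select_branchless(cond: bool, a: int, b: int) -> int:
    """Pick a or b without a branch: only the data depends on cond."""
    mask = -int(cond)              # 0 when cond is false, -1 (all ones) when true
    return b ^ ((a ^ b) & mask)    # yields a if mask is all ones, otherwise b

print(select_branchy(True, 10, 20), select_branchless(True, 10, 20))
print(select_branchy(False, 10, 20), select_branchless(False, 10, 20))
```

The control flow after the selection is identical either way, which is exactly the information the programmer has and the CPU usually does not.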

Ultimately, code execution comes down to cache misses and branching, and dealing with these in a sensible manner will determine the speed of the code, regardless of programming language. There is not going to be a wonderful new CPU ISA which solves this automatically. Unfortunately, most code today consists of more conditionals, function calls and random access patterns than code which actually does something, and code like this will never be fast.
 

bug

Joined
May 22, 2015
Messages
13,836 (3.96/day)
Processor Intel i5-12600k
Motherboard Asus H670 TUF
Cooling Arctic Freezer 34
Memory 2x16GB DDR4 3600 G.Skill Ripjaws V
Video Card(s) EVGA GTX 1060 SC
Storage 500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 2TB Crucial MX500
Display(s) Dell U3219Q + HP ZR24w
Case Raijintek Thetis
Audio Device(s) Audioquest Dragonfly Red :D
Power Supply Seasonic 620W M12
Mouse Logitech G502 Proteus Core
Keyboard G.Skill KM780R
Software Arch Linux + Win10
It doesn't really matter if you try to execute both branches of a conditional, or you do speculative execution. Either way you're pretty much screwed after three or more conditionals coming within a few instructions, as the problem grows exponentially.
It should work OK for the small instruction window that fits into the pipeline at any given moment. But I'm just speculating (see what I did there?).

One of the fundamental problems for CPUs is that the CPU has less context than the author of the code does.
Yeah, there's never going to be a universal fix for this. Just more or less efficient solutions spread between the compiler and the CPU.
 
Joined
Jul 24, 2009
Messages
1,002 (0.18/day)
Maybe the future will eventually get to that, but I think it was simply a product too far ahead of its time.

Bit like tessellation in R9800 Pro. :)
 

bug

Joined
May 22, 2015
Messages
13,836 (3.96/day)
Processor Intel i5-12600k
Motherboard Asus H670 TUF
Cooling Arctic Freezer 34
Memory 2x16GB DDR4 3600 G.Skill Ripjaws V
Video Card(s) EVGA GTX 1060 SC
Storage 500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 2TB Crucial MX500
Display(s) Dell U3219Q + HP ZR24w
Case Raijintek Thetis
Audio Device(s) Audioquest Dragonfly Red :D
Power Supply Seasonic 620W M12
Mouse Logitech G502 Proteus Core
Keyboard G.Skill KM780R
Software Arch Linux + Win10
Maybe the future will eventually get to that, but I think it was simply a product too far ahead of its time.

Bit like tessellation in R9800 Pro. :)
There was definitely something wrong with the execution; it wasn't just the product. Look at ARM and how they had no trouble jumpstarting a new architecture from scratch. Ironically, one that has grown to 64 bits, too.
It's a done deal though; how it happened only matters to historians and to future business decisions.
 