Intel Officially Sinks the Itanic, Future of IA-64 Architecture Uncertain

efikkan · Feb 1, 2019

bug said:
It should work OK for the small instruction windows that fit into the pipeline at any given moment.

The CPU can really only see the instruction stream within a short window. Any branching will of course multiply the potential instruction streams beyond the conditional, and it gets worse with multiple conditionals: even if the code is technically the same in both branches, each conditional still forks the possible paths wherever it occurs, so two conditionals may give up to 4 paths, three up to 8, and so on. This gets really hard with data dependencies, and when the CPU tries to execute things out of order. You will quickly need more resources on die than is realistic.

Also keep in mind that the CPU can't see beyond a memory access until it has been dereferenced, and the same goes for any memory access with a data dependency, like:

Code:

variable = array[some_number + some_other_number];
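To make the dependency a bit more concrete, here is a rough sketch in C. The names (`sum_linear`, `sum_chained`, `next`) are made up for illustration and are not from any real codebase. In the first loop every address follows from the loop counter alone; in the second, the index of each load is the result of the previous load, so the loads form a serial chain. Both return the same sum; only the access pattern differs.

```c
#include <assert.h>
#include <stddef.h>

/* Linear traversal: each address depends only on the loop counter,
   so the hardware can issue or prefetch these loads ahead of time. */
static long sum_linear(const int *values, size_t n) {
    assert(values != NULL);
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += values[i];
    return sum;
}

/* Dependent traversal: the index of the next load comes out of the
   previous load (next[i]), so no address is known until the load
   before it has completed. Assumes next[] is a permutation of 0..n-1
   that forms a single cycle, so every slot is visited exactly once. */
static long sum_chained(const int *values, const size_t *next, size_t n) {
    assert(values != NULL && next != NULL);
    long sum = 0;
    size_t i = 0;
    for (size_t step = 0; step < n; step++) {
        sum += values[i];
        i = next[i]; /* the next address depends on this load */
    }
    return sum;
}
```

With a tiny array like `next = {3, 5, 7, 1, 6, 2, 0, 4}` everything sits in L1 and you won't measure a difference; the effect should only show up once the arrays are far larger than the caches and the order in `next` is effectively random.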

The CPU will try to execute these out of order as early as possible, but that will only save a few clock cycles of idle time. The cost of a cache miss is up to ~400 clocks on Skylake, and a misprediction costs up to 19 cycles for the flush plus any delay from fetching the new instructions, which can even be an instruction cache miss if it's a long jump! The instruction window for Skylake is 224 entries, and I believe it can decode 6 instructions or so per cycle, so it doesn't take long before it's virtually "walking blindly". As you can see, even a single random memory access can't be discovered in time to prefetch it, and often there are multiple data dependencies in a chain, leaving the CPU stalled most of the time. The only memory accesses it can do ahead of time without a stall are linear accesses, where it guesses beyond the instruction window. Something as simple as a function call or a pointer dereference will in most cases cause a cache miss. The same goes for innocent-looking conditionals, like:

Code:

if (a && b && (c > 2)) {}

Put something like this inside a loop and you'll kill performance very quickly. Even worse are function calls with inheritance in OOP; while it might suit your coding desires, doing it in a critical part of the code can easily make a performance difference of >100×.
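As a sketch of what the conditional looks like inside a loop, and one common way to take the branches out of it, here is a C example. The function names and the array layout are hypothetical, picked only for illustration. The first version uses the short-circuit `&&`, which may compile to as many as three conditional branches per iteration; the second normalises each test to 0 or 1 and combines them with bitwise `&`, so the only branch left is the loop itself. This is only valid when evaluating all three tests is free of side effects, and a good compiler may already do the transformation for you, so measure before assuming it helps.

```c
#include <assert.h>
#include <stddef.h>

/* Short-circuit form: each && is a potential conditional branch
   the predictor has to guess on every iteration. */
static size_t count_branchy(const int *a, const int *b, const int *c, size_t n) {
    assert(a != NULL && b != NULL && c != NULL);
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] && b[i] && (c[i] > 2))
            hits++;
    }
    return hits;
}

/* Branch-free form: every comparison yields 0 or 1, bitwise & combines
   them, and the result is added unconditionally. All three tests are
   always evaluated, which is fine here because they have no side effects. */
static size_t count_branchless(const int *a, const int *b, const int *c, size_t n) {
    assert(a != NULL && b != NULL && c != NULL);
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        hits += (size_t)((a[i] != 0) & (b[i] != 0) & (c[i] > 2));
    return hits;
}
```

Both functions return the same count for the same input; whether the second is actually faster depends on how predictable the data is and on what the compiler generates.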

bug said:
But I'm just speculating (see what I did there?).

TheGuruStud · Feb 2, 2019

FordGT90Concept said:
Only benchmark I could find:

Xeon 20.1474609375
Itanium 8.8193359375
Opteron 11.516927083333333333333333333333
Itanium 13.1015625

That's per core. Itanium 2 is nothing to scoff at.

8-Core Itanium Poulson: 3.1 billion transistors
8-Core Xeon Nehalem-EX: 2.3 billion transistors

Interesting article about Poulson (newest Itanium architecture): https://www.realworldtech.com/poulson/

Itanium had 20% of the TOP500 supercomputers back in 2004. IA-64 gained traction because x86 lacked memory addressing space. x86-64 reversed that pattern because of backwards compatibility/not having to find Itanium software developers.

12 instructions per clock, 8 cores, and 16 threads at the end of 2012. It was a monster.

Pretty sad in real apps from what I recall. Also, rendered obsolete by Intel themselves with Nehalem the next year.

FordGT90Concept · Feb 2, 2019

HP paid to keep it going for at least 18 years. It had its uses. Latest Itanium 2 processors actually came out in 2017.

efikkan · Feb 2, 2019

Actual development of Itanium was discontinued shortly after the launch of Itanium 2. Intel did have long-term commitments though, so they kept tweaking it a bit for some time.

FordGT90Concept · Feb 2, 2019

That would be wrong (see link above). Itanium 2 debuted in 2002 on 180nm. Poulson (released in 2012) was a huge makeover for the architecture on the 32nm node. Kittson (released in 2017) was supposed to be a 22nm node shrink of Poulson but, for reasons unknown, 22nm was abandoned and Kittson was produced on a matured 32nm node.
