
Intel Officially Sinks the Itanic, Future of IA-64 Architecture Uncertain

Joined
Jun 10, 2014
Messages
2,987 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
It should work OK for the small instruction window that fits into the pipeline at any given moment.
The CPU can really only see the instruction stream within a short window. Any branch will of course multiply the potential instruction streams beyond the conditional, and it gets worse with multiple conditionals. Even if the code is technically the same in both branches, each conditional still creates a new fork wherever it occurs, so two conditionals may give up to 4 paths, three up to 8, and so on. This gets really hard with data dependencies, and when the CPU tries to execute things out of order. You will quickly need more resources on die than is realistic.
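
To put numbers on that doubling, here is a minimal sketch in C. `max_paths` is a hypothetical helper name made up for illustration; it only computes the upper bound described above (every two-way conditional doubles the count) and says nothing about what real hardware actually tracks.

```c
#include <stdint.h>

/* Upper bound on the number of distinct paths through n back-to-back
   two-way conditionals: each conditional doubles the count, so 2^n.
   Hypothetical helper, for illustration only. */
uint64_t max_paths(unsigned n_conditionals)
{
    /* Shifting a 64-bit value by 64 or more is undefined in C,
       so saturate instead of overflowing. */
    if (n_conditionals >= 64)
        return UINT64_MAX;
    return (uint64_t)1 << n_conditionals;
}
```

Calling `max_paths(2)` gives 4 and `max_paths(3)` gives 8, matching the counts above; by 10 conditionals the bound is already 1024.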

Also keep in mind that the CPU can't see beyond a memory access until it's dereferenced, and the same goes for any memory access with a data dependency, like:
Code:
variable = array[some_number + some_other_number];
The CPU will try to execute these out of order as early as possible, but that will only save a few clock cycles of idle time. The cost of a cache miss is up to ~400 clocks on Skylake, and a misprediction costs up to 19 cycles for the flush plus any delay from fetching the new instructions, which can even be an instruction cache miss if it's a long jump! The instruction window for Skylake is 224 entries, and I believe it can decode 6 instructions or so per cycle, so it doesn't take a lot before it's virtually "walking blindly". As you can see, even a single random memory access can't be discovered early enough to prefetch it, and often there are multiple data dependencies in a chain, leaving the CPU stalled most of the time. The only memory accesses it can do ahead of time without a stall are linear accesses, where it guesses beyond the instruction window. Something as simple as a function call or a pointer dereference can easily cause a cache miss. The same goes for innocent-looking conditionals, like:
Code:
if (a && b && (c > 2)) {}
Put something like this inside a loop and you'll kill performance very quickly. Even worse are function calls with inheritance in OOP; while they might suit your coding desires, using them in a critical part of the code can easily make a performance difference of >100×.
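
As a rough illustration of the conditional-in-a-loop case, here is a sketch in C of the same test written two ways. The function names and the three input arrays are hypothetical, made up for this example. Whether the second form is actually faster depends on the compiler, the CPU and how predictable the data is, so treat it as something to measure, not a rule.

```c
#include <stddef.h>

/* Short-circuit form: && has to skip the later tests when an earlier
   one fails, so this can compile to up to three conditional branches
   per iteration. */
size_t count_branchy(const int *a, const int *b, const int *c, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] && b[i] && (c[i] > 2))
            hits++;
    }
    return hits;
}

/* Branch-light form: each test is normalized to 0 or 1 and combined
   with bitwise &, so all three are always evaluated and the result is
   simply added. Only equivalent because the operands have no side
   effects. */
size_t count_branchless(const int *a, const int *b, const int *c, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        hits += (size_t)((a[i] != 0) & (b[i] != 0) & (c[i] > 2));
    return hits;
}
```

Both functions return the same count for the same input; the only difference is how much work is left to the branch predictor. An optimizing compiler may well turn the first form into the second on its own.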

But I'm just speculating (see what I did there?).
;)
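
To make the data-dependency point above a bit more concrete, here is a sketch in C of a linear walk next to a dependent one. The function names and the `next` index array are assumptions made up for this example. The sketch only shows the shape of the two access patterns; the actual stall cost depends on the working-set size and the hardware.

```c
#include <stddef.h>

/* Linear walk: every address is known up front (data + i), so the
   prefetcher can guess well beyond the instruction window. */
long sum_linear(const long *data, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}

/* Dependent walk: next[i] holds the index of the slot to visit after
   slot i. The address of each load is only known once the previous
   load has completed, so a miss on one step delays every later step. */
long sum_chained(const long *data, const size_t *next,
                 size_t start, size_t steps)
{
    long sum = 0;
    size_t i = start;
    for (size_t k = 0; k < steps; k++) {
        sum += data[i];
        i = next[i];   /* data dependency: the next address comes from memory */
    }
    return sum;
}
```

With `next` filled in as a random permutation over an array much larger than the caches, the second loop is the "chain of data dependencies" case, while the first is the linear case the CPU can stream through.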
 
Joined
Sep 15, 2007
Messages
3,946 (0.63/day)
Location
Police/Nanny State of America
Processor OCed 5800X3D
Motherboard Asucks C6H
Cooling Air
Memory 32GB
Video Card(s) OCed 6800XT
Storage NVMees
Display(s) 32" Dull curved 1440
Case Freebie glass idk
Audio Device(s) Sennheiser
Power Supply Don't even remember
Only benchmark I could find:

Xeon 20.1474609375
Itanium 8.8193359375
Opteron 11.516927083333333333333333333333
Itanium 13.1015625

That's per core. Itanium 2 is nothing to scoff at.

8-Core Itanium Poulson: 3.1 billion transistors
8-Core Xeon Nehalem-EX: 2.3 billion transistors

Interesting article about Poulson (newest Itanium architecture): https://www.realworldtech.com/poulson/

Itanium had 20% of the TOP 500 supercomputers back in 2004. IA-64 gained traction because x86 lacked memory addressing space. x86-64 reversed that trend because it kept backwards compatibility and nobody had to go find Itanium software developers.

12 instructions per clock, 8 cores, and 16 threads at the end of 2012. It was a monster.

Pretty sad in real apps from what I recall. Also, rendered obsolete by Intel themselves with Nehalem the next year.
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,259 (4.45/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
HP paid to keep it going for at least 18 years. It had its uses. Latest Itanium 2 processors actually came out in 2017.
 
Joined
Jun 10, 2014
Actual development of Itanium was discontinued shortly after the launch of Itanium 2. Intel did have long-term commitments though, so they kept tweaking it a bit for some time.
 

FordGT90Concept

That would be wrong (see link above). Itanium 2 debuted in 2002 on 180nm. Poulson (released in 2012) was a huge makeover for the architecture on the 32nm node. Kittson (released in 2017) was supposed to be a 22nm node shrink of Poulson but, for reasons unknown, 22nm was abandoned and Kittson was produced on a matured 32nm node.
 