"Indirector" is Intel's Latest Branch Predictor Vulnerability, But Patch is Already Out

AleksandarK · Jul 3, 2024

Researchers from the University of California, San Diego, have unveiled a significant security vulnerability affecting Intel Raptor Lake and Alder Lake processors. The newly discovered flaw, dubbed "Indirector," exposes weaknesses in the Indirect Branch Predictor (IBP) and Branch Target Buffer (BTB), potentially allowing attackers to execute precise Branch Target Injection (BTI) attacks. The published study provides a detailed look into the intricate structures of the IBP and BTB within recent Intel processors and demonstrates Spectre-style attacks against them. For the first time, researchers have mapped out the size, structure, and precise functions governing index and tag hashing in these critical components. Particularly concerning is the discovery of previously unknown gaps in Intel's hardware defenses, including IBPB, IBRS, and STIBP. These findings suggest that even the latest security measures may be insufficient to protect against sophisticated attacks.

The research team developed a tool called "iBranch Locator," which can efficiently identify and manipulate specific branches within the IBP. This tool enables highly precise BTI attacks, potentially compromising security across various scenarios, including cross-process and cross-privilege environments. One of the most alarming implications of this vulnerability is its ability to bypass Address Space Layout Randomization (ASLR), a crucial security feature in modern operating systems. By exploiting the IBP and BTB, attackers could potentially break ASLR protections, exposing systems to a wide range of security threats. Experts recommend several mitigation strategies, including more aggressive use of Intel's IBPB (Indirect Branch Prediction Barrier) feature. However, the performance impact of this solution—up to 50% in some cases—makes it impractical for frequent domain transitions, such as those in browsers and sandboxes. In a statement for Tom's Hardware, Intel noted the following: "Intel reviewed the report submitted by academic researchers and determined previous mitigation guidance provided for issues such as IBRS, eIBRS and BHI are effective against this new research and no new mitigations or guidance is required."

Intel was notified in February and had time to address the issue. Intel also notified its system vendors, so the protections against this attack are already in place. Intel's existing guidance covers both the BHI mitigation and the IBRS/eIBRS mitigation.
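For readers on Linux who want to see which Spectre-class mitigations their kernel reports as active, the following is a small sketch that reads the kernel's `/sys/devices/system/cpu/vulnerabilities` directory. The function name `read_vulnerability_status` is made up for this example, and the sketch assumes a Linux kernel recent enough to expose that directory; it returns an empty result elsewhere. It reports what the kernel says, not whether a system is specifically protected against Indirector.

```python
from pathlib import Path


def read_vulnerability_status(
    base: str = "/sys/devices/system/cpu/vulnerabilities",
) -> dict[str, str]:
    """Return {vulnerability name: kernel-reported status} from sysfs.

    Returns an empty dict if the directory does not exist
    (non-Linux systems or very old kernels).
    """
    base_path = Path(base)
    if not base_path.is_dir():
        return {}

    status: dict[str, str] = {}
    for entry in sorted(base_path.iterdir()):
        if entry.is_file():
            try:
                status[entry.name] = entry.read_text().strip()
            except OSError:
                status[entry.name] = "unreadable"
    return status


if __name__ == "__main__":
    for name, state in read_vulnerability_status().items():
        print(f"{name}: {state}")
```

The `spectre_v2` entry is the one most relevant to branch target injection; its text typically names the IBRS or eIBRS mode and the IBPB policy the kernel has selected.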


Daven · Jul 3, 2024

'Branch' is spelled 'Brach' in the article title.

Frank_100 · Jul 3, 2024

Daven said:
'Branch' is spelled 'Brach' in the article title.

DutchTraveller · Jul 3, 2024

Hyperthreading is often a factor in these security vulnerabilities.
So getting rid of it looks like a promising way to address this and increase ST performance too.

In programs that make very frequent calls to the operating system this should make a big difference.

R-T-B · Jul 3, 2024

DutchTraveller said:
Hyperthreading is often a factor in these security vulnerabilities.

Not in this one, reading the paper.

It's all branch predictor mayhem.

trsttte · Jul 4, 2024

DutchTraveller said:
So getting rid of it looks like a promising way to address this and increase ST performance too.

I'm doubtful of the performance gains it would provide for single-threaded apps, but either way, in the real world it's ever more rare for an application to be single threaded, and it's even rarer (or let's say impossible, unless you're talking embedded SoCs) for it to be running alone in a system. Hyperthreading exists because it provides a huge efficiency boost, and it's kind of nuts to suggest killing it.

chrcoluk · Jul 4, 2024

Disclosing is like a tutorial on how to exploit it.

Curious how much the patch nerfs the CPUs. I won't update the BIOS, but of course Windows Update might force the microcode.

Reread the first post; looks like software mitigations already exist for it.

R-T-B · Jul 4, 2024

chrcoluk said:
Disclosing is like a tutorial on how to exploit it.

That's how software research is done, yes.

kondamin · Jul 4, 2024

R-T-B said:
That's how software research is done, yes.

Unless you are a contractor for governments; then you bundle it into hacking tools and don't speak a word about it until, in the best case, some scandal reveals it was used for unethical stuff.

DutchTraveller · Jul 4, 2024

trsttte said:
I'm doubtful of the performance gains it would provide for single-threaded apps, but either way, in the real world it's ever more rare for an application to be single threaded, and it's even rarer (or let's say impossible, unless you're talking embedded SoCs) for it to be running alone in a system. Hyperthreading exists because it provides a huge efficiency boost, and it's kind of nuts to suggest killing it.

For servers hyperthreading is worth it because it maximizes throughput and you should optimize your application for that.
But there is Amdahl's Law to take into account: there are often parts that are limited by the single-threaded performance.

I know a large, very expensive commercial application that uses an in-memory database, but most of the many threads accessing that database need to lock it. The end result is that it is very dependent on single-threaded performance.
On the desktop the situation is different: when CPUs with hyperthreading came out, there were applications (mostly games) where it was advised to turn off hyperthreading.
We will see how it works out in practice when Lunar Lake comes out.
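For anyone who wants to put numbers on the Amdahl's Law point, here is a small sketch. The formula is the standard one, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of workers. The function name `amdahl_speedup` and the example figures are made up for illustration and are not measurements of any real application.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup under Amdahl's Law.

    parallel_fraction: share of the runtime that can be parallelized (0.0 to 1.0).
    workers: number of threads or cores working on the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")

    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical workload: 90% parallel, 10% stuck behind a lock.
    for n in (2, 4, 8, 16, 32):
        print(f"{n:>2} workers: {amdahl_speedup(0.9, n):.2f}x")
```

With 90% of the work parallel, 16 workers give a speedup of only about 6.4x, and no number of workers gets past 10x, which is why the serial (lock-bound) part ends up depending on single-threaded performance.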

trsttte · Jul 4, 2024

DutchTraveller said:
For servers hyperthreading is worth it because it maximizes throughput and you should optimize your application for that.

The exact same thing happens on a desktop: if you open Task Manager you'll notice you have around 2000 threads running at any given moment. They vary wildly in the workload and resources they require, but if just 1% are very active that's already 20 threads.

Hyperthreading is almost always worth it and it's very easy to explain why: imagine a laundry service only starting a new washing cycle once the previous load comes out of the dryers. A core has many different parts to it, some of them can even be redundant (i.e. to handle higher bit counts or whatever) and it doesn't make sense to have them all doing nothing when the first operation is almost out the door.

R-T-B · Jul 4, 2024

kondamin said:
Unless you are a contractor for governments; then you bundle it into hacking tools and don't speak a word about it until, in the best case, some scandal reveals it was used for unethical stuff.

That wouldn't be a software researcher now would it? That would be someone with an agenda.

chrcoluk · Jul 4, 2024

trsttte said:
The exact same thing happens on a desktop: if you open Task Manager you'll notice you have around 2000 threads running at any given moment. They vary wildly in the workload and resources they require, but if just 1% are very active that's already 20 threads.

Hyperthreading is almost always worth it and it's very easy to explain why: imagine a laundry service only starting a new washing cycle once the previous load comes out of the dryers. A core has many different parts to it, some of them can even be redundant (i.e. to handle higher bit counts or whatever) and it doesn't make sense to have them all doing nothing when the first operation is almost out the door.

I feel the laundry analogy is a poor fit for the scenario you painted with all the background threads.

HTT is at its best when all available cores on the system are heavily loaded but have gaps for I/O wait. Assigning two threads per core allows a second thread to do processing whilst the first thread is waiting. This is why HTT shines in a few workloads such as software encoding but doesn't do much for general desktop use.

In your washing machine example, the machine is busy whilst in use and cannot be used by anyone else. On a Windows desktop with 2000 threads, those threads will typically only be active for very short periods of time. With so little contention on a lightly utilised CPU, it can handle multiple threads fine.

One advantage of having two classes of cores in a CPU is that one set of cores can have all the background stuff sent to it, which reduces thread contention for the foreground interactive app. For example, if you have 2000 threads on a system, 1992 of them are on the E-cores, and an 8-threaded game has 8 P-cores all to itself, that's vastly superior to any legacy single-class CPU with HTT.

Whether HTT is worth it with all of this going on is, I suppose, down to opinion. Do I think an extra 10% performance for an extra 50% of power is worth it? Usually no. But some people want extra performance no matter the cost; they don't care if it makes the CPU throttle due to excessive temps or requires an extra 100 W of power. Every % of throughput matters. Of course, HTT has extra costs now with all the security problems as well. All the security mitigations I have seen for HTT effectively amount to disabling HTT.

I think when we talk among ourselves it's down to opinions, but when a CPU manufacturer decides the direction needs to change, it's more than an opinion. They make the chips; they know the cost in terms of silicon space, the trade-offs, the gains from removing HTT and so forth. So if Intel is removing HTT, then it's likely a net gain.

I think the only workloads where I have seen HTT benefit are in the server space, or CPU-based encoding on the desktop. Otherwise it's just benchmarks. I think it can help in highly threaded games if your physical core count is below 8, but above that the benefit seems to be within margin of error. HTT can actually reduce performance as well: a 2-threaded game will run better on 2 physical cores than on 2 logical cores of the same physical core.
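To check which logical CPUs share a physical core, Linux exposes a `thread_siblings_list` file per CPU under `/sys/devices/system/cpu/cpuN/topology/`. The sketch below groups logical CPUs by physical core using those files. The helper names `parse_cpu_list` and `smt_sibling_groups` are made up for this example, and it assumes a Linux system; on other platforms it simply finds nothing.

```python
from pathlib import Path


def parse_cpu_list(text: str) -> list[int]:
    """Parse a kernel CPU list such as "0,8" or "0-3,8" into a list of ints."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def smt_sibling_groups(base: str = "/sys/devices/system/cpu") -> list[tuple[int, ...]]:
    """Return one tuple of logical CPU numbers per physical core."""
    groups: set[tuple[int, ...]] = set()
    pattern = "cpu[0-9]*/topology/thread_siblings_list"
    for path in Path(base).glob(pattern):
        groups.add(tuple(parse_cpu_list(path.read_text())))
    return sorted(groups)


if __name__ == "__main__":
    for group in smt_sibling_groups():
        kind = "SMT pair" if len(group) > 1 else "single thread"
        print(f"{group}: {kind}")
```

To keep a 2-threaded program on separate physical cores, one option is to pick one logical CPU from each of two different groups and pin to them, for example with `os.sched_setaffinity` on Linux.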

R-T-B · Jul 4, 2024

You guys do realize this has nothing to do with hyperthreading, right?

trsttte · Jul 4, 2024

chrcoluk said:
I feel the laundry analogy is a poor fit for the scenario you painted with all the background threads.

No need to tell me, it was just a layman's explanation meant to be more easily understood.

Scrizz · Jul 7, 2024

trsttte said:
No need to tell me, it was just a layman's explanation meant to be more easily understood.

sounded more like pipelining than HTT
