Tuesday, January 23rd 2018

Intel's Patch for Meltdown, Spectre "Complete and Utter Garbage:" Linus Torvalds

Linus Torvalds, creator of Linux, the most popular datacenter operating system, proclaimed Intel's patches for the recent Meltdown and Spectre CPU vulnerabilities "complete and utter garbage." Torvalds continues to work on the innermost code of Linux, and has been closely associated with kernel patches that are supposed to work in conjunction with updated CPU microcode to mitigate the two vulnerabilities that threaten to severely compromise security of data-centers and cloud-computing service providers.

Torvalds, in a heated public email thread with David Woodhouse, an Amazon engineer based in the UK, called Intel's fix "insane" and questioned the company's intent in making the patch "toggle-able" (any admin can disable the patch for a seemingly cataclysmic vulnerability, one that could bring down a Fortune 500 company). Torvalds also took issue with redundant fixes for vulnerabilities already mitigated by Google's "retpoline" technique. Later in the thread, Woodhouse admits that there's no good reason for Intel's patches to be "opt-in." Intel commented on this exchange with a vanilla-flavored potato: "We take the feedback of industry partners seriously. We are actively engaging with the Linux community, including Linus, as we seek to work together on solutions."
Source: TechCrunch

16 Comments on Intel's Patch for Meltdown, Spectre "Complete and Utter Garbage:" Linus Torvalds

#1
RejZoR
Well, if you can just simply toggle the patch, malware can do that too. And then siphon data through a cache exploit undetected lol
#2
Death Star
I'm not defending Intel's handling of this catastrophe in the slightest, but there are a few pertinent followup e-mails in the chain, which at least offer a bit of additional explanation:

lkml.iu.edu/hypermail/linux/kernel/1801.2/05282.html
On Sun, 2018-01-21 at 14:27 -0800, Linus Torvalds wrote:
> On Sun, Jan 21, 2018 at 2:00 PM, David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote:
> >>
> >> The patches do things like add the garbage MSR writes to the kernel
> >> entry/exit points. That's insane. That says "we're trying to protect
> >> the kernel". We already have retpoline there, with less overhead.
> >
> > You're looking at IBRS usage, not IBPB. They are different things.
>
> Ehh. Odd intel naming detail.
>
> If you look at this series, it very much does that kernel entry/exit
> stuff. It was patch 10/10, iirc. In fact, the patch I was replying to
> was explicitly setting that garbage up.
>
> And I really don't want to see these garbage patches just mindlessly
> sent around.

I think we've covered the technical part of this now, not that you like
it -- not that any of us *like* it. But since the peanut gallery is
paying lots of attention it's probably worth explaining it a little
more for their benefit.

This is all about Spectre variant 2, where the CPU can be tricked into
mispredicting the target of an indirect branch. And I'm specifically
looking at what we can do on *current* hardware, where we're limited to
the hacks they can manage to add in the microcode.

The new microcode from Intel and AMD adds three new features.

One new feature (IBPB) is a complete barrier for branch prediction.
After frobbing this, no branch targets learned earlier are going to be
used. It's kind of expensive (order of magnitude ~4000 cycles).

The second (STIBP) protects a hyperthread sibling from following branch
predictions which were learned on another sibling. You *might* want
this when running unrelated processes in userspace, for example. Or
different VM guests running on HT siblings.

The third feature (IBRS) is more complicated. It's designed to be
set when you enter a more privileged execution mode (i.e. the kernel).
It prevents branch targets learned in a less-privileged execution mode,
BEFORE IT WAS MOST RECENTLY SET, from taking effect. But it's not just
a 'set-and-forget' feature, it also has barrier-like semantics and
needs to be set on *each* entry into the kernel (from userspace or a VM
guest). It's *also* expensive. And a vile hack, but for a while it was
the only option we had.

Even with IBRS, the CPU cannot tell the difference between different
userspace processes, and between different VM guests. So in addition to
IBRS to protect the kernel, we need the full IBPB barrier on context
switch and vmexit. And maybe STIBP while they're running.
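
[Editor's sketch, not part of the quoted email.] The division of labour described so far (IBRS on every entry into the kernel, IBPB on context switch and vmexit) can be modeled as a small lookup from transition type to MSR write. The names `Transition` and `msr_writes_for` are hypothetical. The MSR addresses and bit positions are taken from Intel's published speculation-control documentation, not from this thread, so treat them as an assumption here. Nothing below touches real hardware.

```python
from enum import Enum, auto

# Assumption: MSR layout per Intel's public documentation, not stated in the email.
IA32_SPEC_CTRL = 0x48    # bit 0 = IBRS, bit 1 = STIBP
IA32_PRED_CMD = 0x49     # bit 0 = IBPB (a write-only command)

SPEC_CTRL_IBRS = 1 << 0
SPEC_CTRL_STIBP = 1 << 1  # listed for completeness; not used by this sketch
PRED_CMD_IBPB = 1 << 0


class Transition(Enum):
    """Hypothetical labels for the transitions the email talks about."""
    KERNEL_ENTRY = auto()    # from userspace into the kernel
    CONTEXT_SWITCH = auto()  # between userspace processes
    VMEXIT = auto()          # from a VM guest into the host kernel


def msr_writes_for(transition):
    """Return the (msr, value) writes implied by the pre-retpoline scheme."""
    writes = []
    # IBRS must be set again on *each* entry into the kernel,
    # whether that entry comes from userspace or from a VM guest.
    if transition in (Transition.KERNEL_ENTRY, Transition.VMEXIT):
        writes.append((IA32_SPEC_CTRL, SPEC_CTRL_IBRS))
    # IBRS cannot tell processes or guests apart, so the full
    # IBPB barrier is added on context switch and vmexit.
    if transition in (Transition.CONTEXT_SWITCH, Transition.VMEXIT):
        writes.append((IA32_PRED_CMD, PRED_CMD_IBPB))
    return writes


if __name__ == "__main__":
    for t in Transition:
        print(t.name, [(hex(msr), val) for msr, val in msr_writes_for(t)])
```

The point of the model is only to show why the scheme is costly: every kernel entry pays for an MSR write, which is the overhead retpoline later removes.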

Then along came Paul with the cunning plan of "oh, indirect branches
can be exploited? Screw it, let's not have any of *those* then", which
is retpoline. And it's a *lot* faster than frobbing IBRS on every entry
into the kernel. It's a massive performance win.

So now we *mostly* don't need IBRS. We build with retpoline, use IBPB
on context switches/vmexit (which is in the first part of this patch
series before IBRS is added), and we're safe. We even refactored the
patch series to put retpoline first.

But wait, why did I say "mostly"? Well, not everyone has a retpoline
compiler yet... but OK, screw them; they need to update.

Then there's Skylake, and that generation of CPU cores. For complicated
reasons they actually end up being vulnerable not just on indirect
branches, but also on a 'ret' in some circumstances (such as 16+ CALLs
in a deep chain).

The IBRS solution, ugly though it is, did address that. Retpoline
doesn't. There are patches being floated to detect and prevent deep
stacks, and deal with some of the other special cases that bite on SKL,
but those are icky too. And in fact IBRS performance isn't anywhere
near as bad on this generation of CPUs as it is on earlier CPUs
*anyway*, which makes it not quite so insane to *contemplate* using it
as Intel proposed.

That's why my initial idea, as implemented in this RFC patchset, was to
stick with IBRS on Skylake, and use retpoline everywhere else. I'll
give you "garbage patches", but they weren't being "just mindlessly
sent around". If we're going to drop IBRS support and accept the
caveats, then let's do it as a conscious decision having seen what it
would look like, not just drop it quietly because poor Davey is too
scared that Linus might shout at him again. :)

I have seen *hand-wavy* analyses of the Skylake thing that mean I'm not
actually lying awake at night fretting about it, but nothing concrete
that really says it's OK.

If you view retpoline as a performance optimisation, which is how it
first arrived, then it's rather unconventional to say "well, it only
opens a *little* bit of a security hole but it does go nice and fast so
let's do it".

But fine, I'm content with ditching the use of IBRS to protect the
kernel, and I'm not even surprised. There's a *reason* we put it last
in the series, as both the most contentious and most dispensable part.
I'd be *happier* with a coherent analysis showing Skylake is still OK,
but hey-ho, screw Skylake.

The early part of the series adds the new feature bits and detects when
it can turn KPTI off on non-Meltdown-vulnerable Intel CPUs, and also
supports the IBPB barrier that we need to make retpoline complete. That
much I think we definitely *do* want. There have been a bunch of us
working on this behind the scenes; one of us will probably post that
bit in the next day or so.

I think we also want to expose IBRS to VM guests, even if we don't use
it ourselves. Because Windows guests (and RHEL guests; yay!) do use it.

If we can be done with the shouty part, I'd actually quite like to have
a sensible discussion about when, if ever, we do IBPB on context switch
(ptraceability and dumpable have both been suggested) and when, if
ever, we set STIBP in userspace.
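
[Editor's sketch, not part of the quoted email.] The decision Woodhouse describes (IBRS on Skylake-era cores and retpoline everywhere else in the RFC, versus dropping IBRS and accepting the caveat) can be written down as a tiny policy function. The function name, parameters, and dictionary keys are all hypothetical, and this is a reading of the email rather than the logic of any actual kernel patch.

```python
def choose_mitigations(skylake_era, keep_ibrs=True):
    """Pick Spectre variant 2 mitigations as the email outlines them.

    skylake_era: True for the CPU generation the email says can also
                 mispredict on 'ret' (for example in deep call chains).
    keep_ibrs:   True models the RFC patchset; False models the outcome
                 of dropping IBRS for kernel protection.
    """
    use_ibrs = skylake_era and keep_ibrs
    return {
        # How the kernel itself is protected from poisoned branch targets.
        "kernel": "ibrs" if use_ibrs else "retpoline",
        # Needed in both cases to isolate processes and guests from each other.
        "context_switch": "ibpb",
        # Retpoline does not cover the Skylake 'ret' case, so flag it.
        "ret_underflow_caveat": skylake_era and not use_ibrs,
    }


if __name__ == "__main__":
    print(choose_mitigations(skylake_era=True))
    print(choose_mitigations(skylake_era=True, keep_ibrs=False))
    print(choose_mitigations(skylake_era=False))
```

The third key makes the trade-off explicit: dropping IBRS is faster, but on Skylake-era parts it leaves the open question the email says has only been analysed in a hand-wavy way.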
#3
R-T-B
This is also best taken in context: Linus is only referencing the submitted Linux kernel patches. Windows and microcode patches need not apply here.
#4
Steevo
Haha. Primitive hardware functions designed to make Intel faster are multi-floor screen doors in a submarine.
#5
Darmok N Jalad
RejZoR: Well, if you can just simply toggle the patch, malware can do that too. And then siphon data through a cache exploit undetected lol
Woohoo! The return of the “Turbo” button on PCs!
#6
AltCapwn
Since I installed the fix on all our enterprise PCs, we've had a fruck load of random issues. A lot of the PCs slowed down too. I'm angry at Intel right now.
#7
R-T-B
AltCapwn: Since I installed the fix on all our enterprise PCs, we've had a fruck load of random issues. A lot of the PCs slowed down too. I'm angry at Intel right now.
The Meltdown fix should be pretty problem free.

I assume you are talking about the microcode fix for Spectre?
#8
AltCapwn
R-T-B: The Meltdown fix should be pretty problem free.

I assume you are talking about the microcode fix for Spectre?
Yes exactly.
#9
Papahyooie
I just want to reiterate: "Vanilla-flavored potato."
#10
Aquinus
Resident Wat-man
R-T-B: I assume you are talking about the microcode fix for Spectre?
Important takeaway is this part of Linus' response:
Linus Torvalds: That's part of the big problem here. The speculation control cpuid stuff shows that Intel actually seems to plan on doing the right thing for meltdown (the main question being _when_). Which is not a huge surprise, since it should be easy to fix, and it's a really honking big hole to drive through. Not doing the right thing for meltdown would be completely unacceptable.

So the IBRS garbage implies that Intel is _not_ planning on doing the right thing for the indirect branch speculation.

Honestly, that's completely unacceptable too.
Big Edit: More or less, I read that as Intel not making a microcode update for the indirect branch speculation stuff. I don't do kernel and system dev, but I can kind of understand what they're talking about when I read through the thread (which I did). The main problem seems to be that there isn't a clear way to solve this issue if it's going to be fixed at the OS level in the kernel instead of as a microcode update. On one hand, you have a hole that, depending on the context in which it has been run, may be a security vulnerability. On the other hand, doing a microcode update could very well mean a substantial performance hit, possibly one even bigger than retpoline (which isn't in places where it doesn't have to be, mind you).

So I see it like this: Intel could fix it with a microcode update, which would cost more performance across the board but would patch the hole for good, or it could be left up to kernel and software developers to determine if and when protections from this kind of exploit are required. I personally think that's a big ask of the application development community because (and I say this as an application dev) I don't want to be thinking about when I need to protect hardware from an attack, and I think Linus is thinking the same thing.

Honestly, I don't care what the performance hit is. Intel needs to man up and fix this instead of trying to pass the buck. It's a problem that they need to own up to and I would hold AMD and ARM to similar standards. I understand that these things happen. At work I've spent the last several days fixing bugs and they happen more than you realize, but if something makes it to production, you fix it as quickly as possible. If it hurts performance, that can be part of the next release (for CPUs that would be next gen,) but you have to freaking fix it.

So, rant over, tl;dr: Intel needs to fix this, regardless of the performance hit. Not doing a microcode update for this is unacceptable as Linus suggests.
#11
Katanai
Aquinus: So, rant over, tl;dr: Intel needs to fix this, regardless of the performance hit.
This would be unacceptable for me and a lot of people. Think about it this way: if you lose 5-10% performance on your CPU, it's like you swapped your CPU for an older-generation chip. Even so, let's say it might work for you and me, but how about servers with hundreds of CPUs in them? The performance loss would be massive. Maybe a company changed out 256 CPUs in a cluster for a newer generation to get a 10% CPU increase. What now? You take all that back? Give them back the money they spent. Believe me, they will ask for it, in court...
#12
R-T-B
Katanai: Believe me, they will ask for it, in court...
They already are. However, for any serious company hosting a datacenter, not patching is simply not an option. You'd have your servers completely at the mercy of your users.
#13
hat
Enthusiast
I wonder when we can expect a hardware fix that works properly without degrading performance? Personally I have no issues waiting for whatever comes after the current generation...
#14
londiste
As the referenced email thread states, the patches in question are for Spectre (apparently for Variant 2 of it).
RejZoR: Well, if you can just simply toggle the patch, malware can do that too. And then siphon data through a cache exploit undetected lol
Again, if you have this kind of access to the operating system kernel, you have no need for something like Spectre or Meltdown.
R-T-B: This is also best taken in context: Linus is only referencing the submitted Linux kernel patches. Windows and microcode patches need not apply here.
One has to wonder if Microsoft is fighting back at Intel in the same way. The microcode patches these kernel updates rely on are common to all operating systems.
Aquinus: Honestly, I don't care what the performance hit is. Intel needs to man up and fix this instead of trying to pass the buck. It's a problem that they need to own up to and I would hold AMD and ARM to similar standards. I understand that these things happen. At work I've spent the last several days fixing bugs and they happen more than you realize, but if something makes it to production, you fix it as quickly as possible. If it hurts performance, that can be part of the next release (for CPUs that would be next gen,) but you have to freaking fix it.

So, rant over, tl;dr: Intel needs to fix this, regardless of the performance hit. Not doing a microcode update for this is unacceptable as Linus suggests.
Microcode does get updated either way. The features these kernel patches rely on should come (or get updated) with the microcode updates.
AMD and ARM are an interesting question here. Were their patches for this good?
#15
Melvis
LOL That just makes me laugh when I read what Linus wrote, what a dude! Stick it to them man and make them fix their shit, Intel... the company no one can trust.
#16
xenocide
Aquinus: Honestly, I don't care what the performance hit is. Intel needs to man up and fix this instead of trying to pass the buck. It's a problem that they need to own up to and I would hold AMD and ARM to similar standards. I understand that these things happen. At work I've spent the last several days fixing bugs and they happen more than you realize, but if something makes it to production, you fix it as quickly as possible. If it hurts performance, that can be part of the next release (for CPUs that would be next gen,) but you have to freaking fix it.

So, rant over, tl;dr: Intel needs to fix this, regardless of the performance hit. Not doing a microcode update for this is unacceptable as Linus suggests.
The "fix" for this is designing a new CPU from the ground up. I'm confused by your post because you seem to acknowledge that, but also rant about how they aren't doing enough. There's not much they can do: it's an architectural problem, and they are trying to find the least negative solution to mitigate the security threat, since the places most at risk are datacenters using dozens or thousands of Intel CPUs at a time. We've already seen what can happen when all those CPUs take a hit: various games had servers that had to be taken offline the first time they pushed out a patch that involved a 5-10% performance hit. If they pushed out a roughly thrown-together patch with a 25% performance hit, they would probably crash half the Internet...