Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

From: H. Peter Anvin
Date: Tue Apr 22 2014 - 13:10:34 EST


On 04/22/2014 10:00 AM, Andrew Lutomirski wrote:
>>
>> Yes, you can very much trigger GP deliberately.
>>
>> The way to do it is to just make an invalid segment descriptor on the
>> iret stack. Or make it a valid 16-bit one, but make it a code segment
>> for the stack pointer, or read-only, or whatever. All of which is
>> trivial to do with a sigreturn system call. But you can do it other
>> ways too - enter with a SS that is valid, but do a modify_ldt() system
>> call that makes it invalid, so that by the time you exit it is no
>> longer valid etc.
>>
>> There's a reason we mark that "iretq" as taking faults with that
>>
>> _ASM_EXTABLE(native_iret, bad_iret)
>>
>> and that "bad_iret" creates a GP fault.
>>
>> And that's a lot of kernel stack. The whole initial GP fault path,
>> which goes to the C code that finds the exception table etc. See
>> do_general_protection_fault() and fixup_exception().
>
> My point is that it may be safe to remove the special espfix fixup
> from #PF, which is probably the most performance-critical piece here,
> aside from iret itself.
>

It *might* even be plausible to do full manual sanitization, so that the
IRET cannot fault, but I have to admit that is somewhat daunting,
especially given the thread/process distinction. I wasn't actually sure
about the status of the LDT on the thread vs. process scale (the GDT is
per-CPU, but has some entries that are context-switched per *thread*;
I hadn't looked at the LDT recently).

As for Andy's questions:

> What happens on the IST entries? If I've read your patch right,
> you're still switching back to the normal stack, which looks
> questionable.

No, in that case %rsp won't point into the espfix region, and the switch
will be bypassed. We will resume back into the espfix region on IRET,
which is actually required e.g. if we take an NMI in the middle of the
espfix setup.
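
To make the bypass concrete, here is a minimal user-space sketch of that
kind of test. It assumes, purely for illustration, that the espfix region
occupies exactly one PGD slot, so comparing the upper address bits of %rsp
is enough. PGDIR_SHIFT, ESPFIX_PGD_ENTRY, ESPFIX_BASE_ADDR and
rsp_in_espfix() are hypothetical names and values here, not taken from
the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed layout (hypothetical, not from the patch): the espfix region
 * fills one whole PGD slot, i.e. 512 GiB with 4-level paging.
 */
#define PGDIR_SHIFT       39
#define ESPFIX_PGD_ENTRY  ((uint64_t)-2)
#define ESPFIX_BASE_ADDR  (ESPFIX_PGD_ENTRY << PGDIR_SHIFT)

/*
 * Return true if rsp lies inside the assumed espfix region.
 * Only the bits above PGDIR_SHIFT are compared, so any stack that
 * lives in a different PGD slot (normal kernel stack, IST stack,
 * user stack) fails the test and the stack switch is skipped.
 */
static bool rsp_in_espfix(uint64_t rsp)
{
    return (rsp >> PGDIR_SHIFT) == (ESPFIX_BASE_ADDR >> PGDIR_SHIFT);
}
```

With this layout, an exception that arrives on an IST stack never matches,
because the IST stacks are mapped elsewhere; that is the sense in which
the switch is bypassed.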

> Also, if you want to save some register abuse on each exception entry,
> could you check the saved RIP instead of the current RSP? I.e. use
> the test instruction with offset(%rsp)? Maybe there are multiple
> possible values, though, and just testing some bits doesn't help.

I don't see how that would work.

-hpa
