Re: [RFC 3/4] x86/signal/64: Re-add support for SS in the 64-bit signal context

From: Andy Lutomirski
Date: Wed Oct 14 2015 - 12:41:19 EST


On Oct 13, 2015 10:10 PM, "Stas Sergeev" <stsp@xxxxxxx> wrote:
>
> 13.10.2015 04:04, Andy Lutomirski wrote:
> > + * UC_SIGCONTEXT_SS will be set when delivering 64-bit or x32 signals on
> > + * kernels that save SS in the sigcontext. All kernels that set
> > + * UC_SIGCONTEXT_SS will correctly restore at least the low 32 bits of esp
> > + * regardless of SS (i.e. they implement espfix).
> Is this comment relevant? I think neither signal delivery
> nor sigreturn were affected by esp corruption, or were they?

Every IRET from the kernel to 16-bit mode was affected. That includes
interrupt return, non-signaling page fault return, and all signal
returns. So this could serve as a hint: if you see a stack fault (#SS
or page fault), you don't need to try to work around the corruption
issue.

> I guess you suggest to use that flag as the detection
> for espfix, but I don't think this is relevant: you may
> need to know about espfix also outside of a signal handler.
> In fact, I don't think espfix needs any run-time detection,
> because then the stack fault will simply not happen, and that's all.
> I think it is a matter of compile-time detection only.

True. In any event, there's no code involved - this is just an
observation that all kernels with the new flag have espfix64.

>
> > + *
> > + * Kernels that set UC_SIGCONTEXT_SS will also set UC_STRICT_RESTORE_SS
> > + * when delivering a signal that came from 64-bit code.
> > + *
> > + * Sigreturn modifies its behavior depending on the UC_STRICT_RESTORE_SS
> > + * flag. If UC_STRICT_RESTORE_SS is set, then the SS value in the
> > + * signal context is restored verbatim. If UC_STRICT_RESTORE_SS is not
> > + * set, the CS value in the signal context refers to a 64-bit code
> > + * segment, and the signal context's SS value is invalid, it will be
> > + * replaced by a flat 32-bit selector.
> > +
> > + * This behavior serves two purposes. It ensures that older programs
> > + * that are unaware of the signal context's SS slot and either construct
> > + * a signal context from scratch or that catch signals from segmented
> > + * contexts and change CS to a 64-bit selector won't crash due to a bad
> > + * SS value. It also ensures that signal handlers that do not modify
> > + * the signal context at all return back to the exact CS and SS state
> > + * that they came from.
> Do you need a second flag for this?
> IIRC non-restoring was needed because:
> 1. dosemu saves SS to different place
> 2. If you save it yourself, dosemu can invalidate it, but not replace
> with the right one because of 1.
> IMHO to solve this, you need _either_ the second flag or
> the heuristic, but not both.
>
> With new flag:
> Just don't set it by default, and the new progs will set it themselves.
> Old progs are unaffected.
> When it is set, SS should always be restored.
> I prefer this approach.
>

The down side is that, if we do it that way, returning from a signal
due to a bad SS will silently fix it, possibly with bad effects. That
seems suboptimal. Most of the code is handling the flag on return,
anyway -- setting it is straightforward.

> With heuristic:
> Save SS yourself on delivery, and, if it turns out to be invalid on
> sigreturn, replace it with a better one.

Then new DOSEMU will have to set the flag to get the expected
behavior, which seems unfortunate. It also makes tests like
sigreturn_64 much harder to write, which is unfortunate because, if
there's an exploitable bug, attackers will still exploit it by some
more roundabout means.

> Old progs are unaffected because they use iret anyway, and that iret
> happens _after_ sigreturn.

Old progs = just DOSEMU here, I think.

> New progs will never leave invalid SS in the right sigcontext slot.
>
> So why have you chosen to have both the new flag UC_STRICT_RESTORE_SS
> and the heuristic?
>
> > This is a bit risky, and another option would be to do nothing at
> > all.
> Andy, could you please stop pretending there are no other solutions?
> You do not have to like them. You do not have to implement them.
> But your continuous re-assertions that they do not exist, make me
> feel a bit uncomfortable after I spelled them many times.
>
> > Stas, what do you think? Could you test this?
> I think I'll get to testing this only at the weekend.
> In the meantime, the question about the safety of leaving LDT SS
> in 64bit mode still makes me wonder. Perhaps, instead of re-iterating
> this here, you can describe this all in the patch comments? Namely:
> - How will LDT SS interact with nested signals

The kernel doesn't think about nested signals. If the inner signal is
delivered while SS is in the LDT, the kernel will try to keep it as is
and will stick whatever was in SS when the signal happened in the
inner saved context. On return to the outer signal, it'll restore it
following the UC_STRICT_RESTORE_SS rules.
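As a rough userspace model of those rules (not the kernel code; the
function name, the boolean inputs, and the flag value are assumptions
made for illustration), the decision sigreturn makes about SS could be
written as:

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_UC_STRICT_RESTORE_SS 0x4	/* assumed flag value */
#define MODEL_USER_DS 0x2b		/* __USER_DS on x86-64: (5 << 3) | 3 */

/*
 * Model of which SS value ends up loaded after sigreturn.
 *   uc_flags     - uc_flags word from the signal context
 *   saved_ss     - SS slot from the signal context
 *   cs_is_64bit  - saved CS refers to a 64-bit code segment
 *   ss_is_valid  - saved SS is a usable stack segment selector
 */
static uint16_t model_sigreturn_ss(unsigned long uc_flags, uint16_t saved_ss,
				   bool cs_is_64bit, bool ss_is_valid)
{
	/* Strict mode: restore verbatim, even if the value is bad. */
	if (uc_flags & MODEL_UC_STRICT_RESTORE_SS)
		return saved_ss;

	/* Legacy fixup: 64-bit CS with an invalid SS gets a flat selector. */
	if (cs_is_64bit && !ss_is_valid)
		return MODEL_USER_DS;

	/* Otherwise the saved value is used as is. */
	return saved_ss;
}
```

The last branch is my reading of the comment in the patch, which only
spells out the replacement case. In the strict case a bad value is not
papered over, so the failure shows up at the return instead of being
silently fixed.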

> - with syscalls

64-bit syscalls change SS to some default flat value as a side-effect.
(Actually, IIRC, 64-bit syscalls change it specifically to __USER_DS,
but, on Xen, 64-bit fast syscall returns may silently flip it to a
different flat selector.)

> - with siglongjmp()

siglongjmp is a glibc thing. It should work the same way it always
did. If it internally does a syscall (sigprocmask or whatever), that
will override SS.

> - with another thread. Do we have a per-thread or a per-process LDT
> these days? If LDT is per-process, my question is what will happen
> if another thread invalidates an LDT entry while we are in 64bit mode.
> If LDT is per-thread, there is no such question.

The LDT is per-process. If you have some SS value loaded and another
thread invalidates it, then you get a signal delivered. On 64-bit
kernels, it's the same SIGSEGV you'd get if you tried to directly load
the bad SS value using IRET from user mode and, on 32-bit kernels,
it's SIGILL. On kernels before 4.2 that don't have the fix
backported (IIRC), the signal may be non-deterministically deferred.
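For anyone inspecting SS values from a saved context: whether a
selector points into the LDT is visible from the selector itself,
since bit 2 (the table indicator) selects LDT vs GDT. A small decoding
sketch, with struct and function names that are purely illustrative:

```c
#include <stdint.h>

/* Fields of an x86 segment selector. */
struct selector_fields {
	unsigned int index;	/* descriptor table index (bits 15..3) */
	int in_ldt;		/* table indicator (bit 2): 1 = LDT, 0 = GDT */
	unsigned int rpl;	/* requested privilege level (bits 1..0) */
};

static struct selector_fields decode_selector(uint16_t sel)
{
	struct selector_fields f;

	f.rpl = sel & 0x3;
	f.in_ldt = (sel >> 2) & 0x1;
	f.index = sel >> 3;
	return f;
}
```

For example, 0x2b decodes to GDT entry 5 at RPL 3, while 0x0f decodes
to LDT entry 1 at RPL 3. Only selectors of the second kind can be
invalidated by another thread calling modify_ldt.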

>
> > If SS starts out invalid (this can happen if the signal was caused
> > by an IRET fault or was delivered on the way out of set_thread_area
> > or modify_ldt), then IRET to the signal handler can fail, eventually
> > killing the task.
> Is this signal-specific? I.e. the return from IRQs happens via iret too.
> So if we are running with invalid SS in 64bit mode, can the iret from
> IRQ also cause the problem?
>

On new kernels, you can't run with invalid SS under any conditions.
On old kernels, you could, but only due to a modify_ldt race, and, if
you did that, then you could get non-deterministically killed.

>
> On an off-topic: there was recently a patch from you that
> disables vm86() by mmap_min_addr. I've found that dosemu, when
> started as root, could override mmap_min_addr. I guess this will
> no longer work, right? Not a big regression, just something to
> know and document.

As root, mmap_min_addr isn't enforced. Calling mmap and then dropping
privileges would still keep the old mappings around. We could
potentially rig it so that calling vm86 and then dropping privileges
allows you to keep using vm86 even after dropping privileges.

--Andy