FSGSBASE ABI considerations

From: Andy Lutomirski
Date: Sun Jul 30 2017 - 23:08:45 EST


Hi all-

Chang wants to get the FSGSBASE patches in. Here's a bit of a brain
dump on what I think the relevant considerations are and why I haven't
sent out my patches.

----- Background -----

Setting CR4.FSGSBASE has two major advantages and one major
disadvantage. The major advantages are:

- We can avoid some WRMSR instructions in the context switch path,
which makes a rather large difference.

- User code can use the new RDFSBASE, RDGSBASE, WRFSBASE, and WRGSBASE
instructions. Apparently some users really want this for, umm,
userspace threading. Think Java.

The major disadvantage is that user code can use the new instructions.
Now userspace is going to do totally stupid shite like writing some
nonzero value to GS and then doing WRGSBASE, or like linking some
idiotic library that uses WRGSBASE into a perfectly innocent program
like dosemu2, resulting in utterly nonsensical descriptor state.

In Windows, supposedly the scheduler reserves the right to do
arbitrarily awful things to you if you use WRFSBASE or WRGSBASE
inappropriately. Andi took a similar approach in his original
FSGSBASE patches. I think this is wrong and we need to have sensible,
documented, and tested behavior for what happens when you use the new
instructions.

For simplicity, the text below talks about WRGSBASE and ignores
WRFSBASE. The ABI considerations are identical, even if the kernel
implementation details are different.

----- Requirements -----

In my book, there's only one sensible choice for what happens when you
are scheduled out and back in on a Linux system with FSGSBASE enabled:
all of your descriptors end up *exactly* the way they were when you
scheduled out.

ptrace users need to keep working. It would be nice if existing gdb
versions continue to work right when user code uses WRGSBASE, but it
might be okay if a new ptrace interface is needed. The existing
regset ABI is exactly backwards from what it needs to be to make this
easy.

----- Interaction with modify_ldt() -----

The first sticking point we'll hit is modify_ldt() and, in particular,
what happens if you call modify_ldt() to change the base of a segment
that is loaded into gs by another thread in the same mm.

Our current behavior here is nonsensical: on 32-bit kernels, FS would
be fully refreshed on other threads and GS might be too, depending on
compiler options. On 64-bit kernels, neither FS nor GS is immediately
refreshed. Historically, we didn't refresh anything reliably. On the
bright side, this means that existing modify_ldt() users are (AFAIK)
tolerant of somewhat crazy behavior.

On an FSGSBASE-enabled system, I think we need to provide
deterministic, documented, tested behavior. I can think of three
plausible choices:

1a. modify_ldt() immediately updates FSBASE and GSBASE on all threads
that reference the modified selector.

1b. modify_ldt() immediately updates FSBASE and GSBASE on all threads
that reference the LDT.

2. modify_ldt() leaves FSBASE and GSBASE alone on all threads.

(2) is trivial to implement, whereas (1a) and (1b) are a bit nasty to
implement when FSGSBASE is on.

The tricky bit is that 32-bit kernels can't do (2), so, if we want
modify_ldt() to behave the same on 32-bit and 64-bit kernels, we're
stuck with (1a) or (1b). (I think we could implement (2) with
acceptable performance on 64-bit non-FSGSBASE kernels if we wanted to.)
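
To make the difference between the three choices concrete, here is a
toy userspace model in C. It is only a sketch, not kernel code: the
names (mm_model, thread_model, model_modify_ldt, the policy enum) are
made up for illustration, only GS is modeled, and "reference the LDT"
is assumed to mean "GS holds a selector with the TI bit set". The only
hardware fact it relies on is the selector layout (index in bits 3 and
up, TI in bit 2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LDT_ENTRIES 4   /* tiny hypothetical LDT, enough for the model */

/* The three candidate behaviors, in the order listed in the text. */
enum ldt_policy {
    REFRESH_MODIFIED_SELECTOR,  /* (1a) */
    REFRESH_ANY_LDT_SELECTOR,   /* (1b) */
    REFRESH_NONE                /* (2)  */
};

/* Hypothetical per-thread state: GS selector plus the live GS base. */
struct thread_model {
    uint16_t gs;
    uint64_t gs_base;
};

/* Hypothetical per-mm state: just the base of each LDT descriptor. */
struct mm_model {
    uint64_t ldt_base[LDT_ENTRIES];
};

/* Bit 2 of a selector (TI) is set when it points into the LDT. */
static int sel_is_ldt(uint16_t sel) { return (sel & 0x4) != 0; }

/* Bits 3 and up of a selector are the descriptor table index. */
static unsigned sel_index(uint16_t sel) { return sel >> 3; }

/* Model of modify_ldt() changing the base of one LDT entry. */
static void model_modify_ldt(struct mm_model *mm,
                             struct thread_model *threads, size_t n,
                             unsigned entry, uint64_t new_base,
                             enum ldt_policy policy)
{
    assert(entry < LDT_ENTRIES);
    mm->ldt_base[entry] = new_base;

    if (policy == REFRESH_NONE)
        return;                 /* (2): nobody's base is touched */

    for (size_t i = 0; i < n; i++) {
        struct thread_model *t = &threads[i];

        if (!sel_is_ldt(t->gs))
            continue;           /* GS does not point into the LDT */

        unsigned idx = sel_index(t->gs);
        if (idx >= LDT_ENTRIES)
            continue;

        /* (1a) only refreshes threads using the modified entry. */
        if (policy == REFRESH_MODIFIED_SELECTOR && idx != entry)
            continue;

        /* Reload the base from the descriptor, dropping any value
         * the thread may have set with WRGSBASE. */
        t->gs_base = mm->ldt_base[idx];
    }
}
```

In this model the visible difference between (1a) and (1b) is a
thread that loaded an unrelated LDT selector into GS and then used
WRGSBASE: (1a) leaves its base alone, while (1b) resets it to whatever
its own descriptor says.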

Thoughts?

----- Interaction with ptrace -----

struct user_regs_struct looks like this:

...
unsigned long fs_base;
unsigned long gs_base;
unsigned long ds;
unsigned long es;
unsigned long fs;
unsigned long gs;
...

This means that, when gdb saves away a regset and reloads it using
PTRACE_SETREGS or similar, the effect is to load gs_base and then load
gs. If gs != 0, this will blow away gs_base. Without FSGSBASE, this
doesn't matter so much. With FSGSBASE, it means that using gdb to do,
say, 'print func()' may corrupt GSBASE.

What, if anything, should we do about this? One option would be to
make gs_base be accurate all the time (it currently isn't) and teach
PTRACE_SETREGS to restore in the opposite order despite the struct
layout.
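
As a rough illustration of why the order matters, here is a toy
userspace model in C. It is a sketch under stated assumptions, not the
ptrace implementation: seg_state, descriptor_base() and the restore
helpers are hypothetical names, the descriptor base is an arbitrary
stand-in value, and loading a zero selector is simplified to leave the
base untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the GS half of a task's register state. */
struct seg_state {
    uint64_t gs_base;   /* live base, as WRGSBASE would set it */
    uint16_t gs;        /* selector */
};

/* Stand-in for "the base stored in the descriptor sel points at".
 * The value is arbitrary; it only needs to differ from the base
 * the task wrote with WRGSBASE. */
static uint64_t descriptor_base(uint16_t sel)
{
    return (uint64_t)sel << 12;
}

/* Loading a nonzero selector reloads the base from the descriptor.
 * (Simplification: a zero selector leaves the base alone here.) */
static void load_gs(struct seg_state *t, uint16_t sel)
{
    t->gs = sel;
    if (sel != 0)
        t->gs_base = descriptor_base(sel);
}

/* Writing the base directly does not touch the selector. */
static void write_gs_base(struct seg_state *t, uint64_t base)
{
    t->gs_base = base;
}

/* Restore in struct layout order: gs_base first, then gs. */
static void restore_struct_order(struct seg_state *t,
                                 const struct seg_state *saved)
{
    write_gs_base(t, saved->gs_base);
    load_gs(t, saved->gs);      /* clobbers the base if gs != 0 */
}

/* Restore in the opposite order: gs first, then gs_base. */
static void restore_reversed(struct seg_state *t,
                             const struct seg_state *saved)
{
    load_gs(t, saved->gs);
    write_gs_base(t, saved->gs_base);   /* base written last, so it sticks */
}
```

With a saved state of, say, gs = 0x2b and a gs_base that came from
WRGSBASE, restore_struct_order() ends up with the descriptor's base
instead of the saved one, while restore_reversed() reproduces both
fields. That is the whole argument for restoring in the opposite order
regardless of where the fields sit in the struct.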

Thoughts?