Re: [PATCH] x86/ptrace: Remove questionable TS_COMPAT usage in ptrace

From: Andy Lutomirski
Date: Sun Jun 19 2016 - 18:10:23 EST


On Sat, Jun 18, 2016 at 10:02 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Jun 18, 2016 6:56 AM, "Pedro Alves" <pedro@xxxxxxxxxx> wrote:
>>
>> On 06/18/2016 11:21 AM, Andy Lutomirski wrote:
>> > A 32-bit tracer can set a tracee's TS_COMPAT flag by poking orig_ax.
>> >
>> > - If the tracee is stopped in a 32-bit syscall, this has no effect
>> > as TS_COMPAT will already be set.
>> >
>> > - If the tracee is stopped on entry to a 64-bit syscall, this could
>> > cause problems: in_compat_syscall() etc will be out of sync with
>> > the actual syscall table in use. I can imagine this bre aking
>> > audit. (It can't meaningfully break seccomp, as a malicious
>> > tracer can already fully bypass seccomp.) I could also imagine it
>> > subtly confusing the implementations of a few syscalls.
>> >
>> > - If the tracee is stopped in a non-syscall context, then TS_COMPAT
>> > will end up set in user mode, which isn't supposed to happen. The results
>> > are likely to be similar to the 64-bit syscall entry case.
>> >
>> > Fix it by deleting the code.
>> >
>> > Here's my understanding of the previous intent. Suppose a
>> > 32-bit tracee makes a syscall that returns one of the -ERESTART
>> > codes. A 32-bit tracer saves away its register state. The tracee
>> > resumes, returns from the system call, and gets stopped again for a
>> > non-syscall reason (e.g. a signal). Then the tracer tries to roll
>> > back the tracee's state by writing all of the saved registers back.
>> >
>> > The result of this sequence of events will be that the tracee's
>> > registers' low bits match what they were at the end of the syscall
>> > but TS_COMPAT will be clear. This will cause syscall_get_error() to
>> > return a *positive* number, because we zero-extend registers poked
>> > by 32-bit tracers instead of sign-extending them. As a result, with
>> > this change, syscall restart won't happen, whereas, historically, it
>> > would have happened.
>> >
>> > As far as I can tell, this corner case isn't very important, and I
>>
>> I believe it's actually very much very important for gdb, for restoring
>> the inferior state when the user calls a function in the inferior, with:
>>
>> (gdb) print foo()
>>
>> Some background here:
>>
>> http://linux-kernel.vger.kernel.narkive.com/fqrh4xKK/patch-x86-ptrace-sign-extend-eax-with-orig-eax-0
>
> Yuck. I should have dug in to the history. Why not just
> unconditionally sign-extend eax when set by a 32-bit tracer?
>
> Do you know how to acquire a copy of erestartsys-trap.c? The old
> links appear to be broken.
>
> Also, while I have your attention: when gdb restores old state like
> this, does it do it with individual calls to PTRACE_POKEUSER or does
> it use SETREGSET or similar to do it all at once? I'm asking because
> I have some other code (fsgsbase) that's on hold until I can figure
> out how to keep it from breaking gdb if and when gdb writes to fs and
> fs_base.

OK, I studied this a bit. What a mess! Here's what's going on AFAICT:

1. For reasons that probably made sense, the kernel delivers signals
and triggers ptrace before handling syscall restart. This means that
-ERESTART_RESTARTBLOCK, etc is visible to userspace. We could
plausibly get away with changing that, but it seems quite risky.

2. As a result of (1), gdb (quite reasonably) expects that it can
snapshot user state on signal delivery, adjust regs to call a
function, and then restore user state.

3. Presumably as a result of (2), we do syscall restart if indicated
by the register state on ptrace resume even if we're *not* resuming a
syscall.

4. Also as a result of (1), gdb expects that writing -1 to orig_eax
via POKEUSER or similar will *disable* syscall restart, which is
necessary to get function calling on syscall exit to work.

The combination of (1) and (4) means that we need to skip syscall
restart if orig_eax == -1 (in a 32-bit signed sense). The combination
of (1) and (2) means that we need to enable syscall restart if
orig_eax > 0 (in a 32-bit signed sense) and eax contains a
-ERESTARTxyz code (again in a signed sense). And, for extra fun, we
need to make sure that, if we set __NR_restart_syscall, we set the
correct value based on the syscall that we're restarting.

The latter bit is a mess and is probably broken on current kernels for
64-bit gdb attached to a 32-bit process. (Is it? All of this stuff
is a bit of a pain to test.)

Here's my proposal:

Step 1: for 4.7 and for -stable, introduce TS_I386_REGS_POKED. Set it
in putreg32. Use it in syscall_get_error, get_nr_restart_syscall,
etc. Clear it in do_signal. This gets rid of the arch confusion
while (hopefully) preserving current behavior.

Step 2: Optionally, for 4.8, consider cleaning up the whole mess.
First, change putreg32 to sign-extend orig_eax and eax and just get
rid of the TS_COMPAT checks in syscall_get_nr and syscall_get_error.
Then we either change the 64-bit -ERESTART_BLOCK to a different number
or encode the desired restart bitness in restart_block or similar.

I wonder if we could actually get away with doing syscall restart
processing before ptrace invocation.

--Andy