Re: [PATCH] entry: Fix missed trap after single-step on system call return

From: Andy Lutomirski
Date: Wed Feb 03 2021 - 13:19:27 EST


On Wed, Feb 3, 2021 at 10:10 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, Feb 3, 2021 at 10:00 AM Gabriel Krisman Bertazi
> <krisman@xxxxxxxxxxxxx> wrote:
> >
> > Does the patch below follow your suggestion? I'm setting the
> > SYSCALL_WORK shadowing TIF_SINGLESTEP every time, instead of only when
> > the child is inside a system call. Is this acceptable?
>
> Looks sane to me.
>
> My main worry would be about "what about the next system call"? It's
> not what Kyle's case cares about, but let me just give an example:
>
> - task A traces task B, and starts single-stepping. Task B was *not*
> in a system call at this point.
>
> - task B happily executes one instruction at a time, takes a TF
> fault, everything is good
>
> - task B now does a system call. That will disable single-stepping
> while in the kernel
>
> - task B returns from the system call. TF will be set in eflags, but
> the first instruction *after* the system call will execute unless we
> go through the system call exit path
>
> So I think the tracer basically misses one instruction when single-stepping.

I was hoping you wouldn't ask this :)

The x86 architecture is fundamentally a bit busted here. If we return
from a system call with SYSRET and TF is set in R11, then SYSRET
traps, which means that #DB is delivered before executing a user
instruction. I have been asking Intel for quite a while to document
this, and they said they did, but I still can't find it. IRET is the
opposite: if we return from a system call with IRET and TF is set on
the stack, we execute one user instruction and then trap.
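
To make the asymmetry concrete, here is a toy model in C of the two
return paths as described above. It is only a sketch: the names
(ret_insn, user_insns_before_db, check_model) are made up for
illustration and are not kernel or hardware interfaces, and it encodes
nothing beyond the SYSRET/IRET behavior stated in this mail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names, for illustration only; this is not kernel code. */
enum ret_insn { RET_SYSRET, RET_IRET };

/*
 * Number of user instructions that run before #DB is delivered after
 * returning from a system call, per the behavior described above.
 * Returns -1 when TF is clear, i.e. no single-step trap is pending.
 */
static int user_insns_before_db(enum ret_insn how, bool tf_set)
{
    if (!tf_set)
        return -1;
    /* SYSRET with TF set in R11: traps before any user instruction. */
    if (how == RET_SYSRET)
        return 0;
    /* IRET with TF set on the stack: one user instruction, then trap. */
    return 1;
}

/* Small self-check of the model. */
static void check_model(void)
{
    assert(user_insns_before_db(RET_SYSRET, true) == 0);
    assert(user_insns_before_db(RET_IRET, true) == 1);
    assert(user_insns_before_db(RET_SYSRET, false) == -1);
}
```

The point of the model is just that the same TF value gives a
different first trap location depending on which return instruction
the kernel happened to use.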

So if we want to reliably single-step a system call and trap after the
system call, we just need to synthesize a trap on the way out. Doing
this and getting all the nasty corners (sigreturn setting TF,
sigreturn *clearing* TF, signal delivery as part of the syscall,
ptrace mucking with TF, etc.) right might be nontrivial.
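
As a rough illustration of what the tracer would see, here is a toy
simulation in C of single-stepping a short instruction stream that
contains one system call. It is a sketch under stated assumptions, not
the patch under discussion: exit_mode, trace_stops and the
instruction-index "pc" are all hypothetical, and the corner cases
listed above (sigreturn, signal delivery, ptrace changing TF) are
deliberately ignored.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only; this is not kernel code. */
enum exit_mode {
    EXIT_IRET_NO_SYNTH, /* return via IRET with TF set, no synthesized trap */
    EXIT_SYNTH_TRAP     /* kernel reports a step trap on syscall exit */
};

/*
 * Single-step a stream of n instructions numbered 0..n-1, where the
 * instruction at index sys_idx is a system call. Each tracer stop
 * records the index of the next instruction to run. The caller must
 * provide room for at least n entries in stops[]. Returns the number
 * of stops recorded.
 */
static int trace_stops(int n, int sys_idx, enum exit_mode mode, int *stops)
{
    int count = 0;
    int pc = 0;

    while (pc < n) {
        bool is_syscall = (pc == sys_idx);

        pc++; /* the current instruction executes */

        /*
         * IRET with TF set and no synthesized trap: one more user
         * instruction runs before #DB, so the tracer never stops at
         * the instruction right after the system call.
         */
        if (is_syscall && mode == EXIT_IRET_NO_SYNTH && pc < n)
            pc++;

        stops[count++] = pc;
    }
    return count;
}
```

With n = 4 and the system call at index 1, this model records stops
at 1, 3, 4 in the EXIT_IRET_NO_SYNTH case and at 1, 2, 3, 4 in the
EXIT_SYNTH_TRAP case, i.e. the stop at instruction 2 is the one the
tracer misses.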

I suspect the behavior back in the bad old asm-entry-path days was at
best inconsistent.

--Andy