execve-under-ptrace API bug (was Re: Ptrace documentation, draft #3)

From: Denys Vlasenko
Date: Sun May 29 2011 - 23:28:29 EST


On Wednesday 25 May 2011 16:32, Tejun Heo wrote:
> > 1.x execve under ptrace.
> >
> ...
> > ** we get death notification: leader died: **
> > PID0 exit(0) = ?
> > ** we get syscall-entry-stop in thread 1: **
> > PID1 execve("/bin/foo", "foo" <unfinished ...>
> > ** we get syscall-entry-stop in thread 2: **
> > PID2 execve("/bin/bar", "bar" <unfinished ...>
> > ** we get PTRACE_EVENT_EXEC for PID0, we issue PTRACE_SYSCALL **
> > ** we get syscall-exit-stop for PID0: **
> > PID0 <... execve resumed> ) = 0
> >
> > ??? Question: WHICH execve succeeded? Can tracer figure it out?
>
> Hmmm... I don't know. Maybe we can set ptrace message to the original
> tid?

The problem with execve is bigger than merely reporting this pid.

Consider how strace tracks its tracees. Currently, it remembers
their pids - sometimes by remembering clone's return values!
This is hopelessly broken wrt pid namespaces.

So I looked at removing all pid tracking from strace, because
it uses pids only for some (extremely fragile) workarounds
for old kernel bugs, such as: it suspends waitpid calls in tracees
until there is a child the tracee can wait for; it detaches from
a tracee if that tracee gets a fatal signal or calls exit;
and similar madness.

There are many bugs in strace in this area, because it cannot
properly emulate a lot of things (such as signal interrupting
waitpid, waitpid(-PGID), etc).

Therefore I plan to delete this madness.

The idea is that strace can simply create a new tracee's data
structure when it sees a new, never before seen pid popping up
from waitpid - this means that [v]fork/clone created a child,
and now it is traced too. It does not need to know beforehand
about its pid. It does not need to know who is whose parent
or sibling.
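
A minimal sketch of that idea, to make it concrete. This is not
strace's actual code: the fixed-size table and the names MAX_TCBS,
tcb_lookup and tcb_lookup_or_create are made up for illustration.
The tracer would call tcb_lookup_or_create() with every pid that
waitpid returns.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

#define MAX_TCBS 64            /* hypothetical limit, illustration only */

struct tcb {
    pid_t pid;                 /* 0 means the slot is free */
    int in_syscall;            /* per-tracee state would live here */
};

static struct tcb tcb_table[MAX_TCBS];

/* Return the entry for pid, or NULL if this pid was never seen. */
static struct tcb *tcb_lookup(pid_t pid)
{
    assert(pid > 0);
    for (size_t i = 0; i < MAX_TCBS; i++)
        if (tcb_table[i].pid == pid)
            return &tcb_table[i];
    return NULL;
}

/*
 * An unknown pid means a new child appeared and is already traced,
 * so allocate its entry on the spot. No parent/sibling information
 * is needed. Returns NULL only if the table is full.
 */
static struct tcb *tcb_lookup_or_create(pid_t pid)
{
    struct tcb *t = tcb_lookup(pid);
    if (t)
        return t;
    for (size_t i = 0; i < MAX_TCBS; i++) {
        if (tcb_table[i].pid == 0) {
            tcb_table[i].pid = pid;
            tcb_table[i].in_syscall = 0;
            return &tcb_table[i];
        }
    }
    return NULL;
}
```

Note that nothing here ever frees an entry; that is exactly the part
which depends on the tracer being told when a tracee goes away.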

This works (I have a patch against a somewhat older strace),
but now in light of this "interesting" execve-under-ptrace
behavior it appears to have a flaw: all threads except the
execve'ing one disappear without any notification to strace,
therefore strace doesn't know which tracee data ("struct tcb"
in strace-speak) need to be dropped!

I am not sure current strace handles this correctly either.
I will be very surprised if it does.

I think the API needs fixing. Tracees must never disappear like that
on execve (or in any other case). They must always deliver a
WIFEXITED or WIFSIGNALED notification, allowing the tracer to know
that they are gone. We probably also need to document how these
"I died on execve" notifications are ordered wrt the PTRACE_EVENT_EXEC
stop in the execve-ing thread.
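
If every tracee did deliver such a notification, the tracer side
could be as simple as the sketch below. It assumes the proposed
behavior, which the current API does not guarantee; the table and
the names tracee_add, tracee_known and tracee_handle_status are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

#define MAX_TRACEES 64              /* hypothetical limit */

static pid_t tracees[MAX_TRACEES];  /* 0 means the slot is free */

/* Remember a pid; returns 1 on success, 0 if the table is full. */
static int tracee_add(pid_t pid)
{
    assert(pid > 0);
    for (size_t i = 0; i < MAX_TRACEES; i++) {
        if (tracees[i] == 0) {
            tracees[i] = pid;
            return 1;
        }
    }
    return 0;
}

/* Returns 1 if pid is currently in the table, 0 otherwise. */
static int tracee_known(pid_t pid)
{
    assert(pid > 0);
    for (size_t i = 0; i < MAX_TRACEES; i++)
        if (tracees[i] == pid)
            return 1;
    return 0;
}

/*
 * Feed in each (pid, status) pair obtained from waitpid.
 * Returns 1 if the status was a death notification (the entry,
 * if any, is dropped), 0 if the tracee is still alive.
 */
static int tracee_handle_status(pid_t pid, int status)
{
    assert(pid > 0);
    if (!WIFEXITED(status) && !WIFSIGNALED(status))
        return 0;                   /* some kind of stop */
    for (size_t i = 0; i < MAX_TRACEES; i++)
        if (tracees[i] == pid)
            tracees[i] = 0;
    return 1;
}
```

With the current execve behavior, the entries for the vanished
threads would simply never reach the WIFEXITED/WIFSIGNALED branch.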

Ideas?


--
vda

