Re: [PATCH 3/8] job control: Fix ptracer wait(2) hang and explainnotask_error clearing

From: Oleg Nesterov
Date: Mon Mar 21 2011 - 11:28:59 EST


On 03/08, Tejun Heo wrote:
>
> static void *nooper(void *arg)
> {
> pause();
> return NULL;
> }
>
> int main(void)
> {
> const struct timespec ts1s = { .tv_sec = 1 };
> pid_t tracee, tracer;
> siginfo_t si;
>
> tracee = fork();
> if (tracee == 0) {
> pthread_t thr;
>
> pthread_create(&thr, NULL, nooper, NULL);
> nanosleep(&ts1s, NULL);
> printf("tracee exiting\n");
> pthread_exit(NULL); /* let subthread run */
> }
>
> tracer = fork();
> if (tracer == 0) {
> ptrace(PTRACE_ATTACH, tracee, NULL, NULL);
> while (1) {
> if (waitid(P_PID, tracee, &si, WSTOPPED) < 0) {
> perror("waitid");
> break;
> }
> ptrace(PTRACE_CONT, tracee, NULL,
> (void *)(long)si.si_status);
> }
> return 0;
> }
>
> waitid(P_PID, tracer, &si, WEXITED);
> kill(tracee, SIGKILL);
> return 0;
> }
>
> Before the patch, after the tracee becomes a zombie, the tracer's
> waitid(WSTOPPED) never returns and the program doesn't terminate.

Yes. The program should terminate after nooper() exits. Agreed, this
looks strange.

> After the patch, tracee exiting triggers waitid() to fail.
>
> tracee exiting
> waitid: No child processes

OK, but

> @@ -1550,17 +1550,41 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace,
> return 0;
> }
>
> - /*
> - * We don't reap group leaders with subthreads.
> - */
> - if (p->exit_state == EXIT_ZOMBIE && !delay_group_leader(p))
> - return wait_task_zombie(wo, p);
> + /* slay zombie? */
> + if (p->exit_state == EXIT_ZOMBIE) {
> + /* we don't reap group leaders with subthreads */
> + if (!delay_group_leader(p))
> + return wait_task_zombie(wo, p);
>
> - /*
> - * It's stopped or running now, so it might
> - * later continue, exit, or stop again.
> - */
> - wo->notask_error = 0;
> + /*
> + * Allow access to stopped/continued state via zombie by
> + * falling through. Clearing of notask_error is complex.
> + *
> + * When !@ptrace:
> + *
> + * If WEXITED is set, notask_error should naturally be
> + * cleared. If not, subset of WSTOPPED|WCONTINUED is set,
> + * so, if there are live subthreads, there are events to
> + * wait for. If all subthreads are dead, it's still safe
> + * to clear - this function will be called again in finite
> + * amount time once all the subthreads are released and
> + * will then return without clearing.
> + *
> + * When @ptrace:
> + *
> + * Stopped state is per-task and thus can't change once the
> + * target task dies, so notask_error should be cleared only
> + * if WCONTINUED is set.
> + */
> + if (likely(!ptrace) || (wo->wo_flags & WCONTINUED))
> + wo->notask_error = 0;

I don't understand this part. Suppose that this task is not traced and
its real parent does do_wait(WEXITED). We shouldn't return -ECHLD in
this case (EXIT_ZOMBIE && delay_group_leader()).

If the task is traced and debugger does do_wait(WEXITED) we should not
return -ECHLD too.

So I think this patch is wrong in its current state. Hmm, I just noticed
the comment says "If WEXITED is set, notask_error should ...", but this
is not what the code does?





But the main problem is, I do not think do_wait() should block in this
case, and thus I am starting to think this patch is not "complete".

I mean, we are going to add the user-visible change, and I agree the
current behaviour looks strange. But if we change this code, perhaps
the tracer should ignore delay_group_leader() at all (unless we are
going to do wait_task_zombie) ?

Your test-case could use waitid(WEXITED) instead WSTOPPED with the same
result, it should hang. Why it hangs? The tracee is dead, we can't do
ptrace(PTRACE_DETACH), and we can do nothing until other threads exit.
This looks equally strange.

IOW. Assuming that ptrace == T and WEXITED is set, perhaps we should
do something like this pseudo-code

if (p->exit_state == EXIT_ZOMBIE) {
if (!delay_group_leader(p))
return wait_task_zombie(wo, p);

ptrace_unlink();
wait_task_zombie(WNOWAIT);
}

However. This is another user-visible change, we need another discussion
even if I am right. In particular, it is not clear what should we do
if parent == real_parent. And probably this can confuse gdb, but iirc
gdb already have the problems with the dead leader anyway.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/