Re: [PATCH 3/8] job control: Fix ptracer wait(2) hang and explain notask_error clearing

From: Oleg Nesterov
Date: Tue Mar 22 2011 - 15:17:33 EST


On 03/21, Tejun Heo wrote:
>
> On Mon, Mar 21, 2011 at 04:19:41PM +0100, Oleg Nesterov wrote:
> > But the main problem is, I do not think do_wait() should block in this
> > case, and thus I am starting to think this patch is not "complete".

Just in case... But of course I didn't mean this patch should be
updated to handle the EXIT_ZOMBIE case.

> > Your test-case could use waitid(WEXITED) instead of WSTOPPED with the same
> > result, it should hang. Why does it hang? The tracee is dead, we can't do
> > ptrace(PTRACE_DETACH), and we can do nothing until other threads exit.
> > This looks equally strange.
> >
> > IOW. Assuming that ptrace == T and WEXITED is set, perhaps we should
> > do something like this pseudo-code
> >
> > 	if (p->exit_state == EXIT_ZOMBIE) {
> > 		if (!delay_group_leader(p))
> > 			return wait_task_zombie(wo, p);
> >
> > 		ptrace_unlink();
> > 		wait_task_zombie(WNOWAIT);
> > 	}
> >
> > However. This is another user-visible change, we need another discussion
> > even if I am right. In particular, it is not clear what we should do
> > if parent == real_parent. And probably this can confuse gdb, but iirc
> > gdb already has problems with the dead leader anyway.
>
> Interesting point. Yeah, I agree. wait(WEXITED) from the ptracer
> should only wait for the tracee itself, not the group. When they are
> one and the same, I don't think we need to do anything differently
> from now.
>
> If we change the behavior that way, it would also fit better with the
> rest of the new behavior where the real parent and ptracer have
> separate roles when wait(2)ing for stopped states.
>
> The question is how the change would affect the existing users.

Yes, of course. Perhaps we can never do this.

> When
> the debuggee is a direct child, nothing will change.

Actually, I think this is the most problematic case... Perhaps
it would be safer to add WEXITED_THREAD for ptrace. I dunno.

> When attaching to
> a separate group, I don't think it even matters. Does gdb handle
> group leader any differently from the rest when attached to an
> unrelated group?

gdb certainly has some problems with the dead leaders. But I can't
recall what exactly. Will try to check later...

In any case, I only tried to discuss what else we can do with the
current strange semantics. When it comes to ptrace, group_leader
should not represent the whole process.
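
To make the difference concrete, the pseudo-code quoted above can be
modelled in plain C. This is only a userspace sketch, not kernel code:
struct task_model, enum wait_action and both tracer_wexited_*() helpers
are made-up names, and the ptracer + WEXITED case is assumed throughout.
Only the delay_group_leader() condition (a leader that still has live
sub-threads) is meant to match the kernel.

```c
#include <stdbool.h>

/* Hypothetical model: none of these names exist in the kernel. */
enum wait_action {
	WAIT_KEEP_BLOCKING,	/* nothing to report, do_wait() keeps sleeping */
	WAIT_REAP,		/* wait_task_zombie(wo, p): report and release */
	WAIT_UNLINK_AND_PEEK,	/* ptrace_unlink(), then report as with WNOWAIT */
};

struct task_model {
	bool zombie;		/* p->exit_state == EXIT_ZOMBIE */
	bool group_leader;	/* thread_group_leader(p) */
	bool group_empty;	/* thread_group_empty(p): no other threads left */
};

/* Same condition as delay_group_leader(): a leader with live sub-threads. */
bool model_delay_group_leader(const struct task_model *p)
{
	return p->group_leader && !p->group_empty;
}

/* Semantics described as current: the tracer blocks on a dead leader. */
enum wait_action tracer_wexited_current(const struct task_model *p)
{
	if (p->zombie && !model_delay_group_leader(p))
		return WAIT_REAP;
	return WAIT_KEEP_BLOCKING;
}

/* Semantics sketched in the pseudo-code: detach and report, do not reap. */
enum wait_action tracer_wexited_proposed(const struct task_model *p)
{
	if (!p->zombie)
		return WAIT_KEEP_BLOCKING;
	if (!model_delay_group_leader(p))
		return WAIT_REAP;
	return WAIT_UNLINK_AND_PEEK;
}
```

For a zombie leader whose sub-threads are still running, the first
helper keeps blocking (the hang discussed above) while the second
returns WAIT_UNLINK_AND_PEEK. All other inputs give the same answer
in both.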

Oleg.
