Re: [PATCH 6/6] clone4: Introduce new CLONE_FD flag to get task exit notification via fd

From: Josh Triplett
Date: Sat Mar 14 2015 - 16:14:19 EST


On Sat, Mar 14, 2015 at 08:47:21PM +0100, Oleg Nesterov wrote:
> On 03/14, Oleg Nesterov wrote:
> >
> > On 03/14, Josh Triplett wrote:
> > >
> > > On Sat, Mar 14, 2015 at 11:38:29AM -0700, Thiago Macieira wrote:
> > > > On Saturday 14 March 2015 15:32:35 Oleg Nesterov wrote:
> > > > > It is not clear to me what do_wait() should do with ->autoreap child, even
> > > > > ignoring ptrace.
> > > > >
> > > > > Just suppose that real_parent has a single "autoreap" child. Should
> > > > > > wait(NULL) hang then?
> > > >
> > > > > It should ignore the child that is set to autoreap. wait(NULL) should return
> > > > > -ECHILD, indicating there are no children waiting to be reaped.
> > >
> > > Right. And I don't think the current code does this. I think we need
> > > to change wait_consider_task to early-return for ->autoreap just as it
> > > does for task_state == EXIT_DEAD.
> >
> > No. This EXIT_DEAD is absolutely different. And this is another indication
> > that you might use it wrongly ;)
> >
> > What we actually want is BUG_ON(task_state == EXIT_DEAD) here. We do not
> > want the EXIT_DEAD tasks in ->children/ptraced lists. These EXIT_DEAD tasks
> > complicate the exit/wait/reparent paths.
> >
> > However, currently this is TODO. The main problem is the locking in
> > wait_task_zombie(), we can set EXIT_DEAD and remove the task from list
> > under read_lock().
>
> Let me clarify in case I confused you.
>
> The EXIT_DEAD check in do_wait() paths doesn't mean "autoreap". It means
> that this thread/process (depending on ptrace) was already reaped. It was
> reaped by our sub-thread, or it was reaped because we ignore SIGCHLD, or
> other reasons. This doesn't matter.
>
> In short, EXIT_DEAD means: we have to keep this thread on lists until the
> task which set this state calls release_task().

That much I already understood from reading through the code, since
exit_notify doesn't set task_state to EXIT_DEAD until the task is
actually completely dead. When wait_consider_task sees p->task_state ==
EXIT_DEAD, that task isn't eligible for waiting at all.

What I was proposing was that a task that isn't yet dead, but that is
going to be autoreaped, is not eligible for waiting either. The entire
wait* family of system calls should pretend it doesn't exist at
all, because returning an autoreaped task from a wait* call introduces a
race condition if the parent tries to *do* anything with the returned
PID. If you launch a process with CLONE_FD, you need to manage it
exclusively with that fd, not with the wait* family of system calls.
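
To make the proposed semantics concrete, here is a toy userspace model.
It is not kernel code and not the patch itself: the toy_* names, the
three-state enum, and the "return 0 where a real wait would block"
convention are all assumptions made for illustration. The only thing it
tries to show is the rule above: a child marked autoreap is skipped
exactly like an already-dead one, so a parent whose only child is
autoreap gets -ECHILD.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; every name here is hypothetical, none is a kernel API. */
enum toy_state { TOY_RUNNING, TOY_ZOMBIE, TOY_DEAD };

struct toy_task {
	int pid;
	enum toy_state state;
	bool autoreap;	/* stands in for the proposed ->autoreap flag */
};

/* A task is visible to wait only if it is neither dead nor autoreap. */
static bool toy_eligible(const struct toy_task *t)
{
	if (t->state == TOY_DEAD)
		return false;	/* already reaped by someone else */
	if (t->autoreap)
		return false;	/* proposed: managed only through its fd */
	return true;
}

/*
 * Returns the pid of a reaped zombie, 0 if eligible children exist but
 * none has exited yet (a real wait would block here), or -ECHILD if no
 * child is eligible for waiting at all.
 */
static int toy_wait(struct toy_task *children, size_t n)
{
	bool have_eligible = false;

	for (size_t i = 0; i < n; i++) {
		struct toy_task *t = &children[i];

		if (!toy_eligible(t))
			continue;
		have_eligible = true;
		if (t->state == TOY_ZOMBIE) {
			t->state = TOY_DEAD;	/* mark as reaped */
			return t->pid;
		}
	}
	return have_eligible ? 0 : -ECHILD;
}

/* Small self-check exercising the two cases discussed in this thread. */
static void toy_selfcheck(void)
{
	struct toy_task only[] = { { 100, TOY_ZOMBIE, true } };
	struct toy_task mix[] = {
		{ 100, TOY_ZOMBIE, true },
		{ 101, TOY_ZOMBIE, false },
	};

	assert(toy_wait(only, 1) == -ECHILD);	/* single autoreap child */
	assert(toy_wait(mix, 2) == 101);	/* autoreap child is skipped */
	assert(toy_wait(mix, 2) == -ECHILD);	/* nothing eligible remains */
}
```

In this model the exited autoreap child (pid 100) is never returned, so
the parent never holds a PID that could be recycled underneath it.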

That also implies that the child-stop and child-continued mechanisms
(do_notify_parent_cldstop, WSTOPPED, WCONTINUED) should ignore the task
too. In the future there could be a flag to clone4 that lets you get
stop and continue notifications through the file descriptor.

- Josh Triplett