Re: init's children list is long and slows reaping children.

From: Bill Davidsen
Date: Wed Apr 11 2007 - 16:02:57 EST


Oleg Nesterov wrote:
On 04/10, Eric W. Biederman wrote:

I'm trying to remember what the story is now. There is a nasty
race somewhere with reparenting, a threaded parent setting SIGCHLD to
SIG_IGN, and non-default signals that results in a zombie that no one
can wait for and reap. It requires being reparented twice to trigger.

reparent_thread:

...

	/* If we'd notified the old parent about this child's death,
	 * also notify the new parent.
	 */
	if (!traced && p->exit_state == EXIT_ZOMBIE &&
	    p->exit_signal != -1 && thread_group_empty(p))
		do_notify_parent(p, p->exit_signal);

We notified /sbin/init. If it ignores SIGCHLD, we should release the task.
We don't do this.

The best fix, I believe, is to clean up the forget_original_parent/reparent_thread
interaction and factor out these "exit_state == EXIT_ZOMBIE && exit_signal == -1"
checks.

As long as the original parent is preserved for getppid(). There are programs out there which communicate between the parent and child with signals, and if the original parent dies, it is undesirable to have the child call getppid() and start sending signals to a program that is not expecting them. That invites undefined behavior.
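
A defensive pattern on the child side can be sketched as below. This is a minimal sketch under assumptions: the child is handed the parent's pid at fork time, and getppid() stops returning that pid once the child has been reparented. The function name is hypothetical, and the check is inherently racy, so it narrows the window without closing it.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Send sig to the parent only if we are still its child.
 * original_parent is the pid the child recorded at fork time.
 * Returns 0 on success, -1 with errno set on failure; errno is
 * ESRCH when we have been reparented to some other process.
 */
int signal_original_parent(pid_t original_parent, int sig)
{
	if (getppid() != original_parent) {
		/* Our parent changed: do not signal a stranger. */
		errno = ESRCH;
		return -1;
	}
	/*
	 * Still racy: the parent can die between the check above
	 * and this call. The check only narrows the window.
	 */
	return kill(original_parent, sig);
}
```

A child would typically record the pid passed down from the parent (its getpid() value before fork()) and call signal_original_parent(saved_pid, SIGUSR1) instead of kill(getppid(), SIGUSR1).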

--
Bill Davidsen <davidsen@xxxxxxx>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
