Re: [PATCH 0/2] fix the traced mt-exec deadlock

From: Eric W. Biederman
Date: Thu Mar 02 2017 - 20:12:11 EST


Oleg Nesterov <oleg@xxxxxxxxxx> writes:

> Eric,
>
> our discussion was a bit confusing, and it seems that we did not
> fully convince each other. So let me ask what do you finally think
> about this fix.
>
> Let me repeat. Even if I do not agree with some of your objections,
> I do agree that 1/2 does not look nice and clean. And we seem to
> agree that either way, with or without this fix, we need more changes
> in this area.
>
> But we need a simple and backportable fix for stable trees, say for
> rhel7. This bug was reported many times, and this is the simplest
> solution I was able to find.

I am not 100% convinced that we need a backportable solution, but
regardless my experience is that good clean solutions are almost always
easier to backport that something we just settle for.

The patch below needs a little more looking and testing but arguably
it is sufficient.

It implements autoreaping for non-leader threads always. We might want
to limit this to the case of exec. For exec there is a strong argument
that no one cares as the exit_code/status returned by these exiting
threads has always been 0. Maybe that is ok. Exit_success but it
certainly seems wrong in the case of exec. It would really be better to
have an exit status that you went away with exec.

I know it has always been 0 as fs/exec.c does not set
sig->group_exit_code so the value initialized in copy_signal from
kmem_cache_zalloc is used. AKA 0.

This change has very little to do with the cred_guard_mutex. Just a
question of which semantics we want to export.

In my preliminary testing this seems a viable.

My big problem with your previous patch is that it is in no way
obviously correct. Nor is it well explained. All signs that someone is
pushing too hard and something better can be achieved if someone steps
back a little.

When the following patch can be made to solve the same problem we
certainly need an explanation in the commit comment on why we are using
much more code for the same job.

diff --git a/kernel/exit.c b/kernel/exit.c
index 5cfbd595f918..5a432e0dcf4a 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -690,7 +690,9 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
thread_group_empty(tsk) &&
!ptrace_reparented(tsk) ?
tsk->exit_signal : SIGCHLD;
- autoreap = do_notify_parent(tsk, sig);
+ do_notify_parent(tsk, sig);
+ /* Autoreap threads even when ptraced */
+ autoreap = !thread_group_leader(tsk);
} else if (thread_group_leader(tsk)) {
autoreap = thread_group_empty(tsk) &&
do_notify_parent(tsk, tsk->exit_signal);
@@ -699,8 +701,6 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
}

tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE;
- if (tsk->exit_state == EXIT_DEAD)
- list_add(&tsk->ptrace_entry, &dead);

/* mt-exec, de_thread() is waiting for group leader */
if (unlikely(tsk->signal->notify_count < 0))
@@ -711,6 +711,8 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
list_del_init(&p->ptrace_entry);
release_task(p);
}
+ if (autoreap)
+ release_task(tsk);
}

#ifdef CONFIG_DEBUG_STACK_USAGE

Eric