Re: [RFC][PATCH] exec: Don't wait for ptraced threads to be reaped.

From: Oleg Nesterov
Date: Mon Apr 03 2017 - 14:37:43 EST


Eric,

I see another series from you, but I simply failed to force myself to read
it carefully, because at first glance it makes me really sad. I do dislike
it even if it is correct. Yes, yes, sure, I can be wrong. Will try tomorrow.

On 04/02, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@xxxxxxxxxx> writes:
>
> > Anyway, Eric, even if we can and want to do this, why can't we do this on
> > top of my fix?
>
> Because your reduction in scope of cred_guard_mutex is fundamentally
> broken and unnecessary.

And you never explained why it is wrong, or I failed to understand you.

> > I simply fail to understand why you dislike it that much. Yes it is not
> > pretty, I said this many times, but it is safe in that it doesn't really
> > change the current behaviour.
>
> No it is not safe. And it promotes wrong thinking which is even more
> dangerous.

So please explain why it is not safe and why it is dangerous.

Just in case, if you mean flush_signal_handlers() outside of cred_guard_mutex,
please explain what I have missed in case you still think this is wrong.

> I reviewed the code and cred_guard_mutex needs to cover what it covers.

I strongly, strongly disagree. Its scope is unnecessarily huge; we should narrow
it in any case, even if the current code were not buggy. But this is almost
off-topic, let's discuss this separately.

> > I am much more worried about 2/2 you didn't argue with, this patch _can_
> > break something and this is obviously not good even if PTRACE_EVENT_EXIT
> > was always broken.
>
> I don't know who actually uses PTRACE_O_TRACEEXIT so I don't actually
> know what the implications of changing it are. Let's see...

And nobody knows ;) This is the problem: even a clear ptrace bugfix can
break something. This happened before, and we had to revert obviously
correct patches because the bug was already being used as a feature.

> If delivering a second SIGKILL
...
> So userspace can absolutely kill a process in PTRACE_EVENT_EXIT
> before the tracers find it.
>
> Therefore we are only talking about a quality of implementation issue
> if we actually stop and wait for the tracer or not.

Oh, this is another story, needs another discussion. We really need some
changes in this area, we need to distinguish SIGKILL sent from user-space
and (say) from group-exit, and we need to decide when we should stop.

But at least I think the tracee should never stop if SIGKILL comes from
user space. And yes ptrace_stop() is ugly and wrong, just look at the
arch_ptrace_stop_needed() check. The problem, again, is that any fix will
be user-visible.

Oleg.