Re: Threads stuck in zap_pid_ns_processes()

From: Guenter Roeck
Date: Sat May 13 2017 - 10:34:43 EST


On 05/12/2017 01:03 PM, Eric W. Biederman wrote:
Guenter Roeck <linux@xxxxxxxxxxxx> writes:

Hi Eric,

On Fri, May 12, 2017 at 12:33:01PM -0500, Eric W. Biederman wrote:
Guenter Roeck <linux@xxxxxxxxxxxx> writes:

Hi Eric,

On Fri, May 12, 2017 at 08:26:27AM -0500, Eric W. Biederman wrote:
Vovo Yang <vovoy@xxxxxxxxxx> writes:

On Fri, May 12, 2017 at 7:19 AM, Eric W. Biederman
<ebiederm@xxxxxxxxxxxx> wrote:
Guenter Roeck <linux@xxxxxxxxxxxx> writes:

What I know so far is
- We see this condition on a regular basis in the field. Regular is
relative, of course - let's say maybe 1 in a Milion Chromebooks
per day reports a crash because of it. That is not that many,
but it adds up.
- We are able to reproduce the problem with a performance benchmark
which opens 100 chrome tabs. While that is a lot, it should not
result in a kernel hang/crash.
- Vovo proviced the test code last night. I don't know if this is
exactly what is observed in the benchmark, or how it relates to the
benchmark in the first place, but it is the first time we are actually
able to reliably create a condition where the problem is seen.

Thank you. I will be interesting to hear what is happening in the
chrome perfomance benchmark that triggers this.

What's happening in the benchmark:
1. A chrome renderer process was created with CLONE_NEWPID
2. The process crashed
3. Chrome breakpad service calls ptrace(PTRACE_ATTACH, ..) to attach to every
threads of the crashed process to dump info
4. When breakpad detach the crashed process, the crashed process stuck in
zap_pid_ns_processes()

Very interesting thank you.

So the question is specifically which interaction is causing this.

In the test case provided it was a sibling task in the pid namespace
dying and not being reaped. Which may be what is happening with
breakpad. So far I have yet to see kernel bug but I won't rule one out.


I am trying to understand what you are looking for. I would have thought
that both the test application as well as the Chrome functionality
described above show that there are situations where zap_pid_ns_processes()
can get stuck and cause hung task timeouts in conjunction with the use of
ptrace().

Your last sentence seems to suggest that you believe that the kernel might
do what it is expected to do. Assuming this is the case, what else would
you like to see ? A test application which matches exactly the Chrome use
case ? We can try to provide that, but I don't entirely understand how
that would change the situation. After all, we already know that it is
possible to get a thread into this condition, and we already have one
means to reproduce it.

Replacing TASK_UNINTERRUPTIBLE with TASK_INTERRUPTABLE works for both the
test application and the Chrome benchmark. The thread is still stuck in
zap_pid_ns_processes(), but it is now in S (sleep) state instead of D,
and no longer results in a hung task timeout. It remains in that state
until the parent process terminates. I am not entirely happy with it
since the processes are still stuck and may pile up over time, but at
least it solves the immediate problem for us.

Question now is what to do with that solution. We can of course apply
it locally to Chrome OS, but I would rather have it upstream - especially
since we have to assume that any users of Chrome on Linux, or more
generically anyone using ptrace in conjunction with CLONE_NEWPID, may
experience the same problem. Right now I have no idea how to get there,
though. Can you provide some guidance ?

Apologies for not being clear. I intend to send a pull request with the

No worries.

the TASK_UINTERRUPTIBLE to TASK_INTERRUPTIBLE change to Linus in the
next week or so with a Cc stable and an appropriate Fixes tag. So the
fix can be backported.

Great, that is good to know.

I have a more comprehensive change queued I will probably merge for 4.13
already but it just changes what kind of zombies you see. It won't
remove the ``stuck'' zombies.

So what I am looking for now is:
Why are things getting stuck in your benchmark?

- Is it a userspace bug?

In which case we can figure out what userspace (aka breakpad) needs
to do to avoid the problem.

I spent some time trying to understand what breakpad is doing. I don't
really see how it can be blamed. It attaches itself to a crashing thread,
gets the information it is looking for, and detaches itself. At the
same time, whatever it does, it seems to me that userspace should not
be able to create such a situation in the first place.

I suspect what is happening is that the thread breakpad is attached to
is killed by zap_pid_ns_processes. At which point PTRACE_DETACH won't
work because the thread has been killed and waitpid(thread_pid,...)
needs to be used instead.


I think you nailed it. If I drop CLONE_NEWPID from the reproducer I get
a zombie process.

I guess the only question left is if zap_pid_ns_processes() should (or could)
somehow detect that situation and return instead of waiting forever.
What do you think ?

Thanks,
Guenter