Unkillable processes due to PTRACE_TRACEME again

From: Dmitry Vyukov
Date: Fri Dec 02 2016 - 13:49:08 EST


Hello,

There was an issue discussed a year ago which leads to
unkillalble/unwaitable zombie processes due to PTRACE_TRACEME:
https://groups.google.com/d/msg/syzkaller/uGzwvhlCXAw/E-cfY2ejAgAJ
and I though it has been fixed by "wait/ptrace: assume __WALL if the
child is traced":
https://groups.google.com/d/msg/syzkaller/caeB1ebZWAs/gcbvcM2HDAAJ

I am not on 2caceb3294a78c389b462e7e236a4e744a53a474 (Dec 1). And see
the same unwaitable zombie processes.

Reproducer is:

// autogenerated by syzkaller (http://github.com/google/syzkaller)
#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <sys/ptrace.h>
#include <signal.h>

void *thr(void *arg)
{
ptrace(PTRACE_TRACEME, 0, 0, 0);
}

int main()
{
int pid = fork();
if (pid == 0) {
pthread_t th;
pthread_create(&th, 0, thr, 0);
usleep(100000);
exit(0);
}
usleep(200000);
kill(pid, SIGKILL);
int status = 0;
waitpid(pid, &status, __WALL);
return 0;
}

It creates the following process tree:

root 15730 0.0 0.0 1156 4 pts/0 S+ 18:33 0:00 |
\_ ./a.out
root 15731 0.0 0.0 0 0 pts/0 Zl+ 18:33 0:00 |
\_ [a.out] <defunct>

Both threads of the child processes terminated:

# ls -l /proc/15731/task/
total 0
dr-xr-xr-x 7 root root 0 Dec 2 18:33 15731
dr-xr-xr-x 7 root root 0 Dec 2 18:33 15732
root@dvyukov-z840:~# cat /proc/15731/status
Name: a.out
State: Z (zombie)
Tgid: 15731
Ngid: 0
Pid: 15731
PPid: 15730
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 0
Groups: 0
NStgid: 15731
NSpid: 15731
NSpgid: 15730
NSsid: 6670
Threads: 2
SigQ: 1/3098
SigPnd: 0000000000000000
ShdPnd: 0000000000000100
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000180000000
CapInh: 0000000000000000
CapPrm: 0000003fffffffff
CapEff: 0000003fffffffff
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000

# cat /proc/15732/status
Name: a.out
State: Z (zombie)
Tgid: 15731
Ngid: 0
Pid: 15732
PPid: 15730
TracerPid: 15730
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 0
Groups: 0
NStgid: 15731
NSpid: 15732
NSpgid: 15730
NSsid: 6670
Threads: 2
SigQ: 1/3098
SigPnd: 0000000000000000
ShdPnd: 0000000000000100
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000180000000
CapInh: 0000000000000000
CapPrm: 0000003fffffffff
CapEff: 0000003fffffffff
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000


The parent process is blocked in waitpid(__WALL):

# cat /proc/15730/stack
[<ffffffff8140fbbb>] do_wait+0x34b/0xb30 kernel/exit.c:1574
[< inline >] SYSC_wait4 kernel/exit.c:1684
[<ffffffff814143bd>] SyS_wait4+0x20d/0x340 kernel/exit.c:1653
[<ffffffff81009a24>] do_syscall_64+0x2f4/0x940 arch/x86/entry/common.c:280
[<ffffffff881a3dcd>] entry_SYSCALL64_slow_path+0x25/0x25
arch/x86/entry/entry_64.S:251


Shouldn't __WALL cause the wait to return?

Again, it's easy to change the repro so that the process first
reparents to init, and then does PTRACE_TRACEME. Which will cause a
forever hanged processes.

Thanks