Re: A peculiarity in ptrace/waitpid behavior

From: Oleg Nesterov
Date: Fri Mar 20 2015 - 12:27:52 EST


Hi Pavel,

Let me add lkml; we should not discuss this off-list.

On 03/20, Pavel Labath wrote:
>
> 1) we get a waitpid() notification that the tracee got SIGUSR1
> 2) we do a ptrace(GETSIGINFO) to get more info
> 3) eventually we decide to restart the tracee with PTRACE_CONT, passing it
> SIGUSR1
> 4) immediately after that we get another waitpid notification, again with
> SIGUSR1, even though the thread had received no additional signals
> 5) we again try to do a GETSIGINFO, however this time it fails with ESRCH.
> Therefore, we assume that the thread has died

I found a similar bug by code inspection some time ago. I even have
a fix, but I need to think more... And I even wrote a test-case ;)
See below.

But so far I can't say whether you hit the same problem or not. If you can
reproduce the problem, perhaps I can send you a debugging patch?

Oleg.

#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <assert.h>

#define tkill(pid, sig) \
	syscall(__NR_tkill, pid, sig)

void run_test(void)
{
	int pid, stat;

	pid = fork();
	if (!pid) {
		/* tracee: stop, then wait to be killed by the tracer */
		assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
		raise(SIGSTOP);
		assert(0);
	}

	/* 0x137f: stopped (0x7f) by SIGSTOP (0x13) */
	assert(pid == wait(&stat) && stat == 0x137f);

	tkill(pid, SIGTRAP); /* should not be reported */
	tkill(pid, SIGKILL);
	assert(pid == wait(&stat));
	if (stat == 0x9) /* killed by SIGKILL, as expected */
		return;

	printf("unexpected wait: stat=%x\n", stat);
	kill(0, SIGKILL); /* take down the whole process group */
}

int main(void)
{
	int i = 8; /* random */

	/* fork a few extra processes so the test runs in parallel */
	while (--i)
		if (!fork())
			break;

	for (;;)
		run_test();

	return 0;
}
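
The constants in the asserts above are raw wait statuses. As a rough
sketch, they can be decoded with the standard <sys/wait.h> macros; the
helper name describe_status() is made up for illustration. The value
0x137f decodes as "stopped by SIGSTOP" (assuming SIGSTOP is 19, as the
test case itself does) and 0x9 as "killed by SIGKILL".

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical helper: render a raw wait status as text in buf. */
static const char *describe_status(int stat, char *buf, size_t len)
{
	if (WIFSTOPPED(stat))
		snprintf(buf, len, "stopped by signal %d", WSTOPSIG(stat));
	else if (WIFSIGNALED(stat))
		snprintf(buf, len, "killed by signal %d", WTERMSIG(stat));
	else if (WIFEXITED(stat))
		snprintf(buf, len, "exited with code %d", WEXITSTATUS(stat));
	else
		snprintf(buf, len, "unknown status 0x%x", stat);
	return buf;
}
```

Printing describe_status(stat, buf, sizeof(buf)) next to the raw hex value
in the "unexpected wait" message would make a failing run easier to read.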
