Re: 2.6.14-rc1 wait()/SIG_CHILD behaviour

From: Roland McGrath
Date: Mon Sep 19 2005 - 18:35:16 EST


The test program is buggy. Here is one clue:

elm3b29:~ # strace -p 30023
Process 30023 attached - interrupt to quit
futex(0x2aaaaaddf118, FUTEX_WAIT, 2, NULL

It's not anywhere near wait4. It's deadlocked in the rand() call inside
rand_delay, called from sigchld_handler. You cannot safely call rand
inside a signal handler, for exactly this reason: rand is not
async-signal-safe. The signal arrived while the interrupted code was
itself inside rand, holding the lock that protects rand's state, and the
handler's rand call then blocked forever on that same lock. That is the
futex wait strace shows above. If this sort of deadlock is the failure
mode of your real-world case, then it is probably an application bug.
If this deadlock is just a mistake in your test program, then you'll
need to give us a corrected test program so we can pursue whatever real
kernel issue you may have.
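
For reference, here is a minimal sketch of what a corrected test could
look like. It is not the original test program, which I am not quoting
here. It assumes the test only needs to reap children from
sigchld_handler with some random timing. The names reaped,
install_sigchld_handler and run_children are made up for illustration.
The handler calls only waitpid, which is async-signal-safe, and rand is
called only from normal context, with SIGCHLD blocked.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical counter: written by the handler, read by the main flow. */
static volatile sig_atomic_t reaped = 0;

/* Restricted to async-signal-safe calls: no rand(), malloc() or
 * printf(), any of which may take a lock the interrupted code holds. */
static void sigchld_handler(int sig)
{
    int saved_errno = errno;    /* waitpid() may overwrite errno */
    (void)sig;
    /* Several child exits can coalesce into one SIGCHLD, so loop. */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    errno = saved_errno;
}

static int install_sigchld_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Fork n children that each sleep a random time, then wait until the
 * handler has reaped all of them. Returns the number reaped, or -1. */
static int run_children(int n)
{
    sigset_t block, old, wait_mask;

    reaped = 0;
    if (install_sigchld_handler() != 0)
        return -1;

    /* Keep SIGCHLD blocked except while parked in sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);

    for (int i = 0; i < n; i++) {
        /* rand() runs in normal context, never in the handler. */
        long usec = rand() % 1000;
        pid_t pid = fork();
        if (pid < 0) {
            sigprocmask(SIG_SETMASK, &old, NULL);
            return -1;
        }
        if (pid == 0) {
            struct timespec ts = { 0, usec * 1000L };
            nanosleep(&ts, NULL);
            _exit(0);
        }
    }

    /* sigsuspend() atomically unblocks SIGCHLD and sleeps, so there is
     * no window between the check and the wait. */
    while (reaped < n)
        sigsuspend(&wait_mask);

    sigprocmask(SIG_SETMASK, &old, NULL);
    return reaped;
}
```

With this structure, run_children(3) should return 3 once all three
children have exited. If a program shaped like this still hangs, then
strace output from it would be worth looking at.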


Thanks,
Roland