Possible bug in wait4(), 2.1.126-129 ?

Ion Badulescu (ionut@moisil.cs.columbia.edu)
Sun, 22 Nov 1998 08:13:00 -0500 (EST)


Hi Linus,

I'm getting ECHILD from a wait4() syscall, in a situation which should
never trigger such an error. Unless I misunderstood the wait4(2) man page:

ERRORS
ECHILD No unwaited-for child process as specified does exist.

The parent process is a perl script which calls a few external programs
using the backticks syntax (`foo`). However, the setup required for the
error to occur is really weird:

- it doesn't happen at all with 2.0.x, regardless of libc
- it doesn't happen with libc5 under 2.1.x
- it doesn't happen with glibc under 2.1.x if the script is run from the
prompt
- it happens reliably with glibc under 2.1.x if the script is run from
cron, directly or under strace

Attached is the (shortened) strace of the script. Note the wait4()
syscall and the error it returns. The perl script is something like:

`cmp -s /tmp/.mrs_8036 /etc/local/root.rdist.hosts`;
die "could not move $tmp_filename to $out_filename: $!" if ($? != 0);
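
For reference, the sequence of calls behind the backticks, as it appears in the attached strace (pipe, fork, dup2, execve in the child, read and wait4 in the parent), can be sketched roughly as below. This is a Python illustration and not perl's actual implementation; the `backticks` helper name is made up for the example.

```python
import os

def backticks(argv):
    """Run argv, capture its stdout, and return (output, wait_status)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: route stdout into the pipe, then exec the command.
        try:
            os.close(read_fd)
            os.dup2(write_fd, 1)
            os.close(write_fd)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if the exec failed
    # Parent: read until EOF on the pipe, then reap the child.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    # This is the step that fails in the trace: the wait on the child's pid.
    _, status, _ = os.wait4(pid, 0)
    return b"".join(chunks), status
```

In this sketch a failing os.wait4() raises ChildProcessError, which is how Python reports ECHILD.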

From a quick look at sys_wait4(), it appears that the only case when
ECHILD can be returned is when the pid is not found in the current
process' children list _or_ when that weird __WCLONE condition is true.
ps -ax doesn't show the child at all, the zombie is gone, even before the
script finishes.
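
The "pid not in the children list" case is easy to show from user space. The sketch below (Python, with made-up helper names) reaps a child once and then waits for it again; the second wait fails with ECHILD because the child is gone. The last function covers a further possibility that is only an assumption here and is not confirmed by anything above: on current Linux kernels a process that ignores SIGCHLD has its children reaped automatically, so wait4() finds nothing to wait for. Whether the 2.1.x kernels in question behave that way, and whether cron passes such a disposition on, is untested.

```python
import errno
import os
import signal

def wait_errno(pid):
    """Return 0 if wait4(pid) succeeds, else the errno it failed with."""
    try:
        os.wait4(pid, 0)
        return 0
    except OSError as exc:
        return exc.errno

def spawn_child():
    """Fork a child that exits immediately; return its pid."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)
    return pid

def reap_twice():
    """The second wait fails: the child is no longer on our children list."""
    pid = spawn_child()
    return wait_errno(pid), wait_errno(pid)

def wait_with_sigchld_ignored():
    """Assumption: with SIGCHLD ignored, the kernel reaps the child itself."""
    previous = signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    try:
        return wait_errno(spawn_child())
    finally:
        # Restore whatever disposition was in place before the experiment.
        if previous is not None:
            signal.signal(signal.SIGCHLD, previous)
```

Running both from the prompt and from cron, and printing signal.getsignal(signal.SIGCHLD) at startup, would show whether the inherited disposition differs between the two environments.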

The system is i386 UP, running a UP 2.1.129. I'm getting the same error on
an UltraSPARC UP, running UP 2.1.126. The i386 kernel is patched for the
UP flu, vanilla otherwise.

Any ideas?

Thanks a lot,
Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
            than to open it and remove all doubt.

(my pid is 8036)

pipe([3, 4]) = 0
fork() = 8045
[pid 8036] close(4) = 0
[pid 8036] fcntl(3, F_GETFL) = 0 (flags O_RDONLY)
[pid 8036] fstat(3, {st_mode=010, st_size=0, ...}) = 0
[pid 8036] mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4002a000
[pid 8036] lseek(3, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
[pid 8036] read(3, <unfinished ...>
[pid 8045] close(3) = 0
[pid 8045] dup2(4, 1) = 1
[pid 8045] close(4) = 0
[pid 8045] execve("/sbin/cmp", ["cmp", "-s", "/tmp/.mrs_8036", "/etc/local/root.rdist.hosts"], [/* 13 vars */]) = -1 ENOENT (No such file or directory)
[pid 8045] execve("/bin/cmp", ["cmp", "-s", "/tmp/.mrs_8036", "/etc/local/root.rdist.hosts"], [/* 13 vars */]) = -1 ENOENT (No such file or directory)
[pid 8045] execve("/usr/sbin/cmp", ["cmp", "-s", "/tmp/.mrs_8036", "/etc/local/root.rdist.hosts"], [/* 13 vars */]) = -1 ENOENT (No such file or directory)
[pid 8045] execve("/usr/bin/cmp", ["cmp", "-s", "/tmp/.mrs_8036", "/etc/local/root.rdist.hosts"], [/* 13 vars */]) = 0
[pid 8045] brk(0) = 0x804ad70
[pid 8045] open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 8045] open("/etc/ld.so.cache", O_RDONLY) = 3
[pid 8045] fstat(3, {st_mode=0, st_size=0, ...}) = 0
[pid 8045] mmap(0, 17900, PROT_READ, MAP_PRIVATE, 3, 0) = 0x4000b000
[pid 8045] close(3) = 0
[pid 8045] open("/lib/libc.so.6", O_RDONLY) = 3
[pid 8045] mmap(0, 4096, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40010000
[pid 8045] munmap(0x40010000, 4096) = 0
[pid 8045] mmap(0, 670580, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x40010000
[pid 8045] mprotect(0x400a1000, 76660, PROT_NONE) = 0
[pid 8045] mmap(0x400a1000, 28672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x90000) = 0x400a1000
[pid 8045] mmap(0x400a8000, 47988, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x400a8000
[pid 8045] close(3) = 0
[pid 8045] personality(0 /* PER_??? */) = 0
[pid 8045] getpid() = 8045
[pid 8045] open(ptrace: umoven: Input/output error 0xbffffee6, O_RDONLY) = 3
[pid 8045] fstat(3, {st_mode=0, st_size=0, ...}) = 0
[pid 8045] open(ptrace: umoven: Input/output error 0xbffffef5, O_RDONLY) = 4
[pid 8045] fstat(4, {st_mode=0, st_size=0, ...}) = 0
[pid 8045] lseek(3, 0, SEEK_CUR) = 0
[pid 8045] lseek(4, 0, SEEK_CUR) = 0
[pid 8045] brk(0) = 0x804ad70
[pid 8045] brk(0x804bd88) = 0x804bd88
[pid 8045] brk(0x804c000) = 0x804c000
[pid 8045] brk(0x804e000) = 0x804e000
[pid 8045] read(3, "somehost.cs.columbia.edu\n", 4096) = 22
[pid 8045] read(3, "", 4074) = 0
[pid 8045] read(4, "somehost.cs.columbia.edu\n", 4096) = 22
[pid 8045] read(4, "", 4074) = 0
[pid 8045] close(3) = 0
[pid 8045] close(4) = 0
[pid 8045] _exit(0) = ?
<... read resumed> "", 4096) = 0
--- SIGCHLD (Child exited) ---
close(3) = 0
munmap(0x4002a000, 4096) = 0
sigaction(SIGHUP, {SIG_IGN}, {SIG_DFL}) = 0
sigaction(SIGINT, {SIG_IGN}, {SIG_DFL}) = 0
sigaction(SIGQUIT, {SIG_IGN}, {SIG_DFL}) = 0
wait4(8045, 0xbffffa20, 0, NULL) = -1 ECHILD (No child processes)
