Problem: shell prompt doesn't return although the invoked program calls _exit().

From: Ishikawa (ishikawa@yk.rim.or.jp)
Date: Wed May 17 2000 - 13:18:38 EST


Hi,

I am sending this message to several mailing lists because I could not
figure out the cause of the problem described below.

Does anyone have an idea what causes it?

Observed platform.
Debian GNU/Linux, kernels 2.2.14, 2.2.15, 2.2.16pre2

Observed problem.

A particular program, called `prog' in the following, doesn't return
to the shell prompt when invoked from the shell command line in the
following manner. It never returns.

 ./prog -q < inputfile > outputfile 2>&1

(The same program compiled on Solaris 7 for x86 does return to the shell.
It may not mean much, but what puzzles me most is that, if my memory
serves correctly, it did return to the shell on Linux back in March and
early April.)

Other important observation:

After a prolonged period, by which time the program ought to have
finished, I try to figure out what is going on by looking at the system
call trace of the process with strace -p PID.
As soon as I run strace, the program suddenly exits(!)
and the shell prompt appears. strace itself, on the other hand,
prints a PANIC message,
e.g.
    ishikawa@standard$ strace -p 3008
    PANIC: attached pid 3008 exited

   (Just prior to this, I noticed that ps printed
     "prog" surrounded by "[]", as in
       3008 pts/1 SW 0:00 [prog]
     But [] is not always used. I often see it as (prog).
    )

However, if I run strace well before the program reaches the place where
it calls exit(), strace shows that the program DOES call "_exit()", and
strace exits there.
But the original `prog' still doesn't return to the shell prompt from
which it was invoked.

Eg.
    strace output: (I ran strace with -e trace=\!nanosleep,time,read -p PID
              since the program calls nanosleep,time,read very often.)
            ...
        write(3, "\3", 1) = 1
        write(3, ";", 1) = 1
        write(2, "122, ", 5) = 5
        write(2, "\nreached END command. Quitting i"..., 46) = 46
        _exit(0) = ?
     ishikawa@standard$

Please note that _exit() is called. However, the original
shell window looks like the following. (I invoked the
program via make.)

   ishikawa@standard$ make test2-regression
   ./prog < input.dat > output.dat 2>&1

At this stage, the shell prompt doesn't appear yet.
If I type control-C here, the following lines are printed.
It is as if the child processes are gone without notifying
the invoking shell. Since the above invocation is through
make, the messages are printed by make.

  make: *** wait: No child processes. Stop.
  make: *** Waiting for unfinished jobs....
  make: *** wait: No child processes. Stop.

The symptom is exactly the same if I skip make and
type the command line directly into the shell.
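
For reference, as far as I understand it, "No child processes" is the
error text for ECHILD, which wait() reports when the caller has no child
left to wait for. Below is a minimal sketch of that behaviour. The helper
names run_child_and_reap() and wait_errno_with_no_children() are made up
for illustration and are not taken from make or bash.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that immediately calls _exit(code),
   then reap it in the parent. Stores the child's exit code in *status_out.
   Returns 0 on success, -1 on failure. */
int run_child_and_reap(int code, int *status_out)
{
    assert(status_out != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);            /* child: terminate without flushing stdio */

    int status;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);   /* retry if interrupted by a signal */
    } while (r < 0 && errno == EINTR);

    if (r != pid || !WIFEXITED(status))
        return -1;
    *status_out = WEXITSTATUS(status);
    return 0;
}

/* Hypothetical helper: call wait() when there is nothing left to reap.
   Returns the errno set by wait(), or 0 if wait() unexpectedly succeeded. */
int wait_errno_with_no_children(void)
{
    errno = 0;
    if (wait(NULL) == -1)
        return errno;
    return 0;
}
```

Calling run_child_and_reap() and then wait_errno_with_no_children() in a
process with no other children should make the second call fail with
ECHILD, which is the condition make seems to be reporting above.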

  Versions of software.

   Here I report the version info on my home PC.
   However, the problem was seen initially on a Linux 2.2.{13,14} PC with
   the latest stable (just before official freeze?) packages from Debian.
   I saw a similar problem with very slightly older "make" and "bash" on a
   different PC, updated them to the same versions as here, and the
   problem persisted there, too.

  make : 3.78.1
   ishikawa@standard$ /bin/sh --version
   GNU bash, version 2.03.0(1)-release (i386-pc-linux-gnu)
   Copyright 1998 Free Software Foundation, Inc.

   libc.so.6 is based on 2.1.3 and I obtained it via the Debian
   distribution.
   strings - /usr/lib/libc.a | egrep -i version printed the following
   two lines among others.
  GNU C Library stable release version 2.1.3, by Roland McGrath et al.
  Compiled by GNU CC version 2.95.2 20000313 (Debian GNU/Linux).

   ldd ./prog
        /lib/libsafe.so.1 => /lib/libsafe.so.1 (0x40014000)
        libc.so.6 => /lib/libc.so.6 (0x4001b000)
        libdl.so.2 => /lib/libdl.so.2 (0x400f8000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

linux:
ishikawa@standard$ uname -a
Linux standard 2.2.16pre2 #52 SMP Wed May 17 02:

The program in question makes MANY, MANY {nanosleep, time, read}
system calls, in this order, during its lifetime, if this matters at all.
The program also opens two serial ports for read/write.

I know this type of error is not often seen: otherwise, Linux
wouldn't be usable. That is why I mention some characteristics of
the program that may be relevant here.

Is it possible that, under certain conditions, the signal from the
child is not delivered properly to the parent?
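
To make the question concrete, here is a minimal sketch of the
notification path I have in mind. The parent installs a SIGCHLD handler,
forks a child that calls _exit(), sleeps in sigsuspend() until the signal
arrives, and then reaps the child with waitpid(). The names on_sigchld()
and wait_for_sigchld() are made up for illustration, and I am not claiming
that bash or make wait for their children this way.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t sigchld_seen = 0;

/* Hypothetical handler: only records that SIGCHLD was delivered. */
static void on_sigchld(int signo)
{
    (void)signo;
    sigchld_seen = 1;
}

/* Hypothetical helper: fork a child that calls _exit(code), sleep until
   SIGCHLD is delivered, then reap the child.
   Returns 0 and stores the exit code in *status_out on success, -1 on failure. */
int wait_for_sigchld(int code, int *status_out)
{
    assert(status_out != NULL);

    struct sigaction sa, old_sa;
    sigset_t block, old_mask, wait_mask;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, &old_sa) < 0)
        return -1;

    /* Block SIGCHLD before fork() so that it cannot be delivered
       before the parent is ready to sleep in sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old_mask) < 0) {
        sigaction(SIGCHLD, &old_sa, NULL);
        return -1;
    }

    sigchld_seen = 0;
    pid_t pid = fork();
    if (pid == 0)
        _exit(code);                    /* child */

    int ok = -1;
    if (pid > 0) {
        /* Atomically unblock SIGCHLD and sleep until a signal arrives. */
        wait_mask = old_mask;
        sigdelset(&wait_mask, SIGCHLD);
        while (!sigchld_seen)
            sigsuspend(&wait_mask);

        int status;
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status)) {
            *status_out = WEXITSTATUS(status);
            ok = 0;
        }
    }

    /* Restore the previous signal mask and SIGCHLD disposition. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);
    return ok;
}
```

If the signal were lost somewhere between the child's _exit() and the
parent, a parent written like this would sleep in sigsuspend()
indefinitely, which is similar to the hang I am seeing.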

Oh, on the different PC where this problem was first noticed, I did
check that the kernel images under the root and /boot directories match
the boot image (the one used for booting from lilo, etc.). So I am fairly
confident this problem occurs when the correct images are present and
used.

Any tips for figuring out the cause of the problem are appreciated.
I can post further relevant information, such as versions of software
packages not mentioned here (if I can figure out how to find them).

Please cc: me since I am not a subscriber of the respective
mailing lists.
(I am writing to bug-glibc since it may be that _exit() does not
work well under certain esoteric conditions.
bug-bash is included since I wonder if there is a problem
in file I/O redirection (2>&1) that may be relevant to this problem.
The kernel list is included for obvious reasons.)

Thank you in advance for your attention.

Happy Hacking,

chiaki ishikawa



