Re: Problem: shell prompt doesn't return although the invoked program calls _exit().

From: Ishikawa (ishikawa@yk.rim.or.jp)
Date: Sat May 20 2000 - 09:49:57 EST


I could produce a short program that reproduces the symptom.

Ishikawa wrote:

> Hi,
>
> I am writing this message to a few e-mail aliases.
>
> This is because I could not figure out what is the cause of the problem.
>
> Does anyone have an idea what causes this problem?
>
> Observed Platform.
> Debian GNU/Linux, kernels 2.2.14, 2.2.15, 2.2.16pre2
>
> Observed problem.
>
> A particular program, called `prog' in the following, does not return
> to the shell prompt when invoked from the shell command line as shown
> below. It never returns.
>
> ./prog -q < inputfile > outputfile 2>&1

Hi,

I was able to produce a short version of the C program that shows
the symptom.
It looks to me as if there is a problem in how open tty ports are
handled when _exit() is called.

Bash doesn't seem to be the cause of the problem. I am CC:ing bug-bash
to let you know this.

The C code is about 285 lines, not counting the lengthy opening comment,
which I attach below.
Please drop me a line if you need to take a look at the source code for
debugging, curiosity, etc.

My suggested fix:
I think the kernel ought to close the tty ports forcibly if the close is
requested from within _exit(). [This loses any written data that still
lies in the buffer, but that is the program's intention. Exiting cleanly
is probably the more important concern here.]

Happy Hacking!

Chiaki Ishikawa

--- begin quote ---

/*
 *
 * $Id: newtest.c,v 1.1 2000/05/20 14:38:32 ishikawa Exp ishikawa $
 *
 * You need to have two UNUSED serial ports.
 * You must not connect anything to them.
 *
 * (Actually you can connect the two ports with
 * a cross cable (null-modem cable), and then the
 * symptom, that the calling shell
 * does not return to the shell prompt, doesn't seem
 * to appear even on Linux!)
 * On Solaris, the (original) program exits cleanly,
 * without such a cable.
 *
 * Overview:
 *
 * This program opens the two serial ports for
 * read/write. (/dev/ttyS[01]. You need to make these
 * world-read/writable if this program is run from normal
 * user account.)
 *
 * This program then sets the termios characteristics of the
 * serial ports for 8 bits, even parity, one stop bit,
 * in raw mode (no processing at all).
 * Flow control is hardware flow control, etc.
 *
 * Then it enters a loop.
 * For each loop step,
 * it calls usleep to sleep for a short period of time.
 *
 * usleep() is a library function. It calls nanosleep(), a system
 * call. My reference to nanosleep() in previous postings might have
 * been a little confusing. The source code doesn't mention nanosleep() at
 * all. If you need to filter the verbose trace from strace, then you
 * need to say something like: strace -e trace=\!nanosleep,read -p PID.
 *
 * Back to the description of the program.
 * Then it calls time() to check the wall clock.
 * At each of these iterations, it
 * tries to read a character from each of the ports (if one is available).
 * The read() won't block, since termios has been set up in such a
 * way that read() returns immediately if no character is available, and
 * returns min(available chars, requested chars) otherwise.
 * It also writes one byte to the port.
 * It then updates the notion of the relative time since
 * the beginning of the program invocation. If one second
 * has passed since the last update, it prints the duration (sec).
 *
 * This iteration is repeated for two minutes, and then
 * the program calls exit(0).
 *
 * The problem symptom is this:
 * After the program calls exit(0),
 * the calling shell prompt doesn't return if no cross cable is
 * connected between the serial ports when this program is executed.
 * This happens on Linux.
 * This problem doesn't happen on Solaris 7 for x86.
 *
 * On Linux, ps output shows something like the following. Note that
 * newtest appears inside a pair of "[]".
 * I am running the shell inside an Emacs shell buffer.
 *
 * 378 ttyp3 S 0:00 /bin/bash -i
 * 582 ttyp3 SW 0:00 [newtest] <--- here!
 * 584 ttyp1 R 0:00 ps axg
 *
 * ps axglw showed
 *
 * 000 1001 378 351 0 0 2448 1280 wait4 S ttyp3 0:00 /bin/bash -i
 * 004 1001 582 378 0 0 0 0 tty_wa SW ttyp3 0:00 [newtest]
 *
 * At this stage, the output from the program is like this, and
 * the shell prompt has not returned yet.
 * ...., 119, 120, 120 sec. quitting...
 *
 * By monitoring the system calls executed by this program using
 * strace, I know that _exit(0) has been called by then.
 *
 * After a lot of experimenting, I have found out that
 * if the two ports are connected via cross cable, the
 * shell prompt returns(!).
 * That I found no problem back in March and early April was
 * probably because I had a cable hooked up to these ports back then.
 *
 * But again, the problem didn't happen on Solaris 7 for x86 (without
 * any cable at all).
 * For Solaris, you need to change the name of the tty device.
 *
 * I am not sure what the "tty_wa" in the "ps axglw" output means.
 * Waiting for something?
 * But, since _exit(0) by means of exit(0) has been called,
 * shouldn't the process exit immediately and SIGCHLD be
 * passed to the parent immediately, too?
 *
 * From the Linux man page for _exit(2):
 * --- begin quote ---
 * DESCRIPTION
 * _exit terminates the calling process immediately. Any open
 * file descriptors belonging to the process are closed; any
 * children of the process are inherited by process 1, init,
 * and the process's parent is sent a SIGCHLD signal.
 *
 * status is returned to the parent process as the process's
 * exit status, and can be collected using one of the wait
 * family of calls.
 * --- end quote ---
 *
 * (OK, so there must be a problem in
 * closing the file descriptors for ttys? Hmm...)
 * Shouldn't we forcibly close the tty in this case, when _exit()
 * requests such an action?
 *
 * [ This program is a very shortened version of
 * a program written to explain event-driven programming, in
 * which an event type is the arrival of a certain packet
 * from a device connected to a serial port.
 * The intention was to produce skeleton code that can
 * be shown to programmers who might later need to port
 * the skeleton code to DOS(aga!), very simple embedded OSs, and
 * other OSs. (No select call, for example, for portability
 * reasons.)
 * ]
 */

--- end quote ---
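
For readers without the source, the port setup described in the comment
(8 bits, even parity, one stop bit, raw mode, hardware flow control, and a
read() that returns immediately) could be sketched roughly as follows. This
is not the code from newtest.c. The helper name configure_raw_8e1 and the
speed parameter are assumptions made for illustration; the comment above
does not state a baud rate.

```c
#define _DEFAULT_SOURCE          /* for CRTSCTS on glibc in strict modes */
#include <string.h>
#include <termios.h>

/*
 * Hypothetical helper (not from newtest.c): fill *t with settings
 * matching the description in the comment. The caller would apply
 * them to an open port with tcsetattr(fd, TCSANOW, t).
 */
static void configure_raw_8e1(struct termios *t, speed_t speed)
{
    memset(t, 0, sizeof *t);

    t->c_iflag = 0;              /* raw: no input processing */
    t->c_oflag = 0;              /* raw: no output processing */
    t->c_lflag = 0;              /* non-canonical, no echo, no signals */

    /* 8 data bits, parity enabled (even, since PARODD is not set),
     * one stop bit (CSTOPB is not set), RTS/CTS hardware flow control. */
    t->c_cflag = CS8 | PARENB | CREAD | CLOCAL | CRTSCTS;

    /* VMIN = 0 and VTIME = 0: read() returns at once with whatever is
     * available, possibly nothing, instead of blocking. */
    t->c_cc[VMIN] = 0;
    t->c_cc[VTIME] = 0;

    cfsetispeed(t, speed);       /* speed is an assumed parameter */
    cfsetospeed(t, speed);
}
```

A caller would open the device, call configure_raw_8e1(&t, B9600) (the
rate here is only an example), and then pass &t to tcsetattr().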

PS: Writing to the (unconnected) serial port seems to trigger the problem.
The data presumably lies waiting in the buffer associated with the serial
line.
I tested the above program on Linux 2.2.16pre3. (Alan Cox's pre-patch didn't
update the uname -a output: it still says 2.2.16pre2 when in fact it is pre3.)
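
If the pending output really is what keeps the close from finishing, one
user-space experiment would be to discard the queued output explicitly
before exiting. The sketch below uses the standard tcflush() call with
TCOFLUSH. The helper name discard_and_close is hypothetical, and I have not
verified that this avoids the hang on the kernels listed above. It throws
away unsent data, which matches the trade-off described in the suggested
fix.

```c
#include <termios.h>
#include <unistd.h>

/*
 * Hypothetical helper: drop any output still queued on fd, then
 * close it. tcflush() fails with ENOTTY if fd is not a terminal;
 * that failure is ignored here so the descriptor is always closed.
 * Returns the result of close().
 */
static int discard_and_close(int fd)
{
    (void)tcflush(fd, TCOFLUSH); /* discard written-but-unsent data */
    return close(fd);
}
```

The test program would call discard_and_close() on each serial port
descriptor just before exit(0).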
