Re: A fifo and signal bug

Nick Holloway (Nick.Holloway@alfie.demon.co.uk)
Sun, 22 Nov 1998 12:15:45 GMT


In list.linux-kernel you write:
> hjl, please fix "sleep()" in glibc first. If it still fails for you, then
> I can look at it, right now I can see it failing in the strace on the
> sleep().

I replaced sleep with a select(0,NULL,NULL,NULL,{WAITTIME,0}), to remove
the possibility of it being the glibc nanosleep.
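
For anyone repeating the experiment, the replacement can be sketched roughly as below. This is a minimal userland sketch, not the actual test program: the function name sleep_via_select is made up for illustration, and WAITTIME from the test would be passed in as the seconds argument.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical helper: wait for `seconds` by calling select() with no
 * file descriptors, so that glibc's sleep()/nanosleep() is not involved.
 * Returns 0 when the timeout expires, -1 on error (for example EINTR
 * if a signal handler ran during the wait).
 */
static int sleep_via_select(long seconds)
{
    struct timeval tv;

    tv.tv_sec = seconds;
    tv.tv_usec = 0;

    /* nfds = 0 and three NULL sets: only the timeout is meaningful. */
    return select(0, NULL, NULL, NULL, &tv);
}
```

A caller that must wait the full time even when interrupted would need to loop on EINTR; the sketch leaves that out.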

I have been sticking in printk's all over the place, and I see why the
test script is failing (at least for me -- 2.1.129 UP 486).

The child enters fifo_open, and blocks. After a while, the STOP signal
comes in, and it falls out of the system call with ERESTARTSYS. Quite
correctly, the PIPE_READERS and PIPE_RD_OPENERS are reset back to zero.

In do_signal, the signal is to be delivered to the child. The relevant
fragment of code (line 680) is:

	current->state = TASK_STOPPED;
	current->exit_code = signr;
	if (!(current->p_pptr->sig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
		notify_parent(current, SIGCHLD);
	schedule();
	continue;

The code to restart the system call on ERESTARTSYS will be entered on
leaving the loop via the continue. _However_, the schedule does not
return until much later (when the parent sends the kill signal after
failing to open the fifo).

This appears to be a result of the first thing that schedule does -- it
removes the task from the run queue, because it is at state TASK_STOPPED.
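
A toy model of that first step, in userland C, may make the stall easier to see. Everything here is hypothetical (the sketch_ names, the enum, the struct); it models only the behaviour described above and is not taken from the kernel source.

```c
/* Hypothetical task states for this sketch, not the kernel's definitions. */
enum sketch_state { SKETCH_RUNNING, SKETCH_STOPPED };

/* Hypothetical stand-in for the bits of a task that matter here. */
struct sketch_task {
    enum sketch_state state;
    int on_runqueue;   /* nonzero while the task can be picked to run */
};

/*
 * Models the decision made on entry to schedule(): a task that is no
 * longer in the running state is taken off the run queue, so the call
 * cannot return until something puts the task back.
 */
static void sketch_schedule_entry(struct sketch_task *t)
{
    if (t->state != SKETCH_RUNNING)
        t->on_runqueue = 0;
}
```

In this model a task that has just set its state to stopped leaves the run queue as soon as it calls schedule, which is why the restart code after the loop is not reached until the task is woken again.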

A small change (made by somebody who may have missed the big picture
entirely) is to call schedule only if we were not called from within
a system call. Potentially this test would need extending so that
schedule is skipped only when eax is additionally one of
ERESTART{NOHAND,SYS,NOINTR}.

With this change, the open syscall on the fifo gets restarted for the
stopped child process, and the test passes.

--- arch/i386/kernel/signal.c-dist Sun Nov 22 12:04:36 1998
+++ arch/i386/kernel/signal.c Sun Nov 22 12:05:13 1998
@@ -682,7 +682,8 @@
 				current->exit_code = signr;
 				if (!(current->p_pptr->sig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
 					notify_parent(current, SIGCHLD);
-				schedule();
+				if (regs->orig_eax < 0)
+					schedule();
 				continue;
 
 			case SIGQUIT: case SIGILL: case SIGTRAP:
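
The possible extension of the test (also looking at eax) might look something like the following. This is only a userland sketch to show the shape of the condition: struct regs_sketch and should_schedule_now are invented names standing in for struct pt_regs and the inline test, and the ERESTART* values are the ones from the kernel's include/linux/errno.h.

```c
/* Restart codes as defined in the kernel's include/linux/errno.h. */
#define ERESTARTSYS    512
#define ERESTARTNOINTR 513
#define ERESTARTNOHAND 514

/* Hypothetical stand-in for the two struct pt_regs fields used here. */
struct regs_sketch {
    long orig_eax;  /* syscall number, negative when not in a syscall */
    long eax;       /* return value of the interrupted syscall */
};

/*
 * Nonzero when the stopped task should call schedule() straight away,
 * zero when it should first leave the loop so the syscall is restarted.
 */
static int should_schedule_now(const struct regs_sketch *regs)
{
    if (regs->orig_eax < 0)
        return 1;   /* not in a system call: the test used in the patch */

    switch (regs->eax) {
    case -ERESTARTNOHAND:
    case -ERESTARTSYS:
    case -ERESTARTNOINTR:
        return 0;   /* restartable syscall: skip schedule() for now */
    default:
        return 1;   /* in a syscall, but nothing to restart */
    }
}
```

Compared with the patch as posted, the only difference is the last case: a task stopped inside a syscall that did not return one of the restart codes would still call schedule immediately.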

The big question that I can't explain is why the test works for Linus,
but not for others (unless this is an SMP/UP thing).

-- 
 `O O'  | Home: Nick.Holloway@alfie.demon.co.uk  http://www.alfie.demon.co.uk/
// ^ \\ | Work: Nick.Holloway@parallax.co.uk
