Ever since the asynchronous NFS client code went in (sometime in the
1.3.x series), I have had modem getty processes, whose executables live
on an NFS-mounted filesystem, mysteriously hang in the disk-wait state.
I have now finally tracked down the problem. write_chan() in the serial
driver code does:
    ...
    add_wait_queue(&tty->write_wait, &wait);
    while (1) {
        current->state = TASK_INTERRUPTIBLE;   /* set before the write */
        ...
        if (...)
            ...
        else
            /* may page-fault while copying the user buffer */
            c = tty->driver.write(tty, 1, b, nr);
        ...
        schedule();
    }
    current->state = TASK_RUNNING;
    remove_wait_queue(&tty->write_wait, &wait);
    ...
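But when tty->driver.write() is rs_write(), it calls memcpy_fromfs()
(or whatever it is now called in 2.1; the problem probably remains the
same), which may have to swap in a page via nfs_readpage(). That only
works when current->state is TASK_RUNNING, because the asynchronous NFS
code calls schedule(): a task that enters schedule() in
TASK_INTERRUPTIBLE (with no signal pending) is taken off the run queue
and goes to sleep instead of merely giving up the CPU.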
I've discussed this with Olaf Kirch, and he suggests that current->state
should perhaps be set only _after_ calling the tty->driver.write()
handler, so as to maintain the invariant that any task can call
schedule() without first checking current->state.
If, however, the serial code is not the only place where this situation
can occur, perhaps the NFS code needs to be changed instead.
Regards,
Wolfram.
-- 
`Surf the sea, not double-u three...'
Wolfram.Gloger@dent.med.uni-muenchen.de, Gloger@lrz.uni-muenchen.de