Re: problem with 114 sched.* changes (not the gcc one)

Ion Badulescu (ionut@moisil.cs.columbia.edu)
Fri, 21 Aug 1998 20:44:38 -0400 (EDT)


Hi Linus,

This is a quick update about the hanging amd problem introduced by the
scheduling changes in 2.1.114.

On Mon, 10 Aug 1998, Ion Badulescu wrote:

> On Mon, 10 Aug 1998, Linus Torvalds wrote:
>
> > Let me check one more thing - is this with KMOD enabled, and "nfsd" as a
> > kernel module? Does it also happen if nfsd is compiled into the kernel?
>
> KMOD is enabled, but nfs and nfsd are both compiled into the kernel. knfsd
> is not even active, as the mount is of /u which is served by the userspace
> amd NFS server.

I have two new elements to bring into the picture:

1. If an rpciod is already running at the time amd is started (i.e. after
doing a remote-server nfs mount by hand), amd will _not_ hang; in fact it
runs quite happily.

2. With wchan restored in 2.1.117, I was able to trace the hang a
little further. The mount hangs in rpciod_up (net/sunrpc/sched.c) in
sleep_on(&rpciod_idle), whereas the rpciod itself is sleeping in
interruptible_sleep_on(&rpciod_idle).

My humble opinion is that, with the scheduling change in 2.1.114, rpciod
now starts running _before_ rpciod_up calls sleep_on(), and therefore the
wake_up() call at the beginning of rpciod is ineffective because
nothing is sleeping on rpciod_idle yet. rpciod_up then goes to sleep
after the only wake-up has already been issued, and both end up waiting
on rpciod_idle. This is pure speculation, but it kind of makes sense.
I'm not sure what the right fix is, though...
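
To illustrate the ordering I suspect, here is a userspace analogy using
pthreads. It is not kernel code and not a proposed patch. All names in it
(struct handshake, daemon_thread, start_daemon_and_wait) are made up for
the example. It only shows why a bare "wake up whoever is sleeping" is
lost when the sleeper arrives late, and how recording the event in a flag
that the sleeper checks before sleeping avoids that.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the rpciod_up()/rpciod start-up handshake. */
struct handshake {
    pthread_mutex_t lock;
    pthread_cond_t  wakeup;   /* plays the role of the wait queue */
    bool            started;  /* remembers a wake-up that came early */
};

/* Stand-in for the daemon: announce start-up, then wake any waiter. */
static void *daemon_thread(void *arg)
{
    struct handshake *h = arg;

    pthread_mutex_lock(&h->lock);
    h->started = true;                 /* record the event first */
    pthread_cond_signal(&h->wakeup);   /* no-op if nobody waits yet */
    pthread_mutex_unlock(&h->lock);
    return NULL;
}

/*
 * Stand-in for the starter. With daemon_first set, the daemon runs to
 * completion before the starter begins to wait, which is the ordering
 * suspected above. Returns 0 once the start-up has been observed.
 */
int start_daemon_and_wait(bool daemon_first)
{
    struct handshake h;
    pthread_t tid;

    pthread_mutex_init(&h.lock, NULL);
    pthread_cond_init(&h.wakeup, NULL);
    h.started = false;

    if (pthread_create(&tid, NULL, daemon_thread, &h) != 0)
        return -1;
    if (daemon_first)
        pthread_join(tid, NULL);       /* wake-up already issued */

    pthread_mutex_lock(&h.lock);
    while (!h.started)                 /* check before sleeping */
        pthread_cond_wait(&h.wakeup, &h.lock);
    assert(h.started);
    pthread_mutex_unlock(&h.lock);

    if (!daemon_first)
        pthread_join(tid, NULL);
    pthread_cond_destroy(&h.wakeup);
    pthread_mutex_destroy(&h.lock);
    return 0;
}
```

Calling start_daemon_and_wait(true) returns even though the wake-up was
sent before anyone waited, because the waiter tests the flag under the
lock before sleeping. Without the flag and the loop, that same ordering
would block forever, which is the behaviour I think I am seeing.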

Thanks,
Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
            than to open it and remove all doubt.
