Re: [PATCH]: Push tcp rpc handling into task context

Ion Badulescu (ionut@moisil.cs.columbia.edu)
Sat, 15 Aug 1998 20:09:22 -0400 (EDT)


On Wed, 12 Aug 1998, Alan Cox wrote:

> Ok the "save tcp nfs" quest continues. This patch pushes the tcp rpc calls
> where they should be - in user context. That should now mean remaining tcp
> rpc bugs are real protocols screwups not interface abuse.
>
> Since rpciod is handling most tcp rpc calls anyway the impact of this
> appears minimal and I've copied a fair chunk of data around this way as
> well as built Gnome over it (Linux nfs client <-> Linux unfsd tcp).

Interesting... After I discovered that my unfsd was buggy (2.2beta34 is
completely broken when it comes to nfs/tcp) and I grabbed a redhat binary
of 2.2beta16-8, my mounts finally started going through, and reads _seem_
to be fine, but large writes still fail and hang the whole nfs filesystem
code.

Details:

client: linux-2.1.115 + Alan's NFS patch
server: linux-2.0.36/unfsd-2.2beta16-8
mount type: NFSv2/TCP
result: mounts and reads fine, hangs when writing a large file after some
apparently random amount of data (139264 bytes in one case, but it's not
constant). The kernel spits out the message "rpciod_down: waiting for pid
xxx to exit", where pid xxx is an rpciod kernel thread.

client: linux-2.1.115 + Alan's NFS patch
server: solaris-2.5.1
mount type: NFSv3/TCP
result: again, mounts and reads fine, hangs on writes -- this time after
61440 bytes. The same message appears on the console.

client: linux-2.1.115 + Alan's NFS patch
server: solaris-2.5.1
mount type: NFSv2/TCP
result: Same as above. Sometimes, it also prints "RPC: sendmsg returned
error 11" before the previous message.

client: solaris-2.5.1
server: linux-2.0.36/unfsd-2.2beta16-8
mount type: NFSv2/TCP
result: everything ok
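
A note on the "sendmsg returned error 11" message in the NFSv2/TCP case against solaris: on Linux, errno 11 is EAGAIN, which suggests a non-blocking send found the socket buffer full. Whether that is what triggers the hang is not established here. The sketch below is only a user-space analogy in Python, not the kernel RPC code; the helper names fill_until_would_block and send_all are made up for illustration. It shows how EAGAIN arises on a non-blocking stream socket and how a sender can wait for writability instead of treating it as fatal.

```python
import errno
import select
import socket


def fill_until_would_block(sock, chunk=b"x" * 4096):
    """Write to a non-blocking socket until the kernel refuses more data.

    Returns (bytes_queued, errno_value) for the first refused send.
    Hypothetical helper, for illustration only.
    """
    sock.setblocking(False)
    queued = 0
    while True:
        try:
            queued += sock.send(chunk)
        except BlockingIOError as exc:
            # EAGAIN / EWOULDBLOCK: the send buffer is full right now.
            return queued, exc.errno


def send_all(sock, data, timeout=5.0):
    """Send all of data on a non-blocking socket.

    On EAGAIN, wait until the socket is writable again and retry,
    rather than giving up. Hypothetical helper, for illustration only.
    """
    sock.setblocking(False)
    view = memoryview(data)
    while view:
        try:
            sent = sock.send(view)
            view = view[sent:]
        except BlockingIOError:
            _, writable, _ = select.select([], [sock], [], timeout)
            if not writable:
                raise TimeoutError("peer did not drain the socket in time")
```

With a connected pair from socket.socketpair(), calling fill_until_would_block on one end while the other end never reads will eventually return an errno of EAGAIN (or EWOULDBLOCK, which has the same value on Linux).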

From the "Really Annoying Things" department:

- if a mount hangs, it will hang forever; on solaris, however, a SIGINT
solves the problem very quickly.

- if _anything_ hangs in the NFS client code, if a server goes down... etc
etc, there is no way to back out without a reboot. Again, a SIGINT or a
SIGKILL on solaris solves the problem in a matter of seconds.

- when the above happens, NFS access is completely screwed: any NFS access
to any server will hang as well.

Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
            than to open it and remove all doubt.
