Re: NFS Data CORRUPTION Between Linux and SunOS 5.5.1

fbujanic (fbujanic@fikus.com)
Fri, 14 Aug 1998 15:43:06 -0400 (EDT)


I have noticed the same problems since I moved to the 2.1.11x kernels.
Everything worked fine before that. I complained and was told that
mount is broken... That answer did not satisfy me, so I tried all
versions of knfsd and the user-land NFS daemons. All of them seem to
produce the same error (the user-land nfsd gives somewhat better results
but is much slower than knfsd). I have a Linux NFS server running 2.1.86
and a couple of Linux clients running 2.1.115. As soon as I downgrade the
clients, NFS works fine. But I have noticed a couple of other problems,
especially with knfsd.

1. Exporting and mounting two directories in the same subtree doesn't
work, e.g. try exporting and mounting /usr/test1/ and /usr/test2/.
This used to work in earlier 2.1.x kernels.

2. All executables dump core when run from an NFS-mounted filesystem
(executed on a client machine). Here is an strace of one crash:

strace ./testd
mmap(0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|0x20, 4294967295, 0) = 0x40007000
--- SIGSEGV (Segmentation fault) ---
+++ killed by SIGSEGV +++
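
To read the mmap() line: strace printed one flag as the raw value 0x20,
which on x86 Linux is MAP_ANONYMOUS, and 4294967295 is -1 printed as an
unsigned 32-bit number. So this is an anonymous private mapping with no
file descriptor, and the call succeeded (it returned 0x40007000); the
SIGSEGV arrives after it. A minimal Python sketch of that decoding,
assuming a 32-bit x86 Linux client where MAP_PRIVATE is 0x02 and
MAP_ANONYMOUS is 0x20 (the values differ on some other architectures);
as_signed32 and decode_flags are illustrative names only:

```python
from typing import List

# Assumption: 32-bit x86 Linux flag values; other architectures differ.
MAP_PRIVATE = 0x02
MAP_ANONYMOUS = 0x20


def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a two's-complement int."""
    return value - (1 << 32) if value & 0x80000000 else value


def decode_flags(flags: int) -> List[str]:
    """Name the mmap() flag bits that appear in the strace output."""
    names = []
    if flags & MAP_PRIVATE:
        names.append("MAP_PRIVATE")
    if flags & MAP_ANONYMOUS:
        names.append("MAP_ANONYMOUS")
    return names


fd = as_signed32(4294967295)             # the fd argument strace printed
flags = decode_flags(MAP_PRIVATE | 0x20)  # the flags argument strace printed
print(fd, flags)  # -1 ['MAP_PRIVATE', 'MAP_ANONYMOUS']
```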

I have tried mounting the NFS filesystems with different rsize/wsize
options, but no luck. I would appreciate it if anybody could explain what
is going on and how to fix these problems. NFS is extremely important in
the environment I use.

Thanks
Fil

On Fri, 14 Aug 1998, Ben McCann wrote:

>Date: Fri, 14 Aug 1998 08:35:36 -0400
>From: Ben McCann <bmccann@indusriver.com>
>To: Bill Hawes <whawes@transmeta.com>,
>    Larry McVoy <lm@bitmover.com>
>Cc: Linux-kernel <linux-kernel@vger.rutgers.edu>
>Subject: Re: NFS Data CORRUPTION Between Linux and SunOS 5.5.1
>
>Sorry, I forgot to include a description of the corruption itself.
>I have built 'good' and 'bad' versions of the file and compared
>them. The corruption always follows the same pattern and multiple
>corruptions have been seen in the file:
>
>1. The corruption always begins on a 4096 byte aligned offset
>in the file (i.e. on a page boundary).
>
>2. 1, 2, or 3 bytes of ZERO are written at the beginning of the page
>and the rest of the page is SHIFTED by that amount. (When we first
>saw this, we thought a SCSI controller was failing on the Sun
>server, but we haven't had any problems with data written via
>NFS to this Sun from a bunch of WinNT boxes we have here. And,
>as I said earlier, 2.1.84 works fine).
>
>3. The location of the smashed page or pages is random. The first
>is usually 4 or 5 megabytes into the file (which is 11M long) but
>occasionally it is only 56K into the file.
>
>4. The number of corrupted blocks in an 11M file is small, like
>5 or 10.
>
>
>Hope this provides a clue. I couldn't fathom why the data was
>SHIFTED because that implies the page was COPIED someplace.
>How many places in the NFS logic COPY entire pages? Perhaps that
>is a place to look.
>
>
>Now, a few questions:
>
>1. How do I vary the NFS block size? (Larry asked that I try that).
>
>2. How can I tell if I am using UDP versus TCP? I've done NOTHING
>to explicitly configure NFS. We just use RedHat 5.0 out of the box
>with the 2.1.X kernels.
>
>3. Given I can determine UDP vs. TCP, how do I change it to the
>other? Can I assume SunOS 5.5 supports both?
>
>
>I'll run the NFS debug log experiment today and send you both the
>diffs.
>
>Last, we have CONFIG_NFS_FS and CONFIG_NFSD setup as kernel modules
>and we have the RPMs 'nfs-server-2.2beta29-2' and
>'nfs-server-clients-2.2beta29-2' installed.
>
>-Ben McCann
>
>
>--
>Ben McCann Indus River Networks
> 31 Nagog Park
> Acton, MA, 01720
>email: bmccann@indusriver.com web: www.indusriver.com
>phone: (978) 266-8140 fax: (978) 266-8111
>
>-
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to majordomo@vger.rutgers.edu
>Please read the FAQ at http://www.altern.org/andrebalsa/doc/lkml-faq.html
>
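
The pattern Ben describes (damage starting on a 4096-byte boundary, 1 to
3 zero bytes at the start of the page, the rest of the page shifted by
that amount) can be checked mechanically when comparing the 'good' and
'bad' builds. Below is a minimal Python sketch, not a tool from this
thread. It assumes both files have the same length and that the last few
bytes of a damaged page are simply pushed off its end; find_shifted_pages
and corrupt_page are hypothetical names:

```python
import sys
from typing import List, Tuple

PAGE_SIZE = 4096  # the corruption always starts on a 4096-byte boundary
MAX_SHIFT = 3     # 1, 2 or 3 zero bytes were observed


def find_shifted_pages(good: bytes, bad: bytes,
                       max_shift: int = MAX_SHIFT) -> List[Tuple[int, int]]:
    """Compare two copies of a file page by page.

    Returns (offset, shift) for every page that differs. shift > 0 means the
    bad page is `shift` zero bytes followed by the good page moved right by
    `shift` bytes; shift == 0 means the page differs in some other way.
    """
    results = []
    for off in range(0, min(len(good), len(bad)), PAGE_SIZE):
        g = good[off:off + PAGE_SIZE]
        b = bad[off:off + PAGE_SIZE]
        if g == b:
            continue
        for shift in range(1, max_shift + 1):
            # Leading zeros, then the good data displaced by `shift` bytes
            # (assumption: the last `shift` bytes of the good page are lost).
            if b[:shift] == bytes(shift) and b[shift:] == g[:len(b) - shift]:
                results.append((off, shift))
                break
        else:
            results.append((off, 0))
    return results


def corrupt_page(data: bytes, page_index: int, shift: int) -> bytes:
    """Hypothetical helper: reproduce the reported damage on one page."""
    off = page_index * PAGE_SIZE
    page = data[off:off + PAGE_SIZE]
    damaged = bytes(shift) + page[:len(page) - shift]
    return data[:off] + damaged + data[off + PAGE_SIZE:]


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # Usage: python find_shifted_pages.py good_file bad_file
        with open(sys.argv[1], "rb") as f:
            good_data = f.read()
        with open(sys.argv[2], "rb") as f:
            bad_data = f.read()
    else:
        # Self-contained demo: three pages of test data, page 1 damaged.
        good_data = bytes(range(256)) * 48
        bad_data = corrupt_page(good_data, 1, 2)
    for offset, shift in find_shifted_pages(good_data, bad_data):
        kind = f"shifted by {shift}" if shift else "other difference"
        print(f"offset {offset:#x}: {kind}")
```

Run with no arguments, the demo reports "offset 0x1000: shifted by 2".
Pages reported with shift 0 differ in some other way and would point
away from the shifted-copy theory for that page.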
