In every case but one, the last 1, 2, or 3 bytes of the smashed page were
gone (i.e. shifted off the end of this 4096-byte 'shift register')
and the next page was perfectly intact. There was one exception, when
TWO adjacent pages were shifted TOGETHER. But that only happened once
and I'm only 97% certain I actually saw it.
>
> > 3. The location of the smashed page or pages is random. The first
> > is usually 4 or 5 megabytes into the file (which is 11M long) but
> > occasionally it is only 56K into the file.
> >
> > 4. The number of corrupted blocks in an 11M file is small, like
> > 5 or 10.
> >
> > Hope this provides a clue. I couldn't fathom why the data was
> > SHIFTED because that implies the page was COPIED someplace.
> > How many places in the NFS logic COPY entire pages? Perhaps that
> > is a place to look.
>
> When you say that the corruption is random, does this mean that sometimes the
> file is written correctly? It would be very helpful if you could capture
> (e.g. tcpdump) two sessions writing the file, one with the corruption and one
> when it's correct.
The corruption is random in that it smashes different pages on different
trials. However, the failure itself is about 100% repeatable.

I'll try to get some test runs with tcpdump next week. However, since I
can't get it to succeed with SunOS, it will have to be runs of NFS to
Linux versus NFS to SunOS. Is that information still usable?
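
In case it helps anyone reproduce this, here is a rough sketch of how the smashed pages could be located by comparing a good copy of the file against a corrupted one. It is not the tool I used. The function name, the `max_shift` parameter, and the assumption that the data moves toward the end of the page (so the first few bytes of a smashed page hold unknown filler) are illustrative guesses based on the symptoms described above.

```python
PAGE_SIZE = 4096  # page size mentioned in the report


def find_shifted_pages(good: bytes, bad: bytes, max_shift: int = 3):
    """Compare two copies of a file page by page.

    Returns a list of (page_index, shift) for every page that differs.
    `shift` is the number of bytes the page contents appear to have moved
    toward the end of the page (so that many tail bytes were lost), or
    None if no shift in 1..max_shift explains the difference.
    """
    results = []
    for start in range(0, min(len(good), len(bad)), PAGE_SIZE):
        g = good[start:start + PAGE_SIZE]
        b = bad[start:start + PAGE_SIZE]
        if g == b:
            continue  # page is intact
        shift = None
        for k in range(1, max_shift + 1):
            # A k-byte shift means the bad page, skipping its first k bytes,
            # equals the good page minus its last k bytes.
            if len(b) > k and b[k:] == g[:len(b) - k]:
                shift = k
                break
        results.append((start // PAGE_SIZE, shift))
    return results
```

To use it, read both copies in binary mode (for example `open(path, "rb").read()`, with whatever paths apply) and pass the two byte strings in. A result such as `(1130, 2)` would mean page 1130 looks shifted by two bytes, while a `None` shift marks a page that differs in some other way.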
-Ben McCann
--
Ben McCann
Indus River Networks
31 Nagog Park
Acton, MA, 01720
email: bmccann@indusriver.com
web: www.indusriver.com
phone: (978) 266-8140
fax: (978) 266-8111