I've attached mail which I posted to the linux kernel mailing list
about a month ago. I have some additional information to report
which I hope you can forward to the appropriate Linux NFS guru.
We see NFS data corruption between a Linux NFS client and a SunOS
NFS server. It occurs when running 'ld' which, I assume, does
extensive random access to the file. Under 2.1.102, our test case
fails on almost EVERY link with 'ld'. (BTW, it works fine with
2.1.84.)
I was unable to reexamine this problem until this week, and further
testing of 2.1.102 seemed pointless now that 2.1.121 has been
released. So, I've retested with 2.1.121 compiled for both UP and
SMP. The problem is MUCH better, but it still occurs. I ran 'ld'
over our test set of objects, writing the final executable to an
NFS-mounted file system. I had 3 failures in 120 trials.
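For anyone who wants to repeat this kind of trial run, here is a
rough Python sketch of a harness. It is not the script I used. It
assumes the link is deterministic, so a known-good executable built
on local disk can serve as a reference. The names count_bad_links,
link_cmd, output_path and reference_path are all made up for
illustration.

```python
import hashlib
import subprocess


def sha256_of(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def count_bad_links(link_cmd, output_path, reference_path, trials):
    """Run `link_cmd` `trials` times and count outputs that differ.

    link_cmd       -- hypothetical argv list that (re)creates output_path
    output_path    -- executable written to the NFS-mounted file system
    reference_path -- known-good executable (assumed: built on local disk)
    """
    want = sha256_of(reference_path)
    failures = 0
    for _ in range(trials):
        subprocess.run(link_cmd, check=True)
        # Assumption: reading the file back through the same client is
        # enough to see the corruption; checking on the server is safer.
        if sha256_of(output_path) != want:
            failures += 1
    return failures
```

With such a harness, 3 failures in 120 trials works out to a 2.5%
failure rate per link.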
As before, the corruption always happens exactly on a 4K offset
in the file. The corruption takes a 4K block of the file and
shifts it down in memory 1, 2, or 3 bytes, inserting zeros at
the beginning of that page.
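To make the failure pattern concrete, here is a small Python sketch
that compares a good executable against a corrupted one and reports
pages matching this description. It assumes the shift stays inside
one 4K page, so the last 1 to 3 bytes of that page are lost rather
than spilling into the next page. I have not confirmed that detail.
The function name find_shifted_pages is made up for illustration.

```python
PAGE = 4096  # corruption is reported on 4K boundaries


def find_shifted_pages(good, bad, page=PAGE, max_shift=3):
    """Return [(offset, shift), ...] for pages of `bad` that look like
    the same page of `good` moved down by 1..max_shift bytes with
    zero bytes inserted at the start of the page.

    Assumption: the shift is confined to a single page, so the tail
    of the original page is dropped.
    """
    hits = []
    for off in range(0, min(len(good), len(bad)), page):
        g = good[off:off + page]
        b = bad[off:off + page]
        if g == b:
            continue  # page is intact
        for shift in range(1, max_shift + 1):
            if len(b) <= shift:
                break
            zero_head = b[:shift] == bytes(shift)
            shifted_body = b[shift:] == g[:len(b) - shift]
            if zero_head and shifted_body:
                hits.append((off, shift))
                break
    return hits
```

Reading both files in binary mode and passing their contents in
returns a list of (offset, shift) pairs. Any page that differs in
some other way is simply not reported, so an empty list does not
prove the file is good.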
I read on the list that substantial cleanup has occurred in the
IP and NFS areas over the last 20 point releases, and it has helped.
Could the developers involved look at those changes, and at this
failure mode, to guess where one more fix might have been missed?
-Ben McCann
--
Ben McCann Indus River Networks
31 Nagog Park
Acton, MA, 01720
email: bmccann@indusriver.com web: www.indusriver.com
phone: (978) 266-8140 fax: (978) 266-8111
--------------7C0C6B2DC3EC35AC68E8F1B0
Content-Type: message/rfc822
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Return-Path: <bmccann@indusriver.com>
Received: from indusriver.com (209.6.112.94) by mcfeeley.indusriver.com (Worldmail 1.3.167); 13 Aug 1998 18:12:48 -0400
Message-ID: <35D364D2.D3662ECD@indusriver.com>
Date: Thu, 13 Aug 1998 18:12:34 -0400
From: Ben McCann <bmccann@indusriver.com>
X-Mailer: Mozilla 4.05 [en] (Win95; I)
MIME-Version: 1.0
To: "linux-kernel@vger.rutgers.edu" <linux-kernel@vger.rutgers.edu>
Subject: NFS Data CORRUPTION Between Linux and SunOS 5.5.1
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
We use Linux 2.1.x for software development, where Linux workstations
NFS-mount filesystems on a Sun UltraSparc server. The Ultra runs
SunOS 5.5.1.
We ran 2.1.84 with no problems. We recently upgraded our build
environment to 2.1.102. (We've been using 2.1.102 in application
testing for a couple of months so we decided it was stable enough
to use for compiling and linking too).
Immediately after upgrading, we noticed that our executable files
were corrupted during the link phase of a build. Remember that
the objects and the executable are all stored on the UltraSparc
server. If we link under Linux 2.1.84 there is no corruption;
if we link under 2.1.102 there IS corruption.
====> I've repeated this with 2.1.115 so the bug is still alive
====> in the latest edition of the kernel.
This is a very puzzling bug. We do NOT see corruption when we link
directly to the local hard drive and we don't see corruption when
we NFS mount another 2.1.102 Linux box and link on its file system.
The only corruption occurs when running 'ld' under 2.1.102 (or
2.1.115) and writing the executable to a SunOS 5.5.1 NFS server.
(BTW, we are using GNU ld version 2.8.1 (with BFD linux-2.8.1.0.1).)
I can spend some time helping with a 'remote debug' of this problem
if there are tools, logs, debug switches, etc., that I can use to
gather data here. I also have a set of objects which I can probably
ship to a Linux developer to reproduce this bug. He/she just needs a
SunOS box handy. Alternatively, the NFS/TCP/UDP developers can try
to track the source differences between 2.1.84 and 2.1.102.
IMHO, it's a serious problem which needs attention.
-Ben McCann
--
Ben McCann Indus River Networks
31 Nagog Park
Acton, MA, 01720
email: bmccann@indusriver.com web: www.indusriver.com
phone: (978) 266-8140 fax: (978) 266-8111
--------------7C0C6B2DC3EC35AC68E8F1B0--