malcolmc> I have a little time in the next week or so to play with
malcolmc> this, so if you want to pass on those logging patches, I
malcolmc> can maybe have a go..
Hm. I have lost most of my patches, but that's OK because they did
not show anything interesting. Appended to this mesasge is the one
you want. Also in this message is a brain-dump of my attempts to
debug this thing.
The patch is to nfsd.c:nfsd_nfsproc_write_2(). The dispatcher in
nfs_dispatch.c calls this function when the RPC libraries indicate
that we have received a write request. All this patch does is to
treat any write error as being worthy of logging.
The logged info includes the error code (Note: NFS error code, not
Linux errno), the pseudo-inode of the request as received at the
beginning of the function, the pseudo-inode of the request as visible
at the end of the function, the IP address (in hex) of the request's
origin, the full path of the file which we tried to write, and the
amount of data which we tried to write.
This patch is to the current (from the updates) NFS server for RedHat
5.1, which is 2.2beta29. However, I did glance at beta34 or so once
and I do not think the code has changed much, so it should be easy to
apply this patch to whatever version you are using.
Here is how to use it:
Make your Solaris box generate the error. When Solaris sees an
"impossible" error return from a NFS write (e.g., "is a directory"),
it says "what the hell?" and logs a kernel message. That message
includes the file handle which the Solaris box tried to use. The file
handle is a 32 byte cookie generated by the Linux NFS server. Its
first four bytes are the file's pseudo-inode *in x86 (little-endian)
order*. The pseudo-inode uniquely identifies the file, more or less
(i.e., unless you get very unlucky... but that is a different problem
which has nothing to do with any of this).
Meanwhile, the Linux nfsd will log a message thanks to the patch, and
you can compare the pseudo-inode which the Sun thinks it was writing
to the one which nfsd_nfsproc_write_2() saw. You should find them to
be different. Note that the patch outputs the pseudo-inode as a
32-bit number in hex, which will show up on your screen as big-endian.
So remember: The value printed by the Sun is little-endian, and the
value printed by the PC is big-endian (how ironic).
You can find out the pseudo-inode for any file (or directory) by doing
"ls -i" on the file through NFS. This will let you identify with
certainty the actual file the Sun was trying to write (you should
already have a good guess based on what the Sun was doing at the
time).
So what does it all mean?
Well, nfsd_nfsproc_write_2() gets called directly after the RPC
libraries return with the RPC request. And the information in that
request is ALREADY corrupt. So something very very strange is going
on here. Maybe nfsd is doing something funny with memory. Maybe the
RPC libraries don't like some detail of some Linux system call's
semantics. Maybe there is an outright bug in the libraries or the
kernel.
Here are some things I found suspicious, but could not pin down as
causing any particular problem.
First, the use of alarm() in nfsd to get pseudo-concurrency. This is
the first thing I checked, and I can guarantee you it is not the
problem. (I rewrote everything to be completely serial and it made no
difference; also, strace() almost always shows no alarm happening
anywhere except during select().) So don't bother looking for
concurrency problems.
Second, this weirdness from the top of nfsd.c:
#undef NFS_MAXDATA
#define NFS_MAXDATA (16 * 1024)
This is weird because NFS_MAXDATA is defined in a header file as
8*1024, and that value is used elsewhere. So redefining it just for
nfsd.c seems like a disaster waiting to happen. I spent a while
trying to figure out whether this could cause our problem, but I did
not see it. (And very soon found myself tracing through the RPC code
in glibc, which is loads of not fun.)
Third, the code in nfsd_nfsproc_readdir_2() (the readdir handler)
looks very ominous to me in its handling of memory. But again, I
could not identify an actual error. (Also, running with MALLOC_CHECK_
set turned up nothing, for what little that's worth. Electric Fence,
anyone?)
And that's about it. Oh, one other thing: Don't bother trying to find
inconsistencies between nfs_prot_xdr.c and nfs_prot.h (like I did)
because both are machine-generated from nfs_prot.x with rpcgen.
Whoops.
Anyway, good luck!
- Pat
diff -u -r ./nfsd.c /scratch/patl/nfs-server-2.2beta29/nfsd.c
--- ./nfsd.c Mon Aug 31 17:10:59 1998
+++ /scratch/patl/nfs-server-2.2beta29/nfsd.c Thu Aug 6 14:16:03 1998
@@ -497,32 +497,50 @@
nfsstat status;
fhcache *fhc;
int fd;
+ psi_t arg_psi = ((svc_fh *)&argp->file)->psi;
fhc = auth_fh(rqstp, &(argp->file), &status, CHK_WRITE | CHK_NOACCESS);
if (fhc == NULL)
return status;
- if ((fd = fh_fd(fhc, &status, O_WRONLY)) < 0) {
- return ((int) status);
- }
+ if ((fd = fh_fd(fhc, &status, O_WRONLY)) < 0)
+ goto done;
+
errno = 0;
(void) lseek(fd, (long) argp->offset, L_SET);
if (errno == 0) { /* We should never fail. */
- if (write(fd, argp->data.data_val, argp->data.data_len) !=
- argp->data.data_len) {
- Dprintf(D_CALL, " Write failure, errno is %d.\n", errno);
- }
+ if (write(fd, argp->data.data_val, argp->data.data_len) !=
+ argp->data.data_len) {
+ Dprintf(D_CALL, " Write failure, errno is %d.\n", errno);
+ }
}
fd_inactive(fd);
if (errno)
- return (nfs_errno());
+ {
+ status = nfs_errno();
+ goto done;
+ }
/* Write record to syslog */
if (argp->offset == 0 && log_transfers)
nfsd_xferlog(rqstp, ">", fhc->path);
- return (fhc_getattr(fhc, &(result.attrstat.attrstat_u.attributes),
- NULL, rqstp));
+ status = fhc_getattr
+ (fhc, &(result.attrstat.attrstat_u.attributes), NULL, rqstp);
+ /* fall through */
+ done:
+ if (status != NFS_OK)
+ {
+ psi_t fh_psi = fhc->h.psi;
+ struct in_addr
+ caddr = svc_getcaller(rqstp->rq_xprt)->sin_addr;
+
+ Dprintf (L_ERROR,
+ "write failed: status %d apsi %08x fpsi %08x clnt %08x file %s len %d\n",
+ status, arg_psi, fh_psi, caddr, fhc->path, argp->data.data_len);
+ }
+
+ return ((int) status);
}
/* This used to be O_RDWR, but O_WRONLY is correct */
-
To unsubscribe from this list: send the line "unsubscribe linux-net" in
the body of a message to majordomo@vger.rutgers.edu