Re: [NFS] continuous logging from 2.2.14 knfsd

From: Patrick J. LoPresti (patl@cag.lcs.mit.edu)
Date: Wed Jan 12 2000 - 18:57:19 EST


Dave Higgen <dhiggen@valinux.com> writes:

> Basically this means that the server could not find a dentry
> corresponding to the file handle in any of the caches, AND was
> unable to reconstruct a dentry by walking up the tree from the
> located inode. It might be possible that you have a corrupted
> filesystem where a directory has become disconnected from the root
> somehow... if you have an opportunity, it would be useful to run
> fsck on the filesystem. (The device is major 0x30, minor 0x12, from
> the message.)

I will give this a try when I get the chance. We have seen no other
evidence of corruption, however; the same operation which caused the
problem succeeded the second time.

> It is strange that it repeats so much though; once the client has gotten
> a stale file handle back, one would expect it would discard its local NFS
> file handle. What's the client?

Also stock Linux 2.2.14.

> Alternatively there may be Yet Another Bug in the nfsd file handle
> processing... the sad truth is that the file handle processing code in
> ALL existing 2.2.X kernels is a Festering Heap of Wombat Turds.

This is my guess, actually. knfsd seems somewhat fragile, especially
on SMP systems (like our file server). Here are some later log
messages from the same problem. We received several hundred thousand
of these:

Jan 12 16:31:36 kernel: lookup_by_inode: ino 1080899 not found in elaborate
Jan 12 16:31:36 kernel: find_fh_dentry: 30:12/1080899 dir/1080856 not found!
Jan 12 16:31:36 kernel: lookup_by_inode: ino 1080899 not found in elaborate
Jan 12 16:31:36 kernel: find_fh_dentry: 30:12/1080899 dir/1080856 not found!
Jan 12 16:31:36 kernel: lookup_by_inode: ino 1080899 not found in elaborate
Jan 12 16:31:36 kernel: find_fh_dentry: 30:12/1080899 dir/1080856 not found!
Jan 12 16:31:36 kernel: lookup_by_inode: ino 1080899 not found in elaborate
Jan 12 16:31:36 kernel: find_fh_dentry: 30:12/1080899 dir/1080856 not found!
Jan 12 16:31:36 kernel: lookup_by_inode: ino 1080899 not found in elaborate
Jan 12 16:31:36 kernel: find_fh_dentry: 30:12/1080899 dir/1080856 not found!
Jan 12 16:31:36 kernel: lookup_by_inode: ino 1080899 not found in elaborate

The "elaborate" directory was one of the ones etags was processing;
the user was actively editing files within when the disaster occurred.
I am guessing some rename or deletion happened at just the wrong time,
triggering a race condition in knfsd.

> Neil Brown has done a heroic job of cleaning this up, and we hope to
> get his fixes into a stable release as soon as possible: but his
> fixes are tied in with other NFS3 stuff & needs a bit more airtime
> before it's ready for a stable kernel....

It would be nice if several people could audit whatever knfsd stuff
shows up in the stable kernels, particularly with an eye towards
concurrency issues. (It seems that all of our problem NFS boxes are
SMP systems.)

It would also be nice to see a working knfsd in kernel 2.2.x...

 - Pat

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sat Jan 15 2000 - 21:00:20 EST