Re: Many open/close on same files yields "No such file or directory".

From: Neil Brown
Date: Sun May 11 2008 - 21:54:21 EST


On Friday May 9, jesper@xxxxxxxx wrote:
>
> When I disabled the NFS-server and ran my "real-world" program on a
> single processor (make -j 1), it ran through fine. It basically
> gets around 20 million chunks out of different files and assembles the
> chunks in a few other files. This processes more or less 5 individual
> sections, so make can run effectively with a concurrency of 5.

(For linux-nfs readers: the problem is that repeatedly opening a given
file sometimes returns ENOENT - http://lkml.org/lkml/2008/5/9/15).
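
For anyone wanting to check for the symptom in isolation, here is a
minimal sketch that opens and closes one file many times and counts the
ENOENT failures. It is not the reporter's program; the function name
count_enoent and the default attempt count are made up for illustration.

```python
import errno
import os


def count_enoent(path, attempts=100_000):
    """Open and close `path` repeatedly; return how many opens hit ENOENT."""
    failures = 0
    for _ in range(attempts):
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError as exc:
            if exc.errno != errno.ENOENT:
                raise  # a different error; do not hide it
            failures += 1
        else:
            os.close(fd)
    return failures
```

On a file that exists and is not being renamed or removed, the result
should be 0; a non-zero count for such a file would match the reported
behaviour.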

The mention of an NFS-server made my ears prick up...

Do I understand correctly that the problem only occurs when you have
48 clients hammering away at the filesystem in question?

Could the clients be accessing the same file that you are experiencing
problems with? Or one of the directories in the path (if so, how
deep)?

How many different files do these 20 million chunks come from? And
how does that number compare with the first number from
grep dentry /proc/slabinfo
??
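
If it helps to script that comparison, the following is a rough sketch
that pulls the first number out of the dentry line. It assumes the
slabinfo 2.x layout, where the first number after the cache name is the
active object count; the helper names are hypothetical. Reading
/proc/slabinfo usually needs root.

```python
def dentry_active_objects(slabinfo_text):
    """Return the first number on the dentry cache line, or None if absent."""
    for line in slabinfo_text.splitlines():
        fields = line.split()
        # The cache name is the first field; accept any name containing
        # "dentry", much as `grep dentry` would.
        if len(fields) >= 2 and "dentry" in fields[0]:
            return int(fields[1])
    return None


def read_dentry_active_objects(path="/proc/slabinfo"):
    """Read the slabinfo file and return the dentry active object count."""
    with open(path) as f:
        return dentry_active_objects(f.read())
```

The returned count can then be compared directly with the number of
distinct files the chunks are read from.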

The NFS server does some slightly strange things with the dcache if the
object being accessed is not in the cache.

Also, can you get a few instances of
grep '^fh' /proc/net/rpc/nfsd

while things are going strange? The numbers are:
* fh <stale> <total-lookups> <anonlookups> <dir-not-in-dcache> <nondir-not-in-dcache>

That will show us if it is looking for things that aren't in the
dcache.
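
To compare two such samples, a small sketch like the one below can label
the five numbers and subtract them. It takes the text of the nfsd
statistics file (typically /proc/net/rpc/nfsd) as input; the field names
are simply the labels listed above turned into identifiers, and the
function names are hypothetical.

```python
FH_FIELDS = (
    "stale",
    "total_lookups",
    "anon_lookups",
    "dir_not_in_dcache",
    "nondir_not_in_dcache",
)


def parse_fh_line(stats_text):
    """Return the counters from the 'fh' line as a dict, or None if missing."""
    for line in stats_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "fh":
            # Pair each label with the number in the same position.
            return dict(zip(FH_FIELDS, map(int, parts[1:])))
    return None


def fh_delta(before, after):
    """Return how much each counter grew between two parsed samples."""
    return {name: after[name] - before[name] for name in FH_FIELDS}
```

Growth in the two not-in-dcache counters between samples taken while
things are going strange is what would indicate lookups missing the
dcache.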

Finally, is the filesystem exported with "subtree_check" or
"no_subtree_check"?
Does it make a difference if you switch the setting of this flag and
re-export?

NeilBrown