Re: better lookup for knfsd

Olaf Kirch (okir@monad.swb.de)
Wed, 19 Nov 1997 11:33:16 +0100


On Tue, Nov 18, 1997 at 08:22:26AM -0500, Bill Hawes wrote:
> Since knfsd is now passing inode and parent inode numbers in the
> filehandle, when the dentry validation fails I can just do a lookup to
> find the dentry. For now I'm leaving the dentry caching code in place,
> but it probably won't be needed after I implement a patch-up list to
> save lookup information.

You will need it. After a reboot, clients may continue to use file handles
referencing the old dentry for a long time: think of the file handle
of the mount point itself, even though that's not a huge problem as
it's only used in lookup calls which tend to make up 1% of NFS traffic.
Worse, think of someone running long-lived applications off an NFS-mounted
directory (mail readers, editors, CAD programs...). You really don't
want to do a bottom-up path reconstruction on each page-in.

If you use a reasonable hash algorithm, you may even be better off leaving
the dentry out of the FH altogether and using just the ino/dev-to-dentry
cache.
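
As a rough illustration of an ino/dev-to-dentry cache, here is a minimal
user-space sketch. Everything in it is an assumption made for the example:
struct fake_dentry stands in for a real dentry, and fh_hashfn(),
fh_cache_insert(), fh_cache_lookup() and FH_HASH_SIZE are hypothetical names,
not existing knfsd code. It only shows the shape of a chained hash keyed on
the (dev, ino) pair.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel dentry; only the key fields matter. */
struct fake_dentry {
    unsigned int        dev;    /* device number from the file handle */
    unsigned long       ino;    /* inode number from the file handle */
    struct fake_dentry *next;   /* next entry in the same hash bucket */
};

#define FH_HASH_SIZE 64         /* assumed table size, a power of two */

static struct fake_dentry *fh_hash[FH_HASH_SIZE];

/* Mix device and inode number into a bucket index. */
static unsigned int fh_hashfn(unsigned int dev, unsigned long ino)
{
    unsigned long h = ino ^ ((unsigned long)dev << 7) ^ (ino >> 6);

    return (unsigned int)(h & (FH_HASH_SIZE - 1));
}

/* Put an entry at the head of its bucket chain. */
static void fh_cache_insert(struct fake_dentry *d)
{
    unsigned int b;

    assert(d != NULL);
    b = fh_hashfn(d->dev, d->ino);
    d->next = fh_hash[b];
    fh_hash[b] = d;
}

/* Return the cached entry for (dev, ino), or NULL if there is none. */
static struct fake_dentry *fh_cache_lookup(unsigned int dev, unsigned long ino)
{
    struct fake_dentry *d;

    for (d = fh_hash[fh_hashfn(dev, ino)]; d != NULL; d = d->next)
        if (d->dev == dev && d->ino == ino)
            return d;
    return NULL;
}
```

With a cache like this the file handle would only need to carry dev and ino;
a miss is the case where the dentry has to be reconstructed some other way.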

There's one problem with the cache, though. Entries in the cache must have
a _very_ short TTL, on the order of 2 seconds. One of the most
frequently asked questions about unfsd was `why can't I unmount my CDROM?',
and the answer was that unfsd was still holding open file descriptors on it...
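
To make the TTL point concrete, here is a small user-space sketch of
time-based expiry. The names (ttl_cache_touch(), ttl_cache_expire(),
ttl_cache_busy(), FH_CACHE_TTL, FH_CACHE_SLOTS) are invented for the example,
and the flat array is only a placeholder for whatever structure the real
cache would use. The current time is passed in explicitly so the behaviour is
easy to follow.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define FH_CACHE_TTL   2    /* seconds an idle entry may stay cached */
#define FH_CACHE_SLOTS 8    /* assumed size, deliberately tiny */

struct ttl_entry {
    int           in_use;
    unsigned int  dev;
    unsigned long ino;
    time_t        last_used;
};

static struct ttl_entry ttl_cache[FH_CACHE_SLOTS];

/* Record a use of (dev, ino) at time 'now'; -1 means the cache is full. */
static int ttl_cache_touch(unsigned int dev, unsigned long ino, time_t now)
{
    struct ttl_entry *free_slot = NULL;
    int i;

    for (i = 0; i < FH_CACHE_SLOTS; i++) {
        struct ttl_entry *e = &ttl_cache[i];

        if (e->in_use && e->dev == dev && e->ino == ino) {
            e->last_used = now;         /* refresh an existing entry */
            return 0;
        }
        if (!e->in_use && free_slot == NULL)
            free_slot = e;
    }
    if (free_slot == NULL)
        return -1;
    free_slot->in_use = 1;
    free_slot->dev = dev;
    free_slot->ino = ino;
    free_slot->last_used = now;
    return 0;
}

/* Drop every entry idle for more than FH_CACHE_TTL seconds. A real server
 * would release its reference on the cached object here. Returns the
 * number of entries dropped. */
static int ttl_cache_expire(time_t now)
{
    int i, dropped = 0;

    for (i = 0; i < FH_CACHE_SLOTS; i++) {
        struct ttl_entry *e = &ttl_cache[i];

        if (e->in_use && (long)(now - e->last_used) > FH_CACHE_TTL) {
            e->in_use = 0;
            dropped++;
        }
    }
    return dropped;
}

/* Number of live entries on 'dev'; non-zero models a busy device. */
static int ttl_cache_busy(unsigned int dev)
{
    int i, n = 0;

    for (i = 0; i < FH_CACHE_SLOTS; i++)
        if (ttl_cache[i].in_use && ttl_cache[i].dev == dev)
            n++;
    return n;
}
```

In this model a device stops being busy a little over two seconds after the
last request that touched it, provided ttl_cache_expire() is run regularly.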

Still, no one has told me why knfsd can't use the following approach:

/* Look up super block for device */
sb = get_super_block(fh->fh_dev);

/* Look up inode by number */
inode = __iget(sb, fh->fh_ino, 0);

/* Create a fancy name for it... */
sprintf(namebuf, "%x-%d", fh->fh_dev, fh->fh_ino);
/* ... and create the qstr hash of it */
/* (see namei.c) */

/* Try to look it up under fake knfsd root */
dentry = d_lookup(fakeroot, &qstr);
if (dentry == NULL) {
    /* not found - create it under fakeroot and attach the inode */
    dentry = d_alloc(fakeroot, &qstr);
    d_add(dentry, inode);
}

This should produce a valid dentry (this is also what the procfs does
for stuff like the fd/* and cwd entries).
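
The lookup-or-create step can be tried out in user space. The sketch below is
only a model under stated assumptions: struct model_dentry and struct
model_root are hypothetical stand-ins for the dcache structures, a linked
list replaces the real name hash, and fh_to_dentry() is an invented name. It
shows that repeated requests for the same (dev, ino) pair resolve to the same
entry under the fake root.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a dentry hanging below the fake root. */
struct model_dentry {
    char                 name[32];  /* "<dev>-<ino>", dev in hex */
    unsigned long        ino;
    struct model_dentry *next;      /* next child of the fake root */
};

/* Hypothetical stand-in for the fake knfsd root. */
struct model_root {
    struct model_dentry *children;
};

/* Linear search by name; a real dcache would hash the name instead. */
static struct model_dentry *model_lookup(struct model_root *root,
                                         const char *name)
{
    struct model_dentry *d;

    for (d = root->children; d != NULL; d = d->next)
        if (strcmp(d->name, name) == 0)
            return d;
    return NULL;
}

/* Map (dev, ino) to an entry under the fake root, creating it on a miss.
 * Returns NULL only if memory allocation fails. */
static struct model_dentry *fh_to_dentry(struct model_root *root,
                                         unsigned int dev, unsigned long ino)
{
    char namebuf[32];
    struct model_dentry *d;

    assert(root != NULL);
    snprintf(namebuf, sizeof(namebuf), "%x-%lu", dev, ino);

    d = model_lookup(root, namebuf);
    if (d == NULL) {
        /* not found - create it and link it below the fake root */
        d = calloc(1, sizeof(*d));
        if (d == NULL)
            return NULL;
        strcpy(d->name, namebuf);
        d->ino = ino;
        d->next = root->children;
        root->children = d;
    }
    return d;
}
```

Nothing in the model ever removes entries, which is the dcache clutter
problem in miniature.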

The only thing to be aware of is that knfsd's lookup routine should _not_
go through the lookup cache because a lookup of .. would give it the
fakeroot. Instead, it should call inode->i_op->lookup() directly, as
it used to do. This also delegates the burden of directory caching to
the NFS clients where it belongs.

Advantages of this approach:

* It scales
* File handle remains valid across renames
* File handle remains valid across reboots
* Doesn't require costly operations like readdir
* Doesn't require any cache on behalf of knfsd
* Doesn't look like unfsd gone kernel-mode.

Disadvantages:

* fakeroot may become a contested resource (you need to
  down its semaphore for the duration of dentry construction).
* Clutters the dcache with useless entries.

There's an alternative approach which may even eliminate both of these
disadvantages. Instead of storing the inodes under fakeroot by "<dev>-<ino>",
each knfsd thread could keep a pool of two or three dentries into
which it inserts the inodes it retrieves via __iget(). Of course, it
would have to make sure that after a lookup operation, all children
accumulated below the inode get d_drop'ed. I'm not sure whether this
will work, however.
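
For what it's worth, the bookkeeping for such a per-thread pool could look
roughly like the user-space sketch below. It is an illustration only and says
nothing about whether the idea works inside the dcache: struct pool_slot,
struct thread_pool, pool_get(), pool_note_child(), pool_drop_children() and
the round-robin recycling policy are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 3     /* "two or three dentries" per thread */

/* Hypothetical stand-in for one pooled dentry. */
struct pool_slot {
    int           valid;
    unsigned int  dev;
    unsigned long ino;
    int           nchildren;    /* children accumulated by lookups */
};

/* One pool per server thread, so no locking is modelled here. */
struct thread_pool {
    struct pool_slot slot[POOL_SIZE];
    int              next;      /* slot to recycle next, round-robin */
};

/* Return the slot already holding (dev, ino), or recycle the next one. */
static struct pool_slot *pool_get(struct thread_pool *p,
                                  unsigned int dev, unsigned long ino)
{
    struct pool_slot *s;
    int i;

    assert(p != NULL);
    for (i = 0; i < POOL_SIZE; i++) {
        s = &p->slot[i];
        if (s->valid && s->dev == dev && s->ino == ino)
            return s;
    }
    s = &p->slot[p->next];
    p->next = (p->next + 1) % POOL_SIZE;
    s->valid = 1;
    s->dev = dev;
    s->ino = ino;
    s->nchildren = 0;
    return s;
}

/* Note that a lookup hung one more child below this slot. */
static void pool_note_child(struct pool_slot *s)
{
    s->nchildren++;
}

/* Models the d_drop pass after an operation; returns children dropped. */
static int pool_drop_children(struct pool_slot *s)
{
    int n = s->nchildren;

    s->nchildren = 0;
    return n;
}
```

The point of the model is that each slot is back to zero children once the
operation finishes, so nothing lingers below a pooled entry between requests.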

Olaf

-- 
Olaf Kirch         |  --- o --- Nous sommes du soleil we love when we play
okir@monad.swb.de  |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
okir@caldera.de    +-------------------- Why Not?! -----------------------