attempt to re-implement sillyrename for NFS in 2.1.*

Claus-Justus Heine (Heine@physik.rwth-aachen.de)
21 Aug 1997 22:59:51 +0200


--pgp-sign-Multipart_Thu_Aug_21_22:59:39_1997-1
Content-Type: multipart/mixed;
boundary="Multipart_Thu_Aug_21_22:59:39_1997-1"
Content-Transfer-Encoding: 7bit

--Multipart_Thu_Aug_21_22:59:39_1997-1
Content-Type: text/plain; charset=US-ASCII

There recently were some postings about -ESTALE errors with e.g. "Here"
scripts on NFS file systems with 2.1.* and problems with nis and some
other problems.

The reason for those errors is the missing support for that
"sillyrename" stuff that we had with the 2.0.* implementation of the
nfs code.

I dared to try to re-implement it. The hack included below has the
following features:

The 2.0.* sillyrename code uses ".nfs#ino" for the temporary files
where "#ino" is the DECIMAL representation of "inode->i_ino". Now,
"inode->i_ino" has type "unsigned long" and thus may be 64 bit
wide. The old code uses only 16 chars for the ".nfs#ino" name which
will not be enough for 64 bits. Therefore I switched to a HEXADECIMAL
representation with enough space for a 64 bit value. Thus the number
of chars for ".nfs#ino" is simply "sizeof(unsigned long)*2+5".

The patch allows for one temporary ".nfs#ino-number" file for each
directory an inode was linked into. This is necessary as the nfsd (at
least the Linux user-land nfsd) generates different file handles for
the same inode if it is contained in different directories (i.e. with
cross-directory links); the nfsd also generates a different NFS
file-handle when a file is renamed across directory boundaries. I
tried to support the latter following a suggestion of Olaf Kirch:

do a "sillyrename()" in the original directory, then create a link to
the new location.

Of course, sillyrename() is only needed when a file is held open by
some application while it is moved or unlinked.

As a check whether a file indeed is in use, I tried the following
construct:

if (dentry->d_inode->i_count > 1 || dentry->d_count > 1) {
then this inode is used by more than one application
}

I had the impression that this is sufficient, because:

Each dentry on the alias list has called iget() separately. Therefore
i_count >= #dentries on the alias list of that inode. Please FIXME.

There are quite some comments in the patch to fs/nfs/dir.c

Below there is also a short script that tries to create test cases for
the patch, i.e. it creates multiply linked files held open by
applications while the inode is unlinked, and moves files accross
directory boundaries while they are still being accessed by some other
application. The script is ugly enough, but maybe sufficient to get an
idea of the problem (when running it on plain 2.1.51 compared to the
patched version).

Not to be forgotten: the patch is relative to a plain 2.1.51 kernel
and might work or might not work. Test it yourself.

Cheers

Claus

--Multipart_Thu_Aug_21_22:59:39_1997-1
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="foo.sh"
Content-Transfer-Encoding: 8bit

#!/bin/sh
cd /tmp
#
# first test case: moving files to different directories.
# Problem: the Linux user-land nfsd creates a different filehandle
# for the same inode if it is moved to another directory.
# Result: -ESTALE
# Solution: rename the original file to .nfs#ino and link to it to new
# destination. We allow one .nfs#ino file for each directory and inode!
#
for i in 1 2 3 4
do
mkdir foo$i
done
echo foo > foo
tail -f foo > /dev/null &
mv foo foo1
for i in 1 2 3 4 1 2 3 4
do
tail -f foo$i/foo > /dev/null &
mv foo$i/foo foo$(( $i%4 + 1))/foo
done
rm -f foo1/foo
echo
echo There should be .nfs... files:
echo
ls -a .nfs*
ls -a foo?/.nfs*
#
# The following MUST NOT succeed
#
echo
echo expect error messages when trying to remove the foo directories ...
echo
if rm -rf foo*
then
exit 1
fi
#
# now kill the "tail -f" programs. The .nfs#ino files should go away.
# If you don't have "skill" use "killall" or something appropriate
#
skill -TERM tail
echo
echo Now the .nfs files should have vanished!
echo
ls -a .nfs*
ls -a foo?/.nfs*
#
# they should have gone away
#
echo
echo And it should be no problem to delete all foo directories
echo
if ! rm -rf foo*
then
exit 1
fi
ls -a foo?
#
# test case two: link a file to multiple directories.
#
#
echo foo > foo
for i in 1 2 3 4
do
mkdir foo$i
ln foo foo$i/foo
tail -f foo$i/foo > /dev/null &
done
#
# unlink the original files
#
rm -rf foo* > /dev/null 2>&1
#
# .nfs#ino files should show up
#
echo
echo Again, some .nfs files ...
echo
ls -a .nfs* foo?/.nfs*
#
# kill the tails
#
skill -TERM tail
#
# should have gone away ...
#
echo
echo which should have gone away now.
echo
ls -a .nfs* foo?/.nfs*
#
# clean up
#
rm -rf foo?

--Multipart_Thu_Aug_21_22:59:39_1997-1
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="linux-2.1.51.silly.dif"
Content-Transfer-Encoding: 8bit

--- linux-2.1.51/include/linux/nfs_fs_i.h Mon Aug 18 00:09:44 1997
+++ linux-2.1.51-nfs-swap/include/linux/nfs_fs_i.h Thu Aug 21 19:22:43 1997
@@ -4,6 +4,11 @@
#include <linux/nfs.h>
#include <linux/pipe_fs_i.h>

+struct silly_inodes_list {
+ struct silly_inodes_list *next;
+ struct inode *silly_inode; /* the parent inode */
+};
+
/*
* nfs fs inode data in memory
*/
@@ -51,7 +56,15 @@
* Instead of the directory inode, we might as well keep
* its NFS FH, but that requires a kmalloc.
*/
+#if 0
struct inode * silly_inode;
+#else
+ /* to handle the rename stuff properly, we need a list of all
+ * parent directories the inode was contained in and unlinked
+ * while still active
+ */
+ struct silly_inodes_list *silly_inodes;
+#endif

/*
* This is the list of dirty unwritten pages.
--- linux-2.1.51/fs/nfs/dir.c Thu Aug 21 19:19:28 1997
+++ linux-2.1.51-nfs-swap/fs/nfs/dir.c Thu Aug 21 22:25:41 1997
@@ -7,6 +7,10 @@
*
* 10 Apr 1996 Added silly rename for unlink --okir
* 28 Sep 1996 Improved directory cache --okir
+ * 21 Aug 1997 Claus Heine claus@momo.math.rwth-aachen.de
+ * Re-implemented silly rename for unlink, newly implemented
+ * silly rename for nfs_rename() following the suggestions
+ * of Olaf Kirch (okir) found in this file.
*/

#include <linux/sched.h>
@@ -378,10 +382,68 @@

inode = NULL;
if (!error) {
- error = -ENOENT;
inode = nfs_fhget(dir->i_sb, &fhandle, &fattr);
if (!inode)
return -EACCES;
+
+ /* Ok, in principle the sillyrename() stuff is working
+ * now, BUT: how to get rid of those ".nfs#ino" files????
+ * The new dcache code will keep them on the unused list,
+ * even after all processes have closed the file descriptors.
+ * Only if we don't allow those dentries to show up on the
+ * hash list of the parent dir, then we can be sure that
+ * the ".nfs#ino" files are REALLY deleted.
+ *
+ * Basically, there are two possibilities:
+ *
+ * i) Turn the lookup into a negative lookup. This gives
+ * lots of ugly "no such file or directory"
+ * messages, because the readdir() stuff provides
+ * e.g. the ls command with the ".nfs#ino" names,
+ * but then nfs_lookup() says: "Hey, there's no
+ * such file". One would have to change the
+ * readdir() stuff to hide those
+ * files. Mmh. Maybe the cleanest solution, but
+ * readdir() would have to do a lookup on each
+ * ".nfs..." file.
+ *
+ * ii) Allow those .nfs files to show up, but DON'T
+ * enqueue them on the hash list of the parent
+ * directory. This way a dput() will call iput()
+ * before putting the entry on the unused list.
+ * Ok, this disables hashed lookups of ".nfs#ino"
+ * files and thus produces network traffic. BUT:
+ * doing lookups from inside the readdir() stuff
+ * would produce even more traffic. Drawback: the
+ * ".nfs#ino" files are visible. But they can't
+ * be moved nor deleted; the nfs_rename() and
+ * nfs_unlink() stuff takes care of this.
+ *
+ *
+ */
+ if (NFS_RENAMED_DIRS(inode)) {
+ char silly[sizeof(inode->i_ino)*2+sizeof(".nfs")];
+ int slen;
+
+ slen = sprintf(silly, ".nfs%lx", inode->i_ino);
+ if (dentry->d_name.len == slen &&
+ !strncmp(dentry->d_name.name, silly, slen)) {
+#if 0
+ /* this causes "no such file or directory"
+ * messages
+ */
+ iput(inode);
+ inode = 0;
+#else
+ /* and this makes the ".nfs#ino" files
+ * visible
+ */
+ d_instantiate(dentry, inode);
+ d_drop(dentry); /* remove from any hash list */
+ return 0;
+#endif
+ }
+ }
} else if (error != -ENOENT)
return error;

@@ -531,8 +593,162 @@
}

/*
- * We should do silly-rename here, but I'm too lazy to fix
- * up the directory entry implications of it..
+ * This is used by nfs_unlink() and by nfs_rename().
+ *
+ * nfs_unlink() needs it when the file to be unlinked is still held open
+ * by somebody else.
+ *
+ * nfs_rename() needs it when moving files held open by somebody else
+ * to another directory, 'cause the user level nfs daemon changes the
+ * nfs file handle in this case.
+ *
+ * "THIS SILLY RENAME STUFF IS REALLY TROUBLESOME WITH THE NEW DCACHE CODE!"
+ * I thought, but, hey, it's not that difficult.
+ *
+ * Note: dentry->d_inode MUST exist (it does!)
+ *
+ * To handle linking and moving to different directories correctly, we must
+ * be able to create a .nfs#ino file in each directory that contains the
+ * file the inode belongs to. Again: the userland nfs daemon uses different
+ * file handles if the inode is moved or linked into another directory.
+ *
+ * FIXME: Is it necessary to lock that silly_inodes list? I don't think so,
+ * because the calling functions do_rename() and do_unlink() seem to do
+ * locking.
+ *
+ * Return values: -EBUSY: the caller tries to manipulate the .nfs#ino file
+ * itself. This causes the calling function (either
+ * nfs_unlink or nfs_rename) to fail.
+ * 0 : We actually did a silly rename
+ * everything else : The caller should proceed as usual.
+ */
+
+static int nfs_sillyrename(struct inode *dir, struct dentry *dentry)
+{
+ char silly[sizeof(dir->i_ino)*2+sizeof(".nfs")];
+ int slen, error;
+ struct silly_inodes_list *sinode;
+ struct dentry *sdentry;
+ struct qstr sqstr;
+
+ if (dentry->d_inode->i_count == 1 && dentry->d_count == 1) {
+ return -EIO; /* No need to silly rename. */
+ }
+
+ slen = sprintf(silly, ".nfs%lx", dentry->d_inode->i_ino);
+
+ if (dentry->d_name.len == slen &&
+ !strncmp(dentry->d_name.name, silly, slen)) {
+ return -EBUSY; /* don't allow to unlink silly_inode itself */
+ }
+
+ for (sinode = NFS_RENAMED_DIRS(dentry->d_inode);
+ sinode != NULL;
+ sinode = sinode->next) {
+ if (sinode->silly_inode == dir) {
+ return -EIO; /* may unlink */
+ }
+ }
+
+ /* I want to use d_move() and therefore I also need a
+ * new dentry for the .nfs#ino "silly file"
+ *
+ * lookup_dentry() cannot be used, as this would try to get
+ * the directory semaphore. But we do_rename() and do_unlink()
+ * already hold the directory lock.
+ */
+ sqstr.name = silly;
+ sqstr.len = slen;
+ sqstr.hash = 0; /* shouldn't be necessary */
+ sdentry = d_alloc(i_dentry(dir), &sqstr);
+ if (sdentry == NULL)
+ return -EIO; /* Mmmh. */
+
+ /* the name length has already been checked by the caller
+ */
+ error = nfs_proc_rename(NFS_SERVER(dir),
+ NFS_FH(dir), dentry->d_name.name,
+ NFS_FH(dir), silly);
+ if (error) {
+ dput(sdentry);
+ return error;
+ }
+
+ nfs_invalidate_dircache(dir);
+ d_move(dentry, sdentry);
+ dput(sdentry);
+ d_drop(dentry);
+
+ /* Enqueue the parent dir into the silly_inodes list of the
+ * renamed inode. "struct silly_inodes_list" holds only two
+ * pointers, but the kmalloc will eat 32 bytes.
+ */
+ sinode = kmalloc(sizeof(struct silly_inodes_list), GFP_KERNEL);
+ if (sinode == NULL) {
+ return 0; /* don't unlink, but no cleanup possible :-( */
+ }
+ sinode->silly_inode = dir;
+ sinode->next = NFS_RENAMED_DIRS(dentry->d_inode);
+ NFS_RENAMED_DIRS(dentry->d_inode) = sinode;
+
+ dir->i_count++; /* lock the parent directory */
+
+ return 0; /* don't unlink */
+}
+
+/* Note: we call nfs_proc_remove() directly rather than nfs_unlink()
+ * as the latter would require a dentry as argument. But we don't
+ * need to do a lookup here, because we know the name of the file to
+ * remove. Ok. The lookup idea would be silly anyways as it would
+ * produce unnecessary races.
+ *
+ * What to do in case of an error? We fake success. Fixme.
+ *
+ * Hopefully there is no locking necessary for the silly inodes list. Mmh.
+ *
+ * One last note:
+ *
+ * the new dcache keeps dentries for quite some time, until
+ * shrink_dcache() is called. That way, the ".nfs#ino" files would be kept
+ * for quite a while, until the last dentry belonging to "inode" is
+ * released.
+ *
+ * To cope with this active .nfs#ino files are NOT enqueued into the hash
+ * list of the parent directory (nfs_lookup() takes care of that) and thus
+ * are not cached at all.
+ */
+
+void nfs_sillyrename_cleanup(struct inode *inode)
+{
+ struct silly_inodes_list *silly_inode = NFS_RENAMED_DIRS(inode);
+ struct inode *dir;
+ char silly[sizeof(inode->i_ino)*2+sizeof(".nfs")];
+ int error, slen;
+
+ slen = sprintf(silly, ".nfs%lx", inode->i_ino);
+ while (silly_inode) {
+ dir = silly_inode->silly_inode;
+ nfs_invalidate_dircache(dir);
+ error = nfs_proc_remove(NFS_SERVER(dir), NFS_FH(dir), silly);
+
+ if (error < 0)
+ printk("NFS " __FUNCTION__ " failed (err = %d)\n",
+ -error);
+ iput(dir); /* even in case of error ? */
+
+ silly_inode = silly_inode->next;
+ kfree(NFS_RENAMED_DIRS(inode));
+ NFS_RENAMED_DIRS(inode) = silly_inode;
+ }
+}
+
+/* We do silly rename. In case sillyrename() returns -EBUSY, the inode
+ * belongs to an active .nfs#ino file and we return -EBUSY.
+ *
+ * If sillyrename() returns 0, we do nothing, otherwise we unlink.
+ *
+ * inode->i_nlink is updated here rather than waiting for the next
+ * nfs_refresh_inode()
*/
static int nfs_unlink(struct inode *dir, struct dentry *dentry)
{
@@ -549,12 +765,21 @@
if (dentry->d_name.len > NFS_MAXNAMLEN)
return -ENAMETOOLONG;

- error = nfs_proc_remove(NFS_SERVER(dir), NFS_FH(dir), dentry->d_name.name);
- if (error)
- return error;
+ error = nfs_sillyrename(dir, dentry);
+
+ if (error == -EBUSY) {
+ return -EBUSY; /* tried to unlink silly_inode */
+ } else if (error < 0) {
+ error = nfs_proc_remove(NFS_SERVER(dir),
+ NFS_FH(dir), dentry->d_name.name);
+ if (error < 0)
+ return error;
+
+ dentry->d_inode->i_nlink --;
+ nfs_invalidate_dircache(dir);
+ d_delete(dentry);
+ }

- nfs_invalidate_dircache(dir);
- d_delete(dentry);
return 0;
}

@@ -588,7 +813,7 @@
return error;

nfs_invalidate_dircache(dir);
- /* this looks _funny_ doesn't it? But: nfs_proc_symlynk()
+ /* this looks _funny_ doesn't it? But: nfs_proc_symlink()
* only fills in sattr, not fattr. Thus nfs_fhget() cannot be
* called, it would be pointless, without a valid fattr
* argument. Other possibility: call nfs_proc_lookup()
@@ -623,7 +848,8 @@
return error;

nfs_invalidate_dircache(dir);
- inode->i_count++;
+ inode->i_count ++;
+ inode->i_nlink ++; /* no need to wait for nfs_refresh_inode() */
d_instantiate(dentry, inode);
return 0;
}
@@ -634,8 +860,15 @@
* different file handle for the same inode after a rename (e.g. when
* moving to a different directory). A fail-safe method to do so would
* be to look up old_dir/old_name, create a link to new_dir/new_name and
- * rename the old file using the silly_rename stuff. This way, the original
+ * rename the old file using the sillyrename stuff. This way, the original
* file in old_dir will go away when the last process iput()s the inode.
+ *
+ * FIXED.
+ *
+ * It actually works quite well. One needs to have the possibility
+ * for one .nfs#ino file in each directory the file ever gets moved or
+ * linked to
+ *
*/
static int nfs_rename(struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry)
@@ -659,10 +892,22 @@
if (old_dentry->d_name.len > NFS_MAXNAMLEN || new_dentry->d_name.len > NFS_MAXNAMLEN)
return -ENAMETOOLONG;

- error = nfs_proc_rename(NFS_SERVER(old_dir),
- NFS_FH(old_dir), old_dentry->d_name.name,
- NFS_FH(new_dir), new_dentry->d_name.name);
+ if (new_dir != old_dir) {
+ error = nfs_sillyrename(old_dir, old_dentry);
+
+ if (error == -EBUSY) { /* tried to access .nfs#ino file */
+ return error;
+ } else if (error == 0) { /* did silly rename stuff */
+ error = nfs_link(old_dentry->d_inode,
+ new_dir, new_dentry);

+ return error;
+ }
+ /* no need for silly rename, proceed as usual */
+ }
+ error = nfs_proc_rename(NFS_SERVER(old_dir),
+ NFS_FH(old_dir), old_dentry->d_name.name,
+ NFS_FH(new_dir), new_dentry->d_name.name);
if (error)
return error;

@@ -737,3 +982,10 @@
} else
inode->i_op = NULL;
}
+
+/*
+ * Local variables:
+ * version-control: t
+ * kept-new-versions: 5
+ * End:
+ */
--- linux-2.1.51/fs/nfs/inode.c Thu Aug 14 02:37:48 1997
+++ linux-2.1.51-nfs-swap/fs/nfs/inode.c Thu Aug 21 20:04:28 1997
@@ -75,11 +75,23 @@
nfs_put_inode(struct inode * inode)
{
dprintk("NFS: put_inode(%x/%ld)\n", inode->i_dev, inode->i_ino);
+ /* this is called before de-crementing i_count, therefore == 1
+ */
+ if (NFS_RENAMED_DIRS(inode) && inode->i_count == 1) {
+ nfs_sillyrename_cleanup(inode);
+ }
}

/*
* This should do any silly-rename cleanups once we
* get silly-renaming working again..
+ *
+ *
+ * NOPE!
+ * It is MUCH easier to do the cleanup in nfs_put_inode(), because
+ * delete_inode() is only called when i_nlink is 0. And it cause nothing
+ * but havoc to try to fake the link-count to 0 while the .nfs#ino file
+ * still exists. cH
*/
static void
nfs_delete_inode(struct inode * inode)
@@ -454,6 +466,8 @@
init_nfs_fs(void)
{
#ifdef CONFIG_PROC_FS
+ rpc_register_sysctl();
+ rpc_proc_init();
rpc_proc_register(&nfs_rpcstat);
#endif
return register_filesystem(&nfs_fs_type);

--Multipart_Thu_Aug_21_22:59:39_1997-1
Content-Type: text/plain; charset=US-ASCII

Claus-Justus Heine
claus@momo.math.rwth-aachen.de
http://samuel.math.rwth-aachen.de/~LBFM/claus

PGP Public Key:
http://samuel.math.rwth-aachen.de/~LBFM/claus/claus-public-key.asc

Ftape - the Linux Floppy Tape Project
WWW : http://samuel.math.rwth-aachen.de/~LBFM/claus/ftape
Mailing-list: linux-tape@vger.rutgers.edu

--Multipart_Thu_Aug_21_22:59:39_1997-1--

--pgp-sign-Multipart_Thu_Aug_21_22:59:39_1997-1
Content-Type: application/pgp-signature
Content-Transfer-Encoding: 7bit

-----BEGIN PGP MESSAGE-----
Version: 2.6.3i
Comment: Processed by Mailcrypt 3.4, an Emacs/PGP interface

iQCVAwUBM/ysRto/s1zPE7AdAQGPTgP/WCr8ZdmxvUCzx0s4uR0tALans6Z8jMPI
kc/Ptv18H/qho39glAbaoysqGDoGyBlbUMqku8zJt1x2Ejc7uW2suUweP99eaWVe
+JEPG2QjVIyXeYPg9pS5omQ/FLeD05Svy7X1eX4h9x9VzpiI435kZmEauZ1+yrb3
wjP1nvfo16A=
=GgB4
-----END PGP MESSAGE-----

--pgp-sign-Multipart_Thu_Aug_21_22:59:39_1997-1--