0-nlink inodes can lead to dirty filesystems being marked "clean"

Kevin Buhr (buhr@stat.wisc.edu)
26 Jun 1998 13:36:50 -0500


This affects 2.1.106 and probably others.

Consider a 0-nlink inode that's still in use by some process.

As far as I can see, the current semantics are as follows. The
filesystem on which the inode resides can be remounted read-only. For
"ext2", this means the filesystem will be marked "clean". However,
the 0-nlink inode and its blocks are still marked "in-use" in the
bitmaps. The filesystem will only *truly* be clean (i.e., not in need
of an "fsck") once the process exits; then, the implied "iput" will
deallocate the inode and its blocks in the associated bitmaps.

This means that, once a filesystem with in-use 0-nlink inodes is
remounted read-only, it will be marked "clean" even if the system
subsequently crashes or, as in the case of "/sbin/init", the process
never actually exits. Of course, in either case, this "clean"
filesystem actually needs an "fsck".

For a practical example, rightly or wrongly, Debian's "/sbin/init" is
dynamically linked. If the C library is upgraded, "init" will be
stuck using the 0-nlink inode of the old C library. When the root
filesystem is remounted read-only, it will be marked clean. Since the
"reboot" system call never returns, "init" never exits, and the root
filesystem stays marked clean even though it needs an "fsck" to
reclaim the space allocated to the old C library.

Assuming I've got it right, I'm not precisely sure what the best
solution is. There seem to be a few candidates:

1. When a filesystem is remounted read-only, all 0-nlink inodes
still in use are linked in somewhere (presumably "lost+found").

This has the disadvantage of introducing a previously
user-mode-only concept into the kernel. It also might be too
subtle: folks might simply not realize that cruft was
accumulating in "/lost+found" if they don't see "fsck" putting
anything in there.

Do any other Unix-like kernels pull stunts like this?

2. When a filesystem is remounted read-only, it is *not* marked
clean if it has in-use, 0-nlink inodes. If a following "umount"
or "remount" finds no such inodes, *then* the filesystem can be
marked clean.

This would mean that, in the "init" scenario described above, the
filesystem would be marked dirty, forcing an "fsck" on the next
reboot.

However, implementation would require changes in the VFS layer (to
tell the "remount" functions that the filesystem is still dirty) and
in various filesystem-specific functions.

3. We refuse to remount a filesystem read-only if it has in-use,
0-nlink inodes; that is, we have "fs_may_remount_ro" return 0 so
that the remount request returns EBUSY.

In practice, this should act much like "2"; the kernel's
"do_remount" function will do everything short of actually
calling the filesystem's "remount" function, including syncing,
invalidating inodes, and so on. During shutdown, the root
filesystem remount will fail, but the filesystem will be left
"almost" clean, so "fsck" can easily fix the mess on the next reboot.

This also has the advantage of being trivial to implement, so
I've included a patch below.

4. Link "init" with static libraries.

This isn't a bad idea in any case, but it seems plain *wrong* to
mark a filesystem "clean" when a crash would leave it dirty.

Anyway, scheme "3" seems to work for me, but maybe it breaks something
I haven't thought of.

Any thoughts?

Kevin <buhr@stat.wisc.edu>

* * *

Patch against 2.1.106 for scheme "3":

--- linux/fs/inode.c~ Fri May 8 19:54:39 1998
+++ linux/fs/inode.c Fri Jun 26 16:36:50 1998
@@ -753,7 +753,9 @@
 {
 	struct file *file;
 
-	/* Check that no files are currently opened for writing. */
+	/* Check that no files are currently opened for writing
+	 * and that no inodes w/ nlink == 0 are in use
+	 */
 	for (file = inuse_filps; file; file = file->f_next) {
 		struct inode *inode;
 		if (!file->f_dentry)
@@ -763,6 +765,10 @@
 			continue;
 		if (S_ISREG(inode->i_mode) && file->f_mode & FMODE_WRITE)
 			return 0;
+		if (!inode->i_nlink) {
+			printk(KERN_DEBUG "fs_may_remount_ro: in-use inode %lu had i_nlink == 0\n", inode->i_ino);
+			return 0;
+		}
 	}
 	return 1; /* Tis' cool bro. */
 }
