Re: Starting a grad project that may change kernel VFS. Early research

From: Bryan Donlan
Date: Mon Aug 24 2009 - 21:07:32 EST

On Mon, Aug 24, 2009 at 7:54 PM, Jeff Shanab<jshanab@xxxxxxxxxxxxx> wrote:
> Title: "Pay it forward patch set"
> Goal: Desire to change the dentry and inode functionality so commands
> like du -s appear to have greatly improved performance.
> How: TBD? 2 phase update walking up the tree to root.
>   Prior to actually starting my Grad Project in Computer science, I am
> taking 1 semester to do research for it at the recommendation of my
> advisor.  I need to of course make sure it doesn't already exist.  It
> may be that all the changes end up in a file system and the kernel will
> be left alone, just one of the things I want help determining.
> 1) First question, where to put this functionality?
>    I originally thought to put my functionality in the VFS so that all
> mounted file systems could share it, but after reading fs.h, and
> inode.c, it looks like the VFS is purely an abstract interface and
> functionality at that level may not be wanted? Also I guess certain file
> systems may not have needed on disk structures to save the info (ie
> VFAT,NFS, etc)

VFS has a lot of generic functionality that filesystems can opt into -
but see below about your specific proposals...

> 2) Second Question. The two part idea.
>    I was thinking that a good way to handle this is that it starts with
> a file change in a directory. The directory entry contains a sum already
> for itself and all the subdirs and an adjustment is made immediately to
> that, it should be in the cache. Then we queue up the change to be sent
> to the parent(s?). These queued up events should be a low priority at a
> more human time like 1 second. If a large number of changes come to a
> directory, multiple adjustments hit the queue with the same (directory
> name, inode #?) and early ones are thrown out. So levels above would see
> at most a 1 per second low priority update.
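To make the two-phase idea concrete, here is a rough userspace model in
Python. It is not kernel code, and SizeTree, file_changed and flush are
names made up for illustration. One deliberate difference from the
description above: pending deltas for the same parent are summed rather
than having the early ones thrown out, since dropping a delta would lose
bytes. It also assumes exactly one parent per entry, which is the
assumption hard links break.

```python
import posixpath


class SizeTree:
    """Toy model: each directory caches a subtree total; deltas bubble up lazily."""

    def __init__(self):
        self.total = {}    # directory path -> cached subtree size in bytes
        self.pending = {}  # directory path -> summed delta not yet applied

    def file_changed(self, directory, delta):
        # Phase 1: adjust the containing directory immediately.
        self.total[directory] = self.total.get(directory, 0) + delta
        self._queue_for_parent(directory, delta)

    def _queue_for_parent(self, directory, delta):
        parent = posixpath.dirname(directory)
        if parent == directory:  # "/" is its own parent: stop at the root
            return
        # Coalesce: many changes under one parent collapse into one entry.
        self.pending[parent] = self.pending.get(parent, 0) + delta

    def flush(self):
        # Phase 2: meant to run from a low-priority timer, e.g. once a second.
        # Each call moves the pending deltas up exactly one level.
        batch, self.pending = self.pending, {}
        for directory, delta in batch.items():
            self.total[directory] = self.total.get(directory, 0) + delta
            self._queue_for_parent(directory, delta)
        return len(batch)
```

A hundred writes to /a/b leave a single pending entry for /a, and each
flush() carries it one level closer to the root, so ancestors see at most
one update per flush interval.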

As I understand it, you want to tag each directory with the total size
of its contents. There are a few problems with this:
1) A metadata change is required for a filesystem to use this. It
would be prohibitively expensive to cache all directories in memory to
remember their sizes, and we can't just traverse a directory and all
of its contents to find its disk space usage just because someone
touched it. So the size has to be remembered on disk.
2) Hard links break this scheme rather badly. Consider if /foo/x is
hardlinked to /bar/x. Then something modifies /bar/x. The kernel
cannot find all other hardlinks to /bar/x, so /foo's disk usage
estimate is not updated. Moreover, /'s disk usage would count the
space actually used by /{foo,bar}/x twice.
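
The double counting in 2) is easy to reproduce from userspace. The Python
sketch below is only an illustration: it uses apparent file size (st_size)
rather than allocated blocks, and the helper names are invented. It builds
foo/x and bar/x as hard links to one file in a temporary directory, then
compares per-directory totals with a walk that counts each inode once.

```python
import os
import tempfile


def naive_tree_size(root):
    """Sum st_size for every file name under root (what per-directory totals see)."""
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total


def dedup_tree_size(root):
    """Sum st_size but count each (device, inode) pair only once."""
    seen = set()
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            st = os.lstat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                total += st.st_size
    return total


def demo():
    with tempfile.TemporaryDirectory() as root:
        foo = os.path.join(root, "foo")
        bar = os.path.join(root, "bar")
        os.mkdir(foo)
        os.mkdir(bar)
        with open(os.path.join(foo, "x"), "wb") as f:
            f.write(b"a" * 4096)
        # Second name for the same inode.
        os.link(os.path.join(foo, "x"), os.path.join(bar, "x"))
        foo_size = naive_tree_size(foo)
        bar_size = naive_tree_size(bar)
        return foo_size, bar_size, foo_size + bar_size, dedup_tree_size(root)
```

Summing the two per-directory totals reports 8192 bytes for the parent,
while the inode-aware walk reports 4096. A cached total propagated up from
foo and bar would inherit the larger, wrong figure.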

You can't just call it a rough estimate to get around 2), as the error
can build up without bound, until you have directories apparently
taking 10x the size of your actual hard disk. That said, for
filesystems without hardlinks this is doable, but most Linux
filesystems support hardlinks. Heck, even NTFS supports hardlinks. So
it's unlikely to be useful in Linux...

>    I have a second set of changes I am considering and I think would
> fit more completely in a file system, but I bring them up here in case
> it influences the above.
> title: "User Metadata" aka "pet peeve reduction"
>    I would like to maintain a few classifications of metadata, most
> optional and configurable.
[snip details]

This is already supported through user xattrs. It just needs more
application support (good luck getting flash to use them for temp
files though ;)
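
For reference, a minimal sketch of user xattrs from Python on Linux. The
os.setxattr and os.getxattr calls are Linux-only, and the filesystem must
have user xattrs enabled (some tmpfs setups do not), so the helper below
returns None instead of failing in that case. The attribute name
"user.origin" and the function name are just examples.

```python
import errno
import os
import tempfile


def roundtrip(key, value):
    """Set user.<key> on a scratch file and read it back.

    Returns the stored string, or None if this platform or filesystem
    does not allow user extended attributes.
    """
    if not hasattr(os, "setxattr"):
        return None  # not Linux
    with tempfile.NamedTemporaryFile() as f:
        name = "user." + key
        try:
            os.setxattr(f.name, name, value.encode())
            return os.getxattr(f.name, name).decode()
        except OSError as e:
            if e.errno in (errno.ENOTSUP, errno.EPERM, errno.EACCES):
                return None
            raise
```

Calling roundtrip("origin", "downloaded") gives back "downloaded" where
user xattrs are available, and None otherwise.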