Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS

From: Chris Mason
Date: Wed Jun 13 2007 - 13:00:55 EST

On Wed, Jun 13, 2007 at 12:14:40PM -0400, Albert Cahalan wrote:
> On 6/13/07, Chris Mason <chris.mason@xxxxxxxxxx> wrote:
> >On Wed, Jun 13, 2007 at 01:45:28AM -0400, Albert Cahalan wrote:
> >> The usual wishlist:
> >>
> >> * inode-to-pathnames mapping
> >
> >This one I'll code, it will help with inode link count verification. I
> >want to be able to detect at run time that an inode with a link count of
> >zero is still actually in a directory. So there will be back pointers
> >from the inode to the directory.
> Great, but fsck improvement wasn't on my mind. This is
> a desirable feature for the NFS server, and for regular users.
> Think about a backup program trying to maintain hard links.

Sure, it'll be there either way ;)

> >Also, the incremental backup code will be able to walk the btree to find
> >inodes that have changed, and the backpointers will help make a list of
> >file names that need to be rsync'd or whatever.
> >
> >> * a subvolume that is a single file (disk image, database, etc.)
> >
> >subvolumes can be made that have a single file in them, but they have to
> >be directories right now. Doing otherwise would complicate mounts and
> >other management tools (inside the btree, it doesn't really matter).
> Bummer. As I understand it, ZFS provides this. :-)

Grin, when the pain of typing cd subvol is btrfs' biggest worry, I'll be
doing very well.

> >> * directory indexes to better support Wine and Samba
> >> * secure delete via destruction of per-file or per-block random crypto
> >keys
> >
> >I'd rather keep secure delete as a userland problem (or a layered FS
> >problem). When you take backups and other copies of the file into
> >account, it's a bigger problem than btrfs wants to tackle right now.
> It can't be a userland problem if you allow disk blocks to move.
> Volume resizing, logging/journalling, etc. -- they combine to make
> the userland solution essentially impossible. (one could wipe the
> whole partition, or maybe fill ALL space on the volume)

Right about here is where I would insert a long story about ecryptfs, or
encryption solutions that happen all in userland. At any rate, it is
outside the scope of v1.0, even though I definitely agree it is an
important problem for some people.

> >> * atomic creation of copy-on-write directory trees
> >
> >Do you mean something more fine grained than the current snapshotting
> >system?
> I believe so. Example: I have a linux-2.6 directory. It's not
> a mount point or anything special like that. I want to copy
> it to a new directory called wip, without actually copying
> all the blocks. To all the normal POSIX API stuff, this copy
> should look like the result of "cp -a", not hard links.

This would be a snapshot, which has to be done on a subvolume right now.
It is not as nice as being able to pick a random directory, but I've
only been able to get this far by limiting the feature scope
significantly. What I did do was make subvolumes very cheap...just make
a bunch of them.

Keep in mind that if you implement a cow directory tree without a
snapshot, and you don't want to duplicate any blocks in the cow, you're
going to have fun with inode numbers.


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at