Tux3 Report: One less feature

From: Daniel Phillips
Date: Tue Sep 09 2008 - 18:30:29 EST


Last week, the subvolumes feature was dropped from Tux3. I thought it
would be worth explaining why, because it says something about the Tux3
design philosophy and the direction I think we ought to be headed.

A subvolume is a separately mountable filesystem that coexists with
other subvolumes in a single physical volume. For a tree-structured
filesystem, which is to say, nearly every modern filesystem, having
subvolumes just requires adding more tree roots. Nothing could be
simpler, right?

Well, almost. All subvolumes allocate from the same free space pool,
and indeed the idea of unifying allocation is the main argument for
having subvolumes. Otherwise why not just have separate volumes?
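
As a rough illustration only (this is not Tux3 code, and every class and
method name here is hypothetical), the idea of several tree roots drawing
on one free space pool might be sketched like this:

```python
class FreeSpacePool:
    """One allocation bitmap shared by the whole physical volume (hypothetical)."""

    def __init__(self, blocks):
        self.used = [False] * blocks

    def alloc(self):
        # First-fit allocation: hand out the lowest-numbered free block.
        for block, taken in enumerate(self.used):
            if not taken:
                self.used[block] = True
                return block
        raise OSError("volume full")

    def free_count(self):
        return self.used.count(False)


class Volume:
    """A physical volume whose volume table maps subvolume names to tree roots."""

    def __init__(self, blocks):
        self.pool = FreeSpacePool(blocks)
        self.table = {}  # subvolume name -> root block number

    def create_subvolume(self, name):
        # Adding a subvolume is just adding one more tree root,
        # allocated from the same pool as everything else.
        self.table[name] = self.pool.alloc()
        return self.table[name]


vol = Volume(blocks=8)
vol.create_subvolume("home")
vol.create_subvolume("scratch")
print(vol.table, vol.pool.free_count())
```

Both roots come out of the same pool, so space freed by one subvolume is
immediately available to the other, which is the whole point of the
feature.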

Since subvolumes cost very little to implement and apparently are
useful, adding a volume table to Tux3 was an easy call:


Code was duly written to manage the volume table, about 150 lines. So
far, so good. Then a fly flew into the ointment. What about fsync?

Each fsync requires the disk image of the allocation map to be up to
date and consistent with the synced filesystem image. But the
allocation map is shared by all subvolumes, so what do we do, sync
all of them? Or design the allocation subsystem so that it can be
separately synced per subvolume?
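
To make the dilemma concrete, here is a toy model. It assumes a bitmap
allocation map split into fixed-size map blocks; the names and the tiny
block size are made up for illustration and say nothing about how Tux3
actually lays out its map.

```python
BITS_PER_MAP_BLOCK = 4  # deliberately tiny so that sharing shows up quickly


class SharedAllocMap:
    """Toy allocation map shared by all subvolumes (hypothetical layout)."""

    def __init__(self, blocks):
        self.owner = [None] * blocks  # which subvolume holds each block
        self.unsynced = set()         # (subvolume, block) pairs not yet on disk

    def alloc(self, subvol):
        block = self.owner.index(None)  # first free block; ValueError if full
        self.owner[block] = subvol
        self.unsynced.add((subvol, block))
        return block

    def map_blocks_for_fsync(self, subvol):
        # Map blocks that must reach disk for subvol's image to be consistent.
        return {b // BITS_PER_MAP_BLOCK for s, b in self.unsynced if s == subvol}

    def bystanders(self, subvol):
        # Other subvolumes with unsynced bits in those very same map blocks.
        needed = self.map_blocks_for_fsync(subvol)
        return {s for s, b in self.unsynced
                if s != subvol and b // BITS_PER_MAP_BLOCK in needed}


amap = SharedAllocMap(blocks=8)
amap.alloc("home")     # block 0
amap.alloc("scratch")  # block 1
amap.alloc("home")     # block 2
print(sorted(amap.map_blocks_for_fsync("home")), sorted(amap.bystanders("home")))
```

An fsync on "home" needs map block 0 on disk, but that block also carries
an allocation bit for "scratch" that has not been synced. Either "scratch"
gets dragged into the sync, or the allocator has to keep the two
subvolumes' bits apart somehow.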

Worried about performance artifacts from the first approach, I
investigated the second:


Solution number four in that post is maybe the most efficient and least
invasive. There is just one thing wrong with it: it describes exactly
what logical volume managers already do. And why are we incorporating
a volume manager into Tux3 to implement this feature when the only
argument for having the feature is to share the allocation space? And
if the most efficient way to share the allocation space is to act like
a volume manager, then why not just use a volume manager?

It is not that it would be hard to implement the subvolume feature by
any of the methods I described. It is just that it feels wrong from a
philosophical standpoint. So after fretting about this for a few days
I decided to drop this questionable feature. If that means Tux3 has to
suffer feelings of inadequacy compared to ZFS, then so be it. Tux3 is
going to rely on a separate volume manager and that is that, unless
somebody comes up with a compelling reason why an efficient layering
cannot be achieved. (Note: this conviction relies partly on the
expectation that the existing LVM will be improved to be more capable;
see the nascent LVM3 design work.)

This is not in any way a swipe at Btrfs, which has subvolumes and does
integrate a number of volume manager features, as ZFS does. I think
that is the correct decision for that project. If the goal is to match
ZFS feature for feature, then be sure to cover them all; there are very
good logistical reasons for doing that. But I do not see the blurring
of the traditional layering between filesystem and block device as a
good or necessary thing. One could say that ZFS already suffers
negative effects from that design approach in that the majority of open
bugs they have seem to be related to volume management rather than the
filesystem proper.

I just think it is important for a filesystem to be as simple as
possible, if it aims to be reliable.

