Re: [BK][PATCH] Reiser4, will double Linux FS performance, please apply

From: Nikita Danilov (Nikita@Namesys.COM)
Date: Fri Nov 01 2002 - 05:59:22 EST


Linus Torvalds writes:
> In article <3DC1D9D0.684326AC@digeo.com>,
> Andrew Morton <akpm@digeo.com> wrote:
> >
> >But it should be done based on "feature equivalency". By default,
> >ext3 uses ordered data writes. Data is written to disk before
> >the metadata to which that data refers is committed to journal.
>
> Andrew, that's not necessarily a _good_ feature.
>
> Journaling is _not_ a great idea. There are other approaches to
> handling atomicity than journaling, like phase trees, that give
> equivalent atomicity guarantees without having to write out extra stuff,
> or even impose a very strict ordering between data and meta-data.
>
> I didn't read the reiser papers yet, but from Hans' description it
> sounds like reiser4 gives all the guarantees ext3 does with ordered
> writes, _and_ they get good performance.

Reiser4 uses "wandered logs", which are similar to phase trees and to
what the database world calls "shadows" or "side files".

The idea is that most blocks holding file system data (and meta-data)
are accessed by first reading their block number from some other
"parent" block (like an indirect block in ext2). Now, if a block is
modified during a transaction *and* its parent block is also dirty, one
can avoid writing a copy of the block into the journal by:

 - allocating new block number ("wandered block")

 - storing modified content in the newly allocated wandered block

 - updating parent block to point to the new location

The old block is now unreachable from the parent, and if its block
number is stored somewhere in the journal, it can be used for recovery.
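
The three steps above can be sketched roughly as follows. This is only
an illustrative in-memory model, not Reiser4 code: the names Disk,
wander, rollback and wander_log are hypothetical, the parent block is
modeled as a plain list of child block numbers, and the rollback shown
is an assumption about how the recorded old block numbers might be used.

```python
# Hypothetical toy model; none of these names come from Reiser4.

class Disk:
    def __init__(self, nblocks):
        self.blocks = [None] * nblocks    # block number -> content
        self.free = set(range(nblocks))   # unallocated block numbers

    def alloc(self):
        blk = min(self.free)              # simplest policy: lowest free block
        self.free.remove(blk)
        return blk


def wander(disk, parent, slot, new_content, wander_log):
    """Relocate a modified child block instead of journaling a copy of it.

    `parent` is an already-dirty parent block, modeled as a list of
    child block numbers.
    """
    old = parent[slot]
    new = disk.alloc()                    # allocate a new ("wandered") block
    disk.blocks[new] = new_content        # store the modified content there
    parent[slot] = new                    # point the parent at the new location
    wander_log.append((slot, old, new))   # keep the old block number for recovery
    return new


def rollback(disk, parent, wander_log):
    """Assumed recovery path: restore old pointers, release wandered blocks."""
    for slot, old, new in reversed(wander_log):
        parent[slot] = old
        disk.blocks[new] = None
        disk.free.add(new)
    wander_log.clear()
```

Note that the old block is never overwritten, so its content is still
intact when rollback() restores the parent's pointer.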

Reiser4's balanced tree lends itself nicely to this model, of course.

The usual problem with such techniques is that they tend to destroy
packing due to frequent relocations. But in reality relocation can be
used precisely to improve packing, if allocation of wandered blocks is
delayed for a sufficiently long time (for example, until transaction
commit).
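
One way to picture why delaying helps: by commit time the whole set of
dirty blocks is known, so they can be laid out together. The sketch
below is a guess at the general shape, not the Reiser4 allocator. The
function name commit_allocate is hypothetical, and it assumes a
contiguous free extent starting at first_free that is large enough for
all dirty blocks.

```python
def commit_allocate(dirty_keys, first_free):
    """Assign consecutive block numbers to dirty blocks in key order.

    Assumption: blocks first_free .. first_free + len(dirty_keys) - 1
    are all free. A real allocator would have to search for such a run.
    """
    return {key: first_free + i for i, key in enumerate(sorted(dirty_keys))}
```

For example, commit_allocate(["c", "a", "b"], 100) places "a", "b" and
"c" in blocks 100, 101 and 102, whatever order they were dirtied in.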

>
> (In fact, from the description it sounds like it gives _more_ guarantees
> than even ext3 with ordered writes, in that it gives transactional
> behaviour for arbitrary writes. Maybe I should read the paper).
>
> Linus

Nikita.
