Re: (reiserfs) Re: New Linux 2.5 - 2.6 TODO (Alan Cox suggests delaying

From: Steve Lord (lord@sgi.com)
Date: Mon Jun 05 2000 - 16:38:56 EST


Andreas Dilger wrote:
>
> Chris Mason writes:
>
> > The ext3 jfs might also need small changes to work cleanly with any
> > allocate on flush systems (XFS, reiserfs soon).
>
> Aren't there also changes required to the VFS for allocate-on-flush? I
> think this would benefit the performance of all filesystems, but of course
> XFS much more than ext2, because XFS was designed from the start to work
> with this.

Yes, there are VFS-level changes. XFS has currently added a number of new
interfaces to let us do extent-based allocation more efficiently, as well
as allocating real disk blocks for delayed-allocate data.

Allocate on flush is not really what XFS does, though. You have to reserve
disk space from within the user system call, and then decide which
particular disk space you are going to use sometime later, before the data
can be written out to disk.
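
The reserve-now, place-later split described above can be sketched as a toy
model. This is only an illustration under stated assumptions, not XFS code:
the class DelayedAllocator, its methods, and the block-granular accounting
are all hypothetical names invented for the example.

```python
class DelayedAllocator:
    """Toy model: reserve space at write() time, pick real blocks at flush time."""

    def __init__(self, total_blocks):
        self.free_blocks = list(range(total_blocks))  # blocks not yet assigned
        self.reserved = 0    # blocks promised to writers but not yet placed
        self.pending = {}    # file name -> number of delayed-allocate blocks
        self.extents = {}    # file name -> list of (start_block, length)

    def write(self, name, nblocks):
        # Runs inside the "system call": fail now if space cannot be promised.
        if nblocks > len(self.free_blocks) - self.reserved:
            raise OSError("ENOSPC: cannot reserve space")
        self.reserved += nblocks
        self.pending[name] = self.pending.get(name, 0) + nblocks

    def flush(self, name):
        # Runs later: turn the reservation into specific disk blocks.
        nblocks = self.pending.pop(name, 0)
        blocks = self.free_blocks[:nblocks]
        del self.free_blocks[:nblocks]
        self.reserved -= nblocks
        new_extents = self._to_extents(blocks)
        self.extents.setdefault(name, []).extend(new_extents)
        return new_extents

    @staticmethod
    def _to_extents(blocks):
        # Merge runs of consecutive block numbers into (start, length) pairs.
        extents = []
        for b in blocks:
            if extents and extents[-1][0] + extents[-1][1] == b:
                extents[-1] = (extents[-1][0], extents[-1][1] + 1)
            else:
                extents.append((b, 1))
        return extents
```

In this sketch two small writes to the same file are placed as one extent at
flush time, which is the kind of benefit deferring the placement decision is
meant to give. A write that cannot be reserved fails in write(), never in
flush().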

>
> > Very possible, regardless of how you log the data, the api steps needed to
> > make it all work are almost the same. From what I've read about xfs, it
> > could fit into the ext3 jfs as well. I'm assuming GFS will need the most
> > complicated journal layer...
>
> It is my understanding also that XFS and ext3 JFS are basically working in
> the same way. The real question would be if the XFS developers want to
> re-work their code to use ext3 JFS and/or if JFS is sufficiently abstract
> to be a plug-in replacement for the XFS journalling code. If it isn't a
> proper superset of what XFS does, it would be an indication that JFS needs
> a bit of rework. AFAIK, the only real feature that JFS doesn't yet implement
> is allowing journalling of multiple filesystems to the same journal, although
> I know this is on Stephen's TODO list.

Has anyone proposing this type of thing looked at the XFS transaction code?
It is one huge feedback loop.

  o Each transaction reserves the amount of log space it might potentially
    use. If there is insufficient space available, then the oldest modified
    metadata in memory is pushed out to disk so that log space can be
    reused. An ordered list (known as the active item list, or AIL) is
    maintained to represent metadata which is logged but dirty in memory.

  o metadata items are locked as they are added to a transaction.

  o we actually modify the in memory data structures here!

  o committing a transaction copies the modified portions of metadata into
    an incore log, pins the metadata in memory, and unlocks it so other
    threads can start using it.

  o completion of log buffer I/O unpins metadata items so they can be
    flushed to disk, and places them at the head of the AIL.

  o completion of metadata I/O moves the tail of the log and removes items
    from the AIL.

  o logging metadata which is already in the AIL means that it gets moved
    in the AIL, potentially moving the tail of the log.
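
The feedback loop in the list above can be modelled roughly as follows. This
is a single-threaded toy under stated assumptions, not the XFS
implementation: ToyLog, its method names, and the byte-offset stand-in for a
log sequence number (lsn) are hypothetical, and item locking is left out.

```python
from collections import OrderedDict


class ToyLog:
    """Circular log whose tail is held back by the oldest logged-but-dirty item."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.head = 0                    # bytes ever written; stands in for the lsn
        self.reserved = 0                # space promised to open transactions
        self.in_flight = OrderedDict()   # item -> lsn; committed and pinned
        self.ail = OrderedDict()         # item -> lsn; oldest first
        self.written_back = []           # order in which metadata went to disk

    def tail(self):
        lsns = list(self.in_flight.values()) + list(self.ail.values())
        return min(lsns, default=self.head)

    def free_space(self):
        return self.capacity - (self.head - self.tail()) - self.reserved

    def reserve(self, nbytes):
        # A transaction reserves the most log space it might use. If the log
        # is too full, push the oldest dirty metadata so the tail can move.
        if nbytes > self.capacity:
            raise ValueError("reservation exceeds log size")
        while self.free_space() < nbytes:
            if not (self.ail or self.in_flight):
                raise RuntimeError("log space is held by open transactions")
            self.push_oldest()
        self.reserved += nbytes
        return nbytes

    def commit(self, reservation, dirty):
        # dirty maps item name -> bytes of modified metadata copied to the log.
        used = sum(dirty.values())
        if used > reservation:
            raise ValueError("transaction overran its reservation")
        lsn = self.head
        self.head += used
        self.reserved -= reservation     # the unused part is returned
        for item in dirty:
            self.in_flight[item] = lsn   # pinned until log I/O completes

    def log_io_done(self):
        # Log buffer I/O finished: unpin items and place them at the AIL head.
        # A relogged item moves within the AIL, which may move the tail.
        for item, lsn in sorted(self.in_flight.items(), key=lambda kv: kv[1]):
            self.ail[item] = lsn
            self.ail.move_to_end(item)
        self.in_flight.clear()

    def push_oldest(self):
        # Metadata I/O for the oldest AIL item: it leaves the AIL, so the
        # tail of the log can advance past its log copy.
        if self.in_flight:
            self.log_io_done()
        item, _ = self.ail.popitem(last=False)
        self.written_back.append(item)
        return item
```

In this model a reservation that does not fit forces write-back of the oldest
item, and relogging an item that is already in the AIL advances the tail
without any metadata I/O at all.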

It might be easier to start with XFS's journalling and add what other
filesystems are missing, except:

  o we do not allow nested transactions

  o we never cancel a transaction once we have started modifying metadata
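
As a rough illustration of what those two restrictions mean for a caller,
here is a minimal sketch. The Transaction class, its methods, and the choice
to raise RuntimeError are all hypothetical; the post does not say how the
real code reacts when either rule is broken.

```python
import threading

_current = threading.local()   # tracks the open transaction of each thread


class Transaction:
    def __init__(self):
        # Rule 1: a thread may not open a transaction inside another one.
        if getattr(_current, "tx", None) is not None:
            raise RuntimeError("nested transactions are not allowed")
        _current.tx = self
        self.dirty = False

    def modify(self, item, apply_change):
        # In-memory metadata is changed directly, so there is no undo copy.
        apply_change(item)
        self.dirty = True

    def cancel(self):
        # Rule 2: cancelling is only possible while nothing was modified.
        if self.dirty:
            raise RuntimeError("cannot cancel after modifying metadata")
        _current.tx = None

    def commit(self):
        _current.tx = None
```

A journalling layer shared with other filesystems would have to either accept
these restrictions or add nesting and rollback on top.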

>
> Cheers, Andreas

Steve

------------------------------------------------------------------------------
Steve Lord voice: +1-651-683-5291
Silicon Graphics Inc
655F Lone Oak Drive email: lord@sgi.com
Eagan, MN, 55121, USA
------------------------------------------------------------------------------



