Re: fsync on large files

Stephen C. Tweedie (sct@redhat.com)
Wed, 17 Feb 1999 17:56:39 GMT


Hi,

On Wed, 17 Feb 1999 09:43:57 -0800 (PST), Linus Torvalds
<torvalds@transmeta.com> said:

> Good. Then don't go around calling it ext2 any more. I don't want to have
> people even _wondering_ about the stability of the central Linux
> filesystem.

It _isn't_ called ext2. It is called ext3.

> I would also suggest that Stephen actually drop ext2
> altogether. There's just too much historical stuff in most
> filesystems - things like having "." and ".." in directories, even
> though Linux doesn't need them and they only complicate renaming and
> loopback mounting a lot.

Linus, this makes very little sense right now: there is an _urgent_
need for a journaled filesystem for Linux, and there is an urgent need
for a solution which is of production quality. Implementing a new
filesystem from scratch is hardly the way to achieve that. The
recovery time of ext2 on large filesystems is a real showstopper to
many potential applications.

> I think that if people are doing a new filesystem, it should be done
> like "ext2" was originally done: by designing a new one, rather than
> building more scaffolding on top of an old one.

Then please check what I'm actually trying to do. I am explicitly
_not_ designing a new filesystem.

The journaling implemented so far is a way of committing multiple
buffer updates to disk atomically, using some region of disk to hold a
journal of the in-progress updates. The functionality is designed to
be sufficient to journal any typical filesystem. It will also be
flexible enough to journal reiserfs and Ingo's LVM stuff (there are
several things we want to do with LVM which will require atomic
updates across multiple block devices to be stable).

The ext3 code is nothing more than a set of calls to demarcate the
start and end of complete, consistent filesystem operations.
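
As a rough illustration of that split, here is a toy userspace model
in Python. It is only a sketch under stated assumptions: the class
ToyJournal and its methods begin/write/end/crash/replay are
hypothetical names made up for this example, a dict stands in for the
block device, and a list stands in for the journal region on disk.
None of it is the actual journaling or ext3 code.

```python
# Toy model of block journaling. All names are hypothetical; this is
# not the kernel's API, only an illustration of the idea.

class ToyJournal:
    """Commits a group of block updates to 'disk' all-or-nothing."""

    def __init__(self, disk):
        self.disk = disk      # dict: block number -> bytes (stands in for the device)
        self.log = []         # stands in for the journal region on disk
        self.pending = None   # updates of the currently open transaction

    def begin(self):
        # Marks the start of one complete filesystem operation.
        self.pending = {}

    def write(self, blocknr, data):
        # Buffer the update; the main disk area is not touched yet.
        self.pending[blocknr] = data

    def end(self):
        # Marks the end of the operation: log the updates as one commit record.
        self.log.append(("commit", dict(self.pending)))
        self.pending = None

    def crash(self):
        # Simulate a crash: whatever was not committed is simply lost.
        self.pending = None

    def replay(self):
        # Recovery: apply only committed transactions, in commit order.
        for kind, updates in self.log:
            if kind == "commit":
                self.disk.update(updates)
        self.log.clear()


disk = {1: b"old inode", 2: b"old dir"}
journal = ToyJournal(disk)

# A complete operation touching two blocks.
journal.begin()
journal.write(1, b"new inode")
journal.write(2, b"new dir")
journal.end()

# A second operation that is interrupted before end().
journal.begin()
journal.write(1, b"half done")
journal.crash()

journal.replay()
print(disk)
```

In this model the filesystem-specific part is just the begin()/end()
pairs around each operation; everything else is generic. After
replay() both blocks of the first operation are updated and nothing
from the interrupted one is visible.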

> There's also a lot of code to handle concurrent writes etc, which
> can't happen any more.

The fact that it cannot happen is a problem for scalable IO to
database files: the only existing standard Unix API which makes any
sense at all in that environment is O_SYNC IO with threads
(i.e. asynchronous IO but with synchronous completion), and concurrent
writes are a must-have in that case. If you are saying you will
reject a patch to restore that behaviour then we must be on different
planets. The atomic write behaviour is a by-product of fixing an
entirely different problem (concurrent truncate, for which making it
atomic does make a lot of sense).
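
To make the usage pattern concrete, here is a minimal userspace sketch
in Python of O_SYNC IO with threads. The helper names (sync_write,
concurrent_sync_writes, demo), the 4096-byte block size and the thread
count are assumptions chosen for the example, not anything taken from
a real database. Each thread issues one positioned write to its own
region of a single file opened with O_SYNC, so each call returns only
once its data is on stable storage. Whether the kernel actually lets
those writes proceed in parallel on one file is the behaviour under
discussion; the sketch only shows what the application side looks
like.

```python
import os
import tempfile
import threading

BLOCK = 4096  # assumed block size for the example


def sync_write(fd, index):
    # os.pwrite takes an explicit offset, so threads do not share a file
    # position. With O_SYNC the call returns only after the data has been
    # written through to stable storage.
    os.pwrite(fd, bytes([index]) * BLOCK, index * BLOCK)


def concurrent_sync_writes(path, nthreads=4):
    # One file descriptor, opened O_SYNC, shared by all writer threads.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o600)
    try:
        threads = [
            threading.Thread(target=sync_write, args=(fd, i))
            for i in range(nthreads)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # synchronous completion: wait for every write
    finally:
        os.close(fd)
    return os.path.getsize(path)


def demo():
    # Write four blocks concurrently into a scratch file and report its size.
    with tempfile.TemporaryDirectory() as tmp:
        return concurrent_sync_writes(os.path.join(tmp, "dbfile"))


if __name__ == "__main__":
    print(demo())
```

If writes to a single file are fully serialised in the kernel, the
threads above gain nothing over a single writer, which is why
concurrent writes matter for this style of IO.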

--Stephen
