Re: [Ext2-devel] [PATCH 1/2] ext2/3: Support 2^32-1 blocks(Kernel)
From: Andreas Dilger
Date: Thu Mar 16 2006 - 17:57:03 EST
On Mar 16, 2006 16:26 -0500, Theodore Ts'o wrote:
> On Thu, Mar 16, 2006 at 11:35:49AM -0700, Andreas Dilger wrote:
> > Beyond that, we need a format change and may as well have something
> > like extents, but even extents still need to allow a larger i_blocks,
>
> As a side note, one of the things that we've been talking about doing
> is bundling a number of small changes together into a single INCOMPAT
> flag. Changing i_blocks so its units are in blocks rather than
> 512-byte sectors was one such change.
> Another was guaranteeing that for large inodes (> 128 bytes) that at
> least some number of bytes (probably on the order of 32 bytes or so)
> would be reserved for things like the high resolution portion of
> ctime/mtime/atime, high watermark, and other inode extensions. (One
> of the problems with doing high res timestamps right is how to handle
> the case where you can't make room for the high res timestamps, due to
> too much space being taken up by extended attributes. The make(1)
> program gets really confused unless all files are either using or not
> using high res timestamps.)
>
> The idea was to do a quick easy strike of all of the ideas which could
> be implemented quickly, and perhaps try to get them done before RHEL 5
> snapshots. Even if RHEL5 doesn't enable use of these features by
> default, having it supported by RHEL5 would be extremely convenient.
While I agree with that in theory, in practise we never end up doing
this and it just ends up delaying the acceptance of the trivial patches.
It may also be a burden later on when some of the features that could
be e.g. ROCOMPAT are bundled with an INCOMPAT change and we then
make the filesystem gratuitously INCOMPAT.
In the end, I don't think having a couple of separate flags is any more
effort than having a single one. As yet we only have about 6 of each 32
feature bits used, and if we get close to running out we can make an
EXT3_FEATURE_{,RO,IN}COMPAT_NEXT_WORD flag to continue it on.
Note that I'm not against this in practise, but I wouldn't hold up any
feature for this reason. How long has large i_blocks been pending,
and usecond timestamps? Many years already, even though they are trivial
to implement, so I'm hesitant to tie them together and delay further.
I think i_blocks can be considered an ROCOMPAT feature, and the large
inode reservation for usecond timestamps could be COMPAT I think (since
an unsupporting kernel would still update all the timestamps consistently
even if the useconds on disk would be some constant instead of 0.
> > so that patch would be useful in any case... though it needs some
> > cleanup to remove all users of i_frag and i_faddr (which have never
> > ever been used).
>
> One of the things which we need to consider is whether we think we
> will never support tail packing or other forms of fragments, which is
> related to whether we think we will ever support large blocks (i.e.,
> 32k, 64k, and up). If we do, we might want to keep those fields
> around.
I thought the long-term plan for small files was to just store them
in an EA? That way, we can efficiently pack them inside the inode
or up to blocksize (space willing) without any usage of inode fields
(maybe with a flag to indicate that there is such a fragment to avoid
gratuitous EA searching). This would be a net win on performance since
it avoids an IO for the in-inode case at least.
I think testing with reiserfs showed that tail packing was a net loss in
most cases, since basically every benchmark I've ever seen with reiserfs
disables tail packing or suffers. For space constrained systems (if
there ever exists such a thing again ;-) it would probably be better to
go to compressed files.
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/