Re: [RFC] new ->perform_write fop

From: Steven Whitehouse
Date: Fri May 21 2010 - 05:38:06 EST


Hi,

On Fri, 2010-05-21 at 09:05 +1000, Dave Chinner wrote:
> On Thu, May 20, 2010 at 10:12:32PM +0200, Jan Kara wrote:
> > On Thu 20-05-10 09:50:54, Dave Chinner wrote:
> > > On Wed, May 19, 2010 at 01:09:12AM +1000, Nick Piggin wrote:
> > > > On Tue, May 18, 2010 at 10:27:14PM +1000, Dave Chinner wrote:
> > > >
> > > > Is it really going to be a problem to implement block hole punching
> > > > in ext4 and gfs2?
> > >
[snip]
> > b) E.g. ext4 can do even without hole punching. It can allocate extent
> > as 'unwritten' and when something during the write fails, it just
> > leaves the extent allocated and the 'unwritten' flag makes sure that
> > any read will see zeros. I suppose that other filesystems that care
> > about multipage writes are able to do similar things (e.g. btrfs can
> > do the same as far as I remember, I'm not sure about gfs2).
>
> Allocating multipage writes as unwritten extents turns off delayed
> allocation and hence we'd lose all the benefits that this gives...

It should be possible to implement hole punching in GFS2 I think. The
main issue is locking order of resource groups. We have on our todo list
a rewrite of the truncate/delete code which is currently used to
deallocate data blocks and metadata tree blocks. The current algorithm
is a rather inefficient recursive scanning of the tree which is done
multiple times depending on the tree height.

Adapting that to punch holes should be possible without too much effort
if we need to do that. We do need to allow for the possibility that such
a deallocation might have to be split into multiple transactions
depending on the amount of metadata involved (for large files, this
could be larger than the size of the log for example). Currently the
code will split up truncates into multiple transactions which allows the
deallocation to be restartable from any transaction boundary.

GFS2 does not have any way to mark unwritten extents, so we cannot do
delayed allocation or implement an efficient fallocate. We can do better
performance-wise than just dd'ing zeros to a file for fallocate, but
we'll never be able to match a fs that can mark extents unwritten in
performance terms,

Steve.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/