Re: XFS status update for January 2011

From: Dave Chinner
Date: Mon Feb 14 2011 - 18:55:26 EST


On Mon, Feb 14, 2011 at 08:20:18AM -0700, tm@xxxxxx wrote:
> Hi Dave,
> > On Mon, Feb 14, 2011 at 10:17:26AM +0800, Tao Ma wrote:
> >> Hi Christoph,
> >> On 02/14/2011 02:42 AM, Christoph Hellwig wrote:
> >> >On the 4th of January we saw the release of Linux 2.6.37, which contains a
> >> >large XFS update:
> >> >
> >> > 67 files changed, 1424 insertions(+), 1524 deletions(-)
> >> >
> >> >User visible changes are the new XFS_IOC_ZERO_RANGE ioctl which allows
> >> >to convert already allocated space into unwritten extents that return
> >> >zeros on a read,
> >> Would you mind describing a scenario in which this ioctl can be used? I am
> >> just wondering whether ocfs2 can implement it as well.
> >
> > Zeroing a file without doing IO or having to punch out the blocks
> > already allocated to the file.
> >
> > In this case, we had a couple of different people in cloud storage
> > land asking for such functionality to optimise record deletion
> > by avoiding disruption of their preallocated file layouts as a
> > punch-then-preallocate operation does.
> Thanks for the info. Yeah, ocfs2 is also used to host images in some cloud
> computing environments, so it looks helpful for us too.

Just to be clear, this optimisation isn't relevant for hosting VM
images in a cloud compute environment - this was added for
optimising the back end of distributed storage applications that
hold tens of millions of records and tens of TB of data per back end
storage host.

Hosted VM images are largely static, especially if you are
preallocating them - they never, ever get punched. Even if you are
using thin provisioning semantics and punching TRIMmed ranges, you
aren't converting the TRIMmed ranges back to preallocated state so
you wouldn't be using this interface. Hence I don't see this as
something that you would use in such an environment.

The distributed storage applications that this was added for
required atomic record deletes from the back end and the fastest and
safest way to do that was to turn the record being deleted back into
unwritten extents. This allows that operation to be done atomically
by the filesystem whilst providing simple recovery semantics to the
application. The XFS_IOC_ZERO_RANGE ioctl simply prevents the
fragmentation that this punch-then-preallocate operation was causing
and allows the back end to scale to much larger record stores...
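
To make the record-delete pattern concrete, here is a rough sketch of
zeroing a record in place. It is not the code from any of these
applications. It assumes the generic fallocate(2) call with
FALLOC_FL_ZERO_RANGE (which appeared in later kernels, 3.15 and up, and
asks the filesystem for the same kind of operation) rather than the XFS
ioctl itself, so that it builds without the XFS headers. The helper name
zero_record() and the write-zeros fallback are made up for illustration;
the fallback does real IO and is not atomic, so it gives up exactly the
properties discussed above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback definitions for older libc headers; values are those
 * published in <linux/falloc.h>. */
#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE 0x01
#endif
#ifndef FALLOC_FL_ZERO_RANGE
#define FALLOC_FL_ZERO_RANGE 0x10
#endif

/*
 * Hypothetical helper: make bytes [off, off + len) of fd read back as
 * zeros without punching out the underlying blocks.
 * Returns 0 on success, -1 with errno set on failure.
 */
int zero_record(int fd, off_t off, off_t len)
{
    static const char zeros[4096];

    assert(off >= 0 && len > 0);

    /* Fast path: let the filesystem zero the range, preferably by
     * converting it to unwritten extents. */
    if (fallocate(fd, FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE,
                  off, len) == 0)
        return 0;

    /* Anything other than "not supported here" is a real error. */
    if (errno != EOPNOTSUPP && errno != ENOSYS)
        return -1;

    /* Slow path (assumption for portability): overwrite with zeros.
     * Unlike the fast path this does IO, is not atomic, and would
     * extend the file if the range ran past EOF. */
    while (len > 0) {
        size_t chunk = len < (off_t)sizeof zeros ? (size_t)len
                                                  : sizeof zeros;
        ssize_t n = pwrite(fd, zeros, chunk, off);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        off += n;
        len -= n;
    }
    return 0;
}
```

A back end would call something like zero_record(fd, record_offset,
record_len) when deleting a record, where record_offset and record_len
are whatever its own index says about that record.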

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx