Re: [PATCH 2/2] xfs: add FALLOC_FL_ZERO_RANGE to fallocate

From: Dave Chinner
Date: Tue Jun 12 2012 - 22:16:14 EST


On Tue, Jun 12, 2012 at 05:36:04PM +0200, Paolo Bonzini wrote:
> This is implemented in the same way as the other fallocate modes. All of
> them map to XFS ioctls that are implemented by xfs_change_file_space.
>
> However, we need to cap the length to the inode size if the user requested
> FALLOC_FL_KEEP_SIZE.

That's done on purpose. fallocate() explicitly allows preallocation
beyond EOF and that's what the FALLOC_FL_KEEP_SIZE flag is for - to
allow both offset and offset+len to lie beyond the current inode
size and have the preallocation succeed without changing the file
size.

This is so that we can prevent fragmentation of slow growing
append-only files like log files - we can preallocate way beyond EOF
without changing EOF so as the file grows over days and weeks it
does not fragment.

Similarly, hole punch needs to be able to punch out such
preallocated extents beyond EOF if requested, and it definitely must
not change EOF. So capping/erroring out when offset/offset+len is
definitely the wrong thing to do when FALLOC_FL_KEEP_SIZE is set.

> Cc: Dave Chinner <dchinner@xxxxxxxxxx>
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> ---
> fs/xfs/xfs_file.c | 36 ++++++++++++++++++++++++------------
> 1 files changed, 24 insertions(+), 12 deletions(-)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 9f7ec15..c811cf9 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -818,33 +818,45 @@ xfs_file_fallocate(
> loff_t new_size = 0;
> xfs_flock64_t bf;
> xfs_inode_t *ip = XFS_I(inode);
> - int cmd = XFS_IOC_RESVSP;
> + int cmd;
> int attr_flags = XFS_ATTR_NOLOCK;
>
> - if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
> + if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
> + FALLOC_FL_ZERO_RANGE))
> return -EOPNOTSUPP;
>
> + BUG_ON((mode & FALLOC_FL_PUNCH_HOLE) && (mode & FALLOC_FL_ZERO_RANGE));

Never put BUG_ON() or BUG() in XFS code that can return an error.
Return EINVAL if we chose not to support it, and if it's really
something we consider bad, emit a warning to syslog (i.e.
xfs_warn()) and potentially add a ASSERT() case so that debug
kernels will trip over it. Nobody should be panicing a production
system just because a user supplied a set of incorrect syscall
paramters....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/