Re: [PATCH v6 14/15] xfs: prepare xfs_break_layouts() for another layout type

From: Dan Williams
Date: Mon Mar 19 2018 - 16:52:55 EST


On Mon, Mar 19, 2018 at 10:45 AM, Darrick J. Wong
<darrick.wong@xxxxxxxxxx> wrote:
> On Thu, Mar 15, 2018 at 08:52:45AM -0700, Dan Williams wrote:
>> When xfs is operating as the back-end of a pNFS block server, it prevents
>> collisions between local and remote operations by requiring a lease to
>> be held for remotely accessed blocks. Local filesystem operations break
>> those leases before writing or mutating the extent map of the file.
>>
>> A similar mechanism is needed to prevent operations on pinned dax
>> mappings, like device-DMA, from colliding with extent unmap operations.
>>
>> BREAK_WRITE and BREAK_TRUNCATE are introduced as two distinct levels of
>> layout breaking.
>>
>> Layouts are broken in the BREAK_WRITE case to ensure that layout-holders
>> do not collide with local writes. Additionally, layouts are broken in
>> the BREAK_TRUNCATE case to make sure the layout-holder has a consistent
>> view of the file's extent map. While BREAK_WRITE breaks can be satisfied
>> be recalling FL_LAYOUT leases, BREAK_TRUNCATE breaks additionally
>> require waiting for busy dax-pages to go idle.
>>
>> Cc: "Darrick J. Wong" <darrick.wong@xxxxxxxxxx>
>> Cc: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
>> Reported-by: Dave Chinner <david@xxxxxxxxxxxxx>
>> Reported-by: Christoph Hellwig <hch@xxxxxx>
>> Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
>> ---
>> fs/xfs/xfs_file.c | 23 +++++++++++++++++------
>> fs/xfs/xfs_inode.h | 18 ++++++++++++++++--
>> fs/xfs/xfs_ioctl.c | 2 +-
>> fs/xfs/xfs_iops.c | 5 +++--
>> 4 files changed, 37 insertions(+), 11 deletions(-)
>>
>> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
>> index 5742d395a4e4..399c5221f101 100644
>> --- a/fs/xfs/xfs_file.c
>> +++ b/fs/xfs/xfs_file.c
>> @@ -352,7 +352,7 @@ xfs_file_aio_write_checks(
>>
>> xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
>> *iolock |= XFS_MMAPLOCK_EXCL;
>> - error = xfs_break_layouts(inode, iolock);
>> + error = xfs_break_layouts(inode, iolock, BREAK_WRITE);
>> if (error) {
>> *iolock &= ~XFS_MMAPLOCK_EXCL;
>> xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
>> @@ -762,7 +762,8 @@ xfs_file_write_iter(
>> int
>> xfs_break_layouts(
>> struct inode *inode,
>> - uint *iolock)
>> + uint *iolock,
>> + enum layout_break_reason reason)
>> {
>> struct xfs_inode *ip = XFS_I(inode);
>> int ret;
>> @@ -770,9 +771,19 @@ xfs_break_layouts(
>> ASSERT(xfs_isilocked(ip, XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL
>> | XFS_MMAPLOCK_EXCL));
>>
>> - ret = xfs_break_leased_layouts(inode, iolock);
>> - if (ret > 0)
>> - ret = 0;
>> + switch (reason) {
>> + case BREAK_TRUNCATE:
>> + /* fall through */
>> + case BREAK_WRITE:
>> + ret = xfs_break_leased_layouts(inode, iolock);
>> + if (ret > 0)
>> + ret = 0;
>> + break;
>> + default:
>> + ret = -EINVAL;
>> + break;
>> + }
>> +
>> return ret;
>> }
>>
>> @@ -802,7 +813,7 @@ xfs_file_fallocate(
>> return -EOPNOTSUPP;
>>
>> xfs_ilock(ip, iolock);
>> - error = xfs_break_layouts(inode, &iolock);
>> + error = xfs_break_layouts(inode, &iolock, BREAK_TRUNCATE);
>> if (error)
>> goto out_unlock;
>>
>> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
>> index 74c63f3a720f..1a66c7afcf45 100644
>> --- a/fs/xfs/xfs_inode.h
>> +++ b/fs/xfs/xfs_inode.h
>> @@ -379,6 +379,20 @@ static inline void xfs_ifunlock(struct xfs_inode *ip)
>> >> XFS_ILOCK_SHIFT)
>>
>> /*
>> + * Layouts are broken in the BREAK_WRITE case to ensure that
>> + * layout-holders do not collide with local writes. Additionally,
>> + * layouts are broken in the BREAK_TRUNCATE case to make sure the
>> + * layout-holder has a consistent view of the file's extent map. While
>> + * BREAK_WRITE breaks can be satisfied be recalling FL_LAYOUT leases,
>> + * BREAK_TRUNCATE breaks additionally require waiting for busy dax-pages
>> + * to go idle.
>> + */
>> +enum layout_break_reason {
>> + BREAK_WRITE,
>
> I wonder if this belongs in a system header file since the same general
> principle is going to apply to ext4 and others? If this really is a
> special xfs thing, then please use the "XFS_" prefix.
>
>> + BREAK_TRUNCATE,
>
> The "truncate" part of the name misled me for a bit. That operation is
> really about "make sure there are no busy dax pages because we're about
> to change some file pmem^Wblock mappings", right? Truncation, hole
> punching, reflinking, and zero_range (because it punches the range and
> reallocates unwritten extents) all have to wait for busy dax pages to
> become free before they can begin unmapping blocks. So this isn't just
> about truncation specifically; can I suggest "BREAK_UNMAPI" ?

I like that name better.

It isn't clear that this needs to go to a system header because I
expect ext4 will only support BREAK_UNMAPI for dax and not care about
BREAK_WRITE since that is an XFS-only / pNFS-only distinction in
current code. However, I'll let Jan chime in if this guess is
incorrect and ext4 plans to add pNFS block-server support.

> (I also haven't figured out whether the same requirements apply to
> things that *add* extent maps to the file, but I have to run to a
> meeting in a few so wanted to get this email sent.)

Add should not be a problem because DMA only ever starts after extents
are added, i.e. there's no risk of DMA starting to invalid address
because fs/dax.c arranges for extents to be allocated at fault time.