Re: [PATCH v4 3/3] fs, xfs: introduce MAP_DIRECT for creating block-map-sealed file ranges

From: Dan Williams
Date: Tue Aug 15 2017 - 13:11:34 EST


On Tue, Aug 15, 2017 at 2:18 AM, Kirill A. Shutemov
<kirill@xxxxxxxxxxxxx> wrote:
> On Mon, Aug 14, 2017 at 11:12:22PM -0700, Dan Williams wrote:
>> MAP_DIRECT is an mmap(2) flag with the following semantics:
>>
>> MAP_DIRECT
>> In addition to this mapping having MAP_SHARED semantics, successful
>> faults in this range may assume that the block map (logical-file-offset
>> to physical memory address) is pinned for the lifetime of the mapping.
>> Successful MAP_DIRECT faults establish mappings that bypass any kernel
>> indirections like the page-cache. All updates are carried directly
>> through to the underlying file physical blocks (modulo cpu cache
>> effects).
>>
>> ETXTBSY is returned on attempts to change the block map (allocate blocks
>> / convert unwritten extents / break shared extents) in the mapped range.
>> Some filesystems may extend these same restrictions outside the mapped
>> range and return ETXTBSY to any file operations that might mutate the
>> block map. MAP_DIRECT faults may fail with a SIGSEGV if the filesystem
>> needs to write the block map to satisfy the fault. For example, if the
>> mapping was established over a hole in a sparse file.
>
> We had issues before with user-imposed ETXTBSY. See MAP_DENYWRITE.
>
> Are we sure it won't be a source of denial-of-service attacks?

I believe MAP_DENYWRITE let any application with read access deny
writes, which is obviously problematic. MAP_DIRECT is different: you
need write access to the file, so you can already destroy data that
another application might depend on, and the mapping only blocks
allocation and reflink.
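
To make the write-access point concrete, here is a minimal sketch using
only the existing mmap(2) behavior, not MAP_DIRECT itself (which is only
proposed in this series). It shows that a shared, writable mapping is
already refused with EACCES when the descriptor was opened read-only.
The function name and the temporary path are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to create a MAP_SHARED + PROT_WRITE mapping
 * on a descriptor opened O_RDONLY.
 * Returns the errno reported by mmap(), 0 if the mapping unexpectedly
 * succeeded, or -1 if the temporary file could not be set up.
 */
int shared_write_map_errno_on_rdonly_fd(void)
{
    char path[] = "/tmp/map-access-XXXXXX";
    const size_t len = 4096;

    /* Create a small file, then drop the writable descriptor. */
    int wfd = mkstemp(path);
    if (wfd < 0)
        return -1;
    if (ftruncate(wfd, (off_t)len) != 0) {
        close(wfd);
        unlink(path);
        return -1;
    }
    close(wfd);

    /* Reopen read-only: this descriptor carries no write access. */
    int fd = open(path, O_RDONLY);
    unlink(path);
    if (fd < 0)
        return -1;

    /* A shared writable mapping needs a descriptor opened for writing. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    int err = (p == MAP_FAILED) ? errno : 0;

    if (p != MAP_FAILED)
        munmap(p, len);
    close(fd);
    return err;
}
```

On Linux the helper is expected to return EACCES, which is the same
gate a read-only opener would hit before MAP_DIRECT ever comes into play.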

However, I'm not opposed to adding more safety around this. I think we
can address this concern with an fcntl seal as Dave suggests, where the
seal applies only to the 'struct file' instance and only gates whether
MAP_DIRECT is allowed on that file. Setting F_MAY_SEAL_IOMAP requires
CAP_IMMUTABLE, but MAP_DIRECT itself does not. This allows the
'permission to mmap(MAP_DIRECT)' to be passed around with an open file
descriptor.

>
>> The kernel ignores attempts to mark a MAP_DIRECT mapping MAP_PRIVATE and
>> will silently fall back to MAP_SHARED semantics.
>
> Hm.. Any reason for this strange behaviour? Looks just broken to me.
>
>>
>> ERRORS
>> EACCES A MAP_DIRECT mapping was requested and PROT_WRITE was not set.
>>
>> EINVAL MAP_ANONYMOUS was specified with MAP_DIRECT.
>>
>> EOPNOTSUPP The filesystem explicitly does not support the flag
>>
>> SIGSEGV Attempted to write a MAP_DIRECT mapping at a file offset that
>> might require block-map updates.
>
> I think it should be SIGBUS.

Ok, that does seem to fit this definition from the mmap(2) man page:

SIGBUS Attempted access to a portion of the buffer that does not
correspond to the file (for example, beyond the end of the file,
including the case where another process has truncated the file).
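
For reference, a minimal sketch of that existing SIGBUS case, using
plain MAP_SHARED rather than MAP_DIRECT: it maps two pages over a
one-page file and touches the second page, which lies wholly past EOF.
The helper name, the temporary path, and the sigsetjmp-based recovery
are illustrative choices, not anything taken from the patch set.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf sigbus_jmp;

/* Jump back to the sigsetjmp() point instead of retrying the fault. */
static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(sigbus_jmp, 1);
}

/*
 * Hypothetical helper: returns 1 if writing to a mapped page that lies
 * entirely beyond end-of-file raised SIGBUS, 0 if the write went
 * through, and -1 if the setup failed.
 */
int touch_past_eof_raises_sigbus(void)
{
    char path[] = "/tmp/sigbus-demo-XXXXXX";
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    /* The file is exactly one page long ... */
    if (ftruncate(fd, (off_t)page) != 0) {
        close(fd);
        return -1;
    }

    /* ... but the mapping covers two pages. */
    volatile char *map = mmap(NULL, 2 * (size_t)page,
                              PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    struct sigaction sa, old;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) != 0) {
        munmap((void *)map, 2 * (size_t)page);
        close(fd);
        return -1;
    }

    int raised;
    if (sigsetjmp(sigbus_jmp, 1) == 0) {
        map[page] = 1;      /* first byte of the page past EOF */
        raised = 0;
    } else {
        raised = 1;         /* arrived here via the SIGBUS handler */
    }

    sigaction(SIGBUS, &old, NULL);
    munmap((void *)map, 2 * (size_t)page);
    close(fd);
    return raised;
}
```

Whether a MAP_DIRECT fault that needs a block-map update should be
reported the same way is the open question in this thread; the sketch
only shows the behavior the man page text already describes.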