Re: [PATCH v3 resend 1/2] mm: Add an F_SEAL_FUTURE_WRITE seal to memfd

From: Andy Lutomirski
Date: Fri Nov 09 2018 - 17:38:07 EST




> On Nov 9, 2018, at 2:20 PM, Daniel Colascione <dancol@xxxxxxxxxx> wrote:
>
>> On Fri, Nov 9, 2018 at 1:06 PM, Jann Horn <jannh@xxxxxxxxxx> wrote:
>>
>> +linux-api for API addition
>> +hughd as FYI since this is somewhat related to mm/shmem
>>
>> On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
>> <joel@xxxxxxxxxxxxxxxxx> wrote:
>>> Android uses ashmem for sharing memory regions. We are looking forward
>>> to migrating all usecases of ashmem to memfd so that we can possibly
>>> remove the ashmem driver in the future from staging while also
>>> benefiting from using memfd and contributing to it. Note staging drivers
>>> are also not ABI and generally can be removed at anytime.
>>>
>>> One of the main usecases Android has is the ability to create a region
>>> and mmap it as writeable, then add protection against making any
>>> "future" writes while keeping the existing already mmap'ed
>>> writeable-region active. This allows us to implement a usecase where
>>> receivers of the shared memory buffer can get a read-only view, while
>>> the sender continues to write to the buffer.
>>> See CursorWindow documentation in Android for more details:
>>> https://developer.android.com/reference/android/database/CursorWindow
>>>
>>> This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
>>> To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
>>> which prevents any future mmap and write syscalls from succeeding while
>>> keeping the existing mmap active.
>>
>> Please CC linux-api@ on patches like this. If you had done that, I
>> might have criticized your v1 patch instead of your v3 patch...
>>
>>> The following program shows the seal
>>> working in action:
>> [...]
>>> Cc: jreck@xxxxxxxxxx
>>> Cc: john.stultz@xxxxxxxxxx
>>> Cc: tkjos@xxxxxxxxxx
>>> Cc: gregkh@xxxxxxxxxxxxxxxxxxx
>>> Cc: hch@xxxxxxxxxxxxx
>>> Reviewed-by: John Stultz <john.stultz@xxxxxxxxxx>
>>> Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
>>> ---
>> [...]
>>> diff --git a/mm/memfd.c b/mm/memfd.c
>>> index 2bb5e257080e..5ba9804e9515 100644
>>> --- a/mm/memfd.c
>>> +++ b/mm/memfd.c
>> [...]
>>> @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, unsigned int seals)
>>>  		}
>>>  	}
>>>
>>> +	if ((seals & F_SEAL_FUTURE_WRITE) &&
>>> +	    !(*file_seals & F_SEAL_FUTURE_WRITE)) {
>>> +		/*
>>> +		 * The FUTURE_WRITE seal also prevents growing and shrinking
>>> +		 * so we need them to be already set, or requested now.
>>> +		 */
>>> +		int test_seals = (seals | *file_seals) &
>>> +				 (F_SEAL_GROW | F_SEAL_SHRINK);
>>> +
>>> +		if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
>>> +			error = -EINVAL;
>>> +			goto unlock;
>>> +		}
>>> +
>>> +		spin_lock(&file->f_lock);
>>> +		file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
>>> +		spin_unlock(&file->f_lock);
>>> +	}
>>
>> So you're fiddling around with the file, but not the inode? How are
>> you preventing code like the following from re-opening the file as
>> writable?
>
> Good catch. That's fixable too though, isn't it, just by fiddling with
> the inode, right?

True.

>
> Another, more general fix might be to prevent /proc/pid/fd/N opens
> from "upgrading" access modes. But that'd be a bigger ABI break.

I think we should fix that, too. I consider it a bug fix, not an ABI break, personally.
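
For illustration only, here is a minimal userspace sketch of the "upgrade" being discussed. It is not taken from the patch or from earlier mails in this thread. It assumes /proc is mounted and uses an ordinary temporary file as a stand-in for a memfd; the helper name reopen_upgrades_access() and the /tmp template are hypothetical. The function opens a file read-only and then tries to reopen that same descriptor read-write through /proc/self/fd/N.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Returns  1 if a read-only fd could be reopened read-write via
 *            /proc/self/fd/N,
 *          0 if the reopen was refused (or /proc is unavailable),
 *         -1 if the test file could not be set up.
 */
int reopen_upgrades_access(void)
{
	char tmpl[] = "/tmp/reopen-demo-XXXXXX";	/* assumed scratch location */
	char path[64];
	int wfd, rofd, rwfd;

	/* mkstemp() creates the file with mode 0600, so the owner may write. */
	wfd = mkstemp(tmpl);
	if (wfd < 0)
		return -1;

	/* Keep only a read-only descriptor for the file. */
	rofd = open(tmpl, O_RDONLY);
	close(wfd);
	if (rofd < 0) {
		unlink(tmpl);
		return -1;
	}

	/* Try to obtain a writable descriptor from the read-only one. */
	snprintf(path, sizeof(path), "/proc/self/fd/%d", rofd);
	rwfd = open(path, O_RDWR);

	unlink(tmpl);
	close(rofd);

	if (rwfd < 0)
		return 0;

	close(rwfd);
	return 1;
}
```

A caller would just check the return value. The reopen goes through the normal permission check on the inode, so clearing FMODE_WRITE on one struct file does not by itself stop a new writable open of the same inode.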

>
>> That aside: I wonder whether a better API would be something that
>> allows you to create a new readonly file descriptor, instead of
>> fiddling with the writability of an existing fd.
>
> That doesn't work, unfortunately. The ashmem API we're replacing with
> memfd requires file descriptor continuity. I also looked into opening
> a new FD and dup2(2)ing atop the old one, but this approach doesn't
> work in the case that the old FD has already leaked to some other
> context (e.g., another dup, SCM_RIGHTS). See
> https://developer.android.com/ndk/reference/group/memory. We can't
> break ASharedMemory_setProt.


Hmm. If we fix the general reopen bug, a way to drop write access from an existing struct file would do what Android needs, right? I don't know if there are general VFS issues with that.