Re: [PATCH v4 0/6] mm: introduce memfd_secret system call to create "secret" memory areas

From: David Hildenbrand
Date: Wed Aug 19 2020 - 08:06:17 EST


On 19.08.20 13:42, Mike Rapoport wrote:
> On Wed, Aug 19, 2020 at 12:47:54PM +0200, David Hildenbrand wrote:
>> On 18.08.20 16:15, Mike Rapoport wrote:
>>> From: Mike Rapoport <rppt@xxxxxxxxxxxxx>
>>>
>>> Hi,
>>>
>>> This is an implementation of "secret" mappings backed by a file descriptor.
>>>
>>> v4 changes:
>>> * rebase on v5.9-rc1
>>> * Do not redefine PMD_PAGE_ORDER in fs/dax.c, thanks Kirill
>>> * Make secret mappings exclusive by default and only require flags to
>>> memfd_secret() system call for uncached mappings, thanks again Kirill :)
>>>
>>> v3 changes:
>>> * Squash kernel-parameters.txt update into the commit that added the
>>> command line option.
>>> * Make uncached mode explicitly selectable by architectures. For now enable
>>> it only on x86.
>>>
>>> v2 changes:
>>> * Follow Michael's suggestion and name the new system call 'memfd_secret'
>>> * Add kernel-parameters documentation about the boot option
>>> * Fix i386-tinyconfig regression reported by the kbuild bot.
>>> CONFIG_SECRETMEM now depends on !EMBEDDED to disable it on small systems
>>> from one side and still make it available unconditionally on
>>> architectures that support SET_DIRECT_MAP.
>>>
>>>
>>> The file descriptor backing secret memory mappings is created using a
>>> dedicated memfd_secret system call. The desired protection mode for the
>>> memory is configured using the flags parameter of the system call. The
>>> mmap() of the file descriptor created with memfd_secret() will create a
>>> "secret" memory mapping. The pages in that mapping will be marked as not
>>> present in the direct map and will have the desired protection bits set in
>>> the user page table. For instance, the current implementation allows
>>> uncached mappings.
>>>
>>> Although normally Linux userspace mappings are protected from other users,
>>> such secret mappings are useful for environments where a hostile tenant is
>>> trying to trick the kernel into giving them access to other tenants'
>>> mappings.
>>>
>>> Additionally, the secret mappings may be used as a means to protect guest
>>> memory in a virtual machine host.
>>>
>>
>> Just a general question. I assume such pages (where the direct mapping
>> was changed) cannot get migrated - I can spot a simple alloc_page(). So
>> essentially a process can just allocate a whole bunch of memory that is
>> unmovable, correct? Is there any limit? Is it properly accounted towards
>> the process (memctl) ?
>
> The memory is accounted in the same way as with mlock(), so a normal
> user won't be able to allocate more than RLIMIT_MEMLOCK.

Okay, thanks. AFAIU, the difference from mlock() is that the pages here are
not movable, so they fragment memory and limit compaction. Hm.
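
To make the accounting point concrete, here is a minimal userspace sketch of
the budget check implied by "not more than RLIMIT_MEMLOCK". The helper names
fits_in_limit() and fits_in_memlock() are hypothetical and are not part of
the patch set. The kernel does the real accounting itself, and privileged
processes may be exempt from the limit, so this only illustrates the
arithmetic:

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Hypothetical helper, for illustration only: returns true if
 * already_locked + request bytes fit within soft_limit.
 * RLIM_INFINITY is treated as "no limit".
 */
static bool fits_in_limit(rlim_t soft_limit, size_t already_locked,
			  size_t request)
{
	if (soft_limit == RLIM_INFINITY)
		return true;
	if (already_locked > soft_limit)
		return false;
	/* Compare against the remaining budget to avoid overflow. */
	return request <= soft_limit - already_locked;
}

/*
 * Same check against the calling process's current RLIMIT_MEMLOCK
 * soft limit. Returns false if the limit cannot be queried.
 */
static bool fits_in_memlock(size_t already_locked, size_t request)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return false;
	return fits_in_limit(rl.rlim_cur, already_locked, request);
}
```

With a 64 KiB soft limit, a 64 KiB request fits only if nothing is locked
yet. Note that the check says nothing about movability: pages counted
against the limit this way are still unmovable in the secretmem case.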

--
Thanks,

David / dhildenb