Re: [PATCH v5 04/13] mm/shmem: Restrict MFD_INACCESSIBLE memory against RLIMIT_MEMLOCK

From: David Hildenbrand
Date: Wed Apr 13 2022 - 12:31:01 EST


>
> So this is another situation where the actual backend (TDX, SEV, pKVM, pure software) makes a difference -- depending on exactly what backend we're using, the memory may not be unmoveable. It might even be swappable (in the potentially distant future).

Right. And on a system without swap we don't particularly care about
mlock, but we might (in most cases) care about fragmentation with
unmovable memory.

>
> Anyway, here's a concrete proposal, with a bit of handwaving:

Thanks for investing some brainpower.

>
> We add new cgroup limits:
>
> memory.unmoveable
> memory.locked
>
> These can be set to an actual number or they can be set to the special value ROOT_CAP. If they're set to ROOT_CAP, then anyone in the cgroup with capable(CAP_SYS_RESOURCE) (i.e. the global capability) can allocate unmovable or locked memory with this (and potentially other) new APIs. If it's 0, then they can't. If it's another value, then the memory can be allocated, charged to the cgroup, up to the limit, with no particular capability needed. The default at boot is ROOT_CAP. Anyone who wants to configure it differently is free to do so. This avoids introducing a DoS, makes it easy to run tests without configuring cgroup, and lets serious users set up their cgroups.
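
To check that I read the three cases correctly, here is a rough userspace
sketch of the decision, not kernel code. All names (LIMIT_ROOT_CAP,
struct cg_limit, cg_try_charge) are hypothetical, and I am assuming that
in ROOT_CAP mode nothing is counted against a numeric limit, which the
proposal does not spell out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel standing in for the proposed ROOT_CAP value. */
#define LIMIT_ROOT_CAP UINT64_MAX

/* One limit (think memory.unmoveable or memory.locked) of one cgroup. */
struct cg_limit {
	uint64_t limit;   /* bytes, or LIMIT_ROOT_CAP */
	uint64_t charged; /* bytes currently charged; always <= limit */
};

/*
 * Decide whether an allocation of 'bytes' may proceed, charging it on
 * success. 'has_cap' stands in for capable(CAP_SYS_RESOURCE).
 */
static bool cg_try_charge(struct cg_limit *l, uint64_t bytes, bool has_cap)
{
	/* ROOT_CAP: gated purely on the capability (assumption: no charge). */
	if (l->limit == LIMIT_ROOT_CAP)
		return has_cap;

	/* Numeric limit, including 0: no capability needed, but capped. */
	if (bytes > l->limit - l->charged)
		return false;

	l->charged += bytes;
	return true;
}
```

With a limit of 0 every non-empty request fails regardless of
capabilities, and with a numeric limit the capability is never consulted.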

I wonder what the implications are for existing user space.

Assume we want to move page pinning (rdma, vfio, io_uring, ...) to the
new model. How can we be sure

a) We don't break existing user space
b) We don't silently open the door to going crazy on unmovable memory
without the admin noticing.

Any ideas?

>
> Nothing is charged per mm.
>
> To make this fully sensible, we need to know what the backend is for the private memory before allocating any so that we can charge it accordingly.

Right, the support for migration and/or swap defines how to account.
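
Roughly what I have in mind, as a userspace sketch with made-up names
(struct backend_caps, pick_buckets, the CHARGE_* flags). It assumes that
memory which cannot be swapped is charged as locked and memory which
cannot be migrated is charged as unmovable, independently of each other;
whether a backend should really be charged to both is an open question.

```c
#include <stdbool.h>

/* Hypothetical accounting buckets, mirroring the two proposed limits. */
#define CHARGE_LOCKED    (1u << 0) /* would count against memory.locked */
#define CHARGE_UNMOVABLE (1u << 1) /* would count against memory.unmoveable */

/* What a given private-memory backend can do with its pages. */
struct backend_caps {
	bool can_migrate;
	bool can_swap;
};

/* Map backend capabilities to the set of buckets to charge. */
static unsigned int pick_buckets(struct backend_caps caps)
{
	unsigned int buckets = 0;

	if (!caps.can_swap)
		buckets |= CHARGE_LOCKED;
	if (!caps.can_migrate)
		buckets |= CHARGE_UNMOVABLE;

	return buckets;
}
```

A backend that supports both migration and swap ends up with no special
charge at all, which is why the backend has to be known before the first
allocation.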

--
Thanks,

David / dhildenb