Re: [RFC PATCH 00/19] mm: Introduce a cgroup to limit the amount of locked and pinned memory

From: David Hildenbrand
Date: Tue Jan 31 2023 - 08:58:12 EST


On 24.01.23 21:12, Jason Gunthorpe wrote:
On Tue, Jan 24, 2023 at 04:42:29PM +1100, Alistair Popple wrote:
Having large amounts of unmovable or unreclaimable memory in a system
can lead to system instability due to increasing the likelihood of
encountering out-of-memory conditions. Therefore it is desirable to
limit the amount of memory users can lock or pin.

From userspace such limits can be enforced by setting
RLIMIT_MEMLOCK. However there is no standard method that drivers and
other in-kernel users can use to check and enforce this limit.

This has lead to a large number of inconsistencies in how limits are
enforced. For example some drivers will use mm->locked_mm while others
will use mm->pinned_mm or user->locked_mm. It is therefore possible to
have up to three times RLIMIT_MEMLOCKED pinned.

Having pinned memory limited per-task also makes it easy for users to
exceed the limit. For example drivers that pin memory with
pin_user_pages() it tends to remain pinned after fork. To deal with
this and other issues this series introduces a cgroup for tracking and
limiting the number of pages pinned or locked by tasks in the group.

However the existing behaviour with regards to the rlimit needs to be
maintained. Therefore the lesser of the two limits is
enforced. Furthermore having CAP_IPC_LOCK usually bypasses the rlimit,
but this bypass is not allowed for the cgroup.

The first part of this series converts existing drivers which
open-code the use of locked_mm/pinned_mm over to a common interface
which manages the refcounts of the associated task/mm/user
structs. This ensures accounting of pages is consistent and makes it
easier to add charging of the cgroup.

The second part of the series adds the cgroup and converts core mm
code such as mlock over to charging the cgroup before finally
introducing some selftests.

As I don't have access to systems with all the various devices I
haven't been able to test all driver changes. Any help there would be
appreciated.

I'm excited by this series, thanks for making it.

The pin accounting has been a long standing problem and cgroups will
really help!

Indeed. I'm curious how GUP-fast, pinning the same page multiple times, and pinning subpages of larger folios are handled :)

--
Thanks,

David / dhildenb