[PATCH v2 0/5] Use obj_cgroup APIs to charge kmem pages

From: Muchun Song
Date: Wed Mar 03 2021 - 07:07:26 EST


Since Roman's series "The new cgroup slab memory controller" was applied, all
slab objects are charged with the new obj_cgroup APIs. The new APIs
introduce a struct obj_cgroup to charge slab objects, which prevents
long-living objects from pinning the original memory cgroup in memory.
But there are still some corner-case objects (e.g. allocations larger than
an order-1 page on SLUB) which are not charged with the new APIs. Those
objects (including the pages which are allocated directly from the buddy
allocator) are charged as kmem pages, which still hold a reference to
the memory cgroup.

For example, the kernel stack is charged as kmem pages because the
size of the kernel stack can be greater than 2 pages (e.g. 16KB on x86_64
or arm64). Suppose we create a thread (whose stack is charged to
memory cgroup A) and then move it from memory cgroup A to memory cgroup
B. Because the kernel stack of the thread holds a reference to memory
cgroup A, the thread can pin memory cgroup A in memory even after
cgroup A is removed. We can reproduce this scenario with the
following script, after which the system has gained 500 dying cgroups.
(This is not a real-world issue, just a script to show that the large
kmallocs are charged as kmem pages, which can pin the memory cgroup in
memory.)

#!/bin/bash

cat /proc/cgroups | grep memory

cd /sys/fs/cgroup/memory
echo 1 > memory.move_charge_at_immigrate

for i in {1..500}
do
	mkdir kmem_test
	echo $$ > kmem_test/cgroup.procs
	sleep 3600 &
	echo $$ > cgroup.procs
	echo `cat kmem_test/cgroup.procs` > cgroup.procs
	rmdir kmem_test
done

cat /proc/cgroups | grep memory

This patchset makes those kmem pages drop their reference to the memory
cgroup by using the obj_cgroup APIs. As a result, the number of dying
cgroups no longer increases when we run the above test script.
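
To put a number on the before/after comparison, the two "grep memory"
lines in the test script can be reduced to a single counter. The helper
below is only a sketch: the function name memcg_count and its optional
file argument are hypothetical and not part of this series. It relies on
the third column of /proc/cgroups being num_cgroups.

```shell
#!/bin/bash

# Hypothetical helper (not part of this series): print the num_cgroups
# column (third field) of the "memory" line in /proc/cgroups.
# The optional first argument overrides the input file, which is only
# meant to make the helper testable on a saved copy.
memcg_count() {
	awk '$1 == "memory" { print $3 }' "${1:-/proc/cgroups}"
}

# Intended use around the reproducer loop:
#   before=$(memcg_count)
#   ... run the mkdir/move/rmdir loop ...
#   after=$(memcg_count)
#   echo "memory cgroups added: $((after - before))"
```

Without this series the difference is expected to be about 500 after the
loop; with it applied, the difference should stay close to zero.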

Patch 1-3 use obj_cgroup APIs to charge kmem pages.
Patch 4 introduces remote objcg charging APIs.
Patch 5 uses remote objcg charging APIs to charge kernel memory.

Changelog in v2:
1. Fix some typos in the commit log (Thanks to Roman).
2. Do not introduce page_memcg_kmem helper (Thanks to Johannes and Shakeel).
3. Reduce the CC list to mm/memcg folks (Thanks to Johannes).
4. Introduce remote objcg charging APIs instead of converting "remote memcg
   charging APIs" to "remote objcg charging APIs".

Muchun Song (5):
  mm: memcontrol: introduce obj_cgroup_{un}charge_page
  mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page
  mm: memcontrol: charge kmem pages by using obj_cgroup APIs
  mm: memcontrol: introduce remote objcg charging API
  mm: memcontrol: use remote objcg charging APIs to charge kernel memory

 fs/buffer.c                          |  10 +-
 fs/notify/fanotify/fanotify.c        |   6 +-
 fs/notify/fanotify/fanotify_user.c   |   2 +-
 fs/notify/group.c                    |   3 +-
 fs/notify/inotify/inotify_fsnotify.c |   8 +-
 fs/notify/inotify/inotify_user.c     |   2 +-
 include/linux/bpf.h                  |   2 +-
 include/linux/fsnotify_backend.h     |   2 +-
 include/linux/memcontrol.h           | 114 +++++++++++++---
 include/linux/sched.h                |   4 +
 include/linux/sched/mm.h             |  38 ++++++
 kernel/bpf/syscall.c                 |  35 ++---
 kernel/fork.c                        |   3 +
 mm/memcontrol.c                      | 257 ++++++++++++++++++++++++++---------
 mm/page_alloc.c                      |   4 +-
 15 files changed, 372 insertions(+), 118 deletions(-)

--
2.11.0