Re: kernfs memcg accounting

From: Vasily Averin
Date: Fri May 06 2022 - 04:37:48 EST


On 5/5/22 12:47, Michal Koutný wrote:
> On Thu, May 05, 2022 at 12:16:12AM +0300, Vasily Averin <vvs@xxxxxxxxxx> wrote:
>> I think it should allocate at least 2 pages.
>
> After decoding kmalloc_type(), I agree this falls into a global
> (unaccounted) kmalloc_cache.
>
>> However, if cgroup_mkdir() calls mem_cgroup_alloc(), it correctly accounts the huge
>> percpu allocations but ignores the neighbouring multipage allocation.
>
> So, the spillover is bounded and proportional to the memcg limit (in the
> same ratio as these two sizes).
> But it may be better to account it properly, especially if it's
> a contribution from an offlined mem_cgroup.

I've traced mkdir /sys/fs/cgroup/vvs.test on a 4-CPU VM with Fedora
and a self-compiled upstream kernel; see the table with results below.
These calculations are not precise: they depend on kernel config options,
the number of CPUs and the enabled controllers, and they ignore possible
page allocations, etc.
However, I think this is enough to clarify the general situation.

Results:
- The total of all traced allocations is ~60KB.
- Only the 2 huge percpu allocations marked '=' are accounted, ~18KB
  (and this can be 0 without the memory controller).
- kernfs nodes and iattrs are among the main memory consumers.
  They are marked '++' to be accounted first.
- cgroup_mkdir always allocates 4KB,
  so I think it should be accounted first too.
- mem_cgroup_css_alloc allocations consume ~10KB,
  which is enough to be worth accounting, especially for VMs with 1-2 CPUs.
- Almost all other allocations are quite small and can be ignored.
  The exceptions are the per-CPU allocations in alloc_fair_sched_group(),
  which can consume a significant amount of memory on nodes
  with multiple processors.
  They are marked '+' and can be accounted later.
- kernfs nodes consume ~6KB of memory inside simple_xattr_set()
  and simple_xattr_alloc() (marked '?'). These are quite high numbers,
  but not critical, and I think we can ignore them at the moment.
- If all of the proposed allocations are accounted, that gives us ~47KB,
  or ~75% of all allocated memory.

Any comments are welcome.

Thank you,
Vasily Averin

number  bytes   $1*$2   sum     note  call_site
of      alloc
allocs
------------------------------------------------------------
1       14448   14448   14448   =     percpu_alloc_percpu:
1       8192    8192    22640   ++    (mem_cgroup_css_alloc+0x54)
49      128     6272    28912   ++    (__kernfs_new_node+0x4e)
49      96      4704    33616   ?     (simple_xattr_alloc+0x2c)
49      88      4312    37928   ++    (__kernfs_iattrs+0x56)
1       4096    4096    42024   ++    (cgroup_mkdir+0xc7)
1       3840    3840    45864   =     percpu_alloc_percpu:
4       512     2048    47912   +     (alloc_fair_sched_group+0x166)
4       512     2048    49960   +     (alloc_fair_sched_group+0x139)
1       2048    2048    52008   ++    (mem_cgroup_css_alloc+0x109)
49      32      1568    53576   ?     (simple_xattr_set+0x5b)
2       584     1168    54744         (radix_tree_node_alloc.constprop.0+0x8d)
1       1024    1024    55768         (cpuset_css_alloc+0x30)
1       1024    1024    56792         (alloc_shrinker_info+0x79)
1       768     768     57560         percpu_alloc_percpu:
1       640     640     58200         (sched_create_group+0x1c)
33      16      528     58728         (__kernfs_new_node+0x31)
1       512     512     59240         (pids_css_alloc+0x1b)
1       512     512     59752         (blkcg_css_alloc+0x39)
9       48      432     60184         percpu_alloc_percpu:
13      32      416     60600         (__kernfs_new_node+0x31)
1       384     384     60984         percpu_alloc_percpu:
1       256     256     61240         (perf_cgroup_css_alloc+0x1c)
1       192     192     61432         percpu_alloc_percpu:
1       64      64      61496         (mem_cgroup_css_alloc+0x363)
1       32      32      61528         (ioprio_alloc_cpd+0x39)
1       32      32      61560         (ioc_cpd_alloc+0x39)
1       32      32      61592         (blkcg_css_alloc+0x6b)
1       32      32      61624         (alloc_fair_sched_group+0x52)
1       32      32      61656         (alloc_fair_sched_group+0x2e)
3       8       24      61680         (__kernfs_new_node+0x31)
3       8       24      61704         (alloc_cpumask_var_node+0x1b)
1       24      24      61728         percpu_alloc_percpu:
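
The subtotals quoted in the results can be cross-checked against the table above.
The following is only a rough sketch in Python, not part of the original trace
tooling: the names ROWS, TOTAL_BYTES and subtotals() are made up for illustration.
Only the rows that carry a note are copied in; the grand total is taken from the
last value of the "sum" column.

```python
from collections import defaultdict

# Marked rows copied from the table:
# (number of allocs, bytes per alloc, note, call site)
ROWS = [
    (1, 14448, "=", "percpu_alloc_percpu"),
    (1, 8192, "++", "mem_cgroup_css_alloc+0x54"),
    (49, 128, "++", "__kernfs_new_node+0x4e"),
    (49, 96, "?", "simple_xattr_alloc+0x2c"),
    (49, 88, "++", "__kernfs_iattrs+0x56"),
    (1, 4096, "++", "cgroup_mkdir+0xc7"),
    (1, 3840, "=", "percpu_alloc_percpu"),
    (4, 512, "+", "alloc_fair_sched_group+0x166"),
    (4, 512, "+", "alloc_fair_sched_group+0x139"),
    (1, 2048, "++", "mem_cgroup_css_alloc+0x109"),
    (49, 32, "?", "simple_xattr_set+0x5b"),
]

# Last value of the running "sum" column, i.e. all traced allocations.
TOTAL_BYTES = 61728


def subtotals(rows):
    """Sum count * size for each note marker."""
    totals = defaultdict(int)
    for count, size, note, _site in rows:
        totals[note] += count * size
    return dict(totals)


by_note = subtotals(ROWS)

# '=' is already accounted; '++' and '+' are the proposed additions.
proposed = by_note["="] + by_note["++"] + by_note["+"]
share = 100 * proposed / TOTAL_BYTES

for note in ("=", "++", "+", "?"):
    print(f"{note:>2}: {by_note[note]} bytes")
print(f"accounted after proposal: {proposed} bytes ({share:.1f}% of {TOTAL_BYTES})")
```

With the rows as listed, '=' adds up to 18288 bytes, '++' to 24920, '+' to 4096
and '?' to 6272, so the proposed set covers 47304 of 61728 bytes, which matches
the ~47KB and roughly three quarters mentioned in the results.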