Re: kernfs memcg accounting

From: Michal Koutný
Date: Wed May 04 2022 - 10:10:31 EST


On Wed, May 04, 2022 at 12:00:18PM +0300, Vasily Averin <vvs@xxxxxxxxxx> wrote:
> My patches protect the host mostly from misuse, when someone creates
> a huge number of nedevices inside a container.

Understood.

> Frankly speaking I do not see a big difference between memcg of current process,
> memcg of newly created child and memcg of its parent.

I agree that's not substantial difference. It's relevant for outer
entities "injecting" something into a subtree.
As I wrote previously, charging to the creator in the generic case is
sensible.

> As far as I understand, Roman chose the parent memcg because it was a special
> case of creating a new memory group. He temporally changed active memcg
> in mem_cgroup_css_alloc() and properly accounted all required memcg-specific
> allocations.

> However, he ignored accounting for a rather large struct mem_cgroup
> therefore I think we can do not worry about 128 bytes of kernfs node.

Are you referring to the current code (>= v5.18-rc2)? All big structs
related to mem_cgroup should be accounted. What is ignored?

> Primary I mean here struct mem_cgroup allocation in mem_cgroup_alloc().

Just note that memory controller may not be always enabled so
cgroup_mkdir != mem_cgroup_alloc().

> However, I think we need to take into account any other distributions called
> inside cgroup_mkdir: struct cgroup and kernefs node in common part and
> any other cgroup-cpecific allocations in other .css_alloc functions.
> They all can be called from inside container, allocates non-accountable
> memory and by this way theoretically can be misused.

Also note that (if you're purely on unified hierachy) you can protect
against that with cgroup.max.descendants and cgroup.max.depth.

Thanks,
Michal