Re: [PATCH] memcg: Ignore unprotected parent in mem_cgroup_protected()

From: Chris Down
Date: Sun Jun 16 2019 - 06:44:45 EST


Hi Xunlei,

Xunlei Pang writes:
> docker and various types(different memory capacity) of containers
> are managed by k8s, it's a burden for k8s to maintain those dynamic
> figures, simply set "max" to key containers is always welcome.

Right, setting "max" is generally a fine way of going about it.

> Set "max" to docker also protects docker cgroup memory(as docker
> itself has tasks) unnecessarily.

That's not correct -- leaf memcgs have to _explicitly_ request memory protection. From the documentation:

memory.low

[...]

Best-effort memory protection. If the memory usages of a
cgroup and all its ancestors are below their low boundaries,
the cgroup's memory won't be reclaimed unless memory can be
reclaimed from unprotected cgroups.

Note that the cgroup itself must also be within its low boundary. That is not implied simply by having ancestors that would permit propagation of protection.

In this case, Docker just shouldn't request protection for those Docker-related tasks, and they won't get any. That seems a lot simpler and more intuitive than special-casing "0" in ancestors.
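
To make the semantics concrete, here is a rough sketch in Python of the rule as the documentation words it. This is only a toy model, not what mem_cgroup_protected() actually does (the kernel's calculation of effective protection is more involved). The Cgroup class, is_low_protected(), and the example cgroup names and sizes are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Stand-in for writing the string "max" to memory.low (assumption of this model).
MAX = float("inf")


@dataclass
class Cgroup:
    """Hypothetical, simplified view of one memcg: usage and its memory.low."""
    name: str
    usage: int                       # current usage, arbitrary units
    low: float = 0                   # memory.low; 0 means no protection requested
    parent: Optional["Cgroup"] = None


def is_low_protected(cg: Cgroup) -> bool:
    """True only if the cgroup AND every ancestor is below its low boundary."""
    node: Optional[Cgroup] = cg
    while node is not None:
        if not node.usage < node.low:
            # With low == 0 no usage is ever "below" the boundary,
            # so a cgroup that never asked for protection gets none.
            return False
        node = node.parent
    return True


# Intermediate cgroup sets "max"; leaves opt in (or not) individually.
docker = Cgroup("docker", usage=768, low=MAX)
key_container = Cgroup("docker/key-container", usage=512, low=MAX, parent=docker)
daemon_tasks = Cgroup("docker/daemon-tasks", usage=256, low=0, parent=docker)
```

In this model key_container is protected, while daemon_tasks is not, even though its parent has "max" set: the parent only permits propagation, and the leaf still has to request it.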

> This patch doesn't take effect on any intermediate layer with
> positive memory.min set, it requires all the ancestors having
> 0 memory.min to work.
>
> Nothing special change, but more flexible to business deployment...

Not so: this change is extremely "special". It violates the basic expectation that 0 means no possibility of propagation of protection, and I still don't see a compelling argument for why Docker can't just set "max" in the intermediate cgroup and not request protection in the leaf memcgs that it doesn't want protected.