Re: [RFC PATCH] cgroup: introduce dynamic protection for memcg

From: Zhaoyang Huang
Date: Wed Apr 06 2022 - 12:25:56 EST


On Tue, Apr 5, 2022 at 8:08 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Mon 04-04-22 21:14:40, Zhaoyang Huang wrote:
> [...]
> > Please be noticed that this patch DOES protect the memcg when external
> > pressure is 1GB as fixed low does.
>
> This is getting more and more confusing (at least to me). Could you
> describe the behavior of the reclaim for the following setups/situations?
>
> a) mostly reclaiming a clean page cache - via kswapd
> b) same as above but the direct reclaim is necessary but very
> lightweight
> c) direct reclaim makes fwd progress but not enough to satisfy the
> allocation request (so the reclaim has to be retried)
> d) direct reclaim not making progress and low limit protection is
> ignored.
>
> Say we have several memcgs and only some have low memory protection
> configured. What is the user observable state of the protected group and
> when and how much the protection can be updated?
I am not sure I understand you correctly. Are you questioning the test
result because you think the protected memcg has no chance to update its
protection, or because global reclaim should already have been satisfied
before step d is reached (i.e. step d is hard to hit)? Let me try to
answer under my understanding; please let me know if you need more
information. The protection is updated whenever
mem_cgroup_calculate_protection() is called, during either kswapd or
direct reclaim, once per round of the priority reclaiming; the memcg's
lruvec is then reached in step d.
>
> I think it would be also helpful to describe the high level semantic of
> this feature.
>
> > Besides, how does the admin decide
> > the exact number of low/min if it expand from small to even xGB in a
> > quick changing scenario?
>
> This is not really related, is it? There are different ways to tune for
> the protection.
I don't think so. IMO, it is hard to configure a fixed protection when a
memcg's usage spans a wide and unpredictable range that depends on the
scenario. Perhaps the example of EAS in the scheduler makes this clearer:
compared with legacy CFS, EAS does go against parts of the original
design (such as load balancing), yet it packs small tasks onto one core.
>
> [...]
> --
> Michal Hocko
> SUSE Labs