Re: [PATCH v3 2/2] mm, memcg: Decouple e{low,min} state mutations from protection checks

From: Naresh Kamboju
Date: Fri May 22 2020 - 11:52:29 EST


On Fri, 22 May 2020 at 17:49, Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
>
> On Fri, May 22, 2020 at 7:01 PM Naresh Kamboju
> <naresh.kamboju@xxxxxxxxxx> wrote:
> >
> > On Tue, 5 May 2020 at 14:12, Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> > >
> > > From: Chris Down <chris@xxxxxxxxxxxxxx>
> > >
> > > mem_cgroup_protected is currently used both to set effective low and min
> > > and to return a mem_cgroup_protection based on the result. As a user, this
> > > can be a little unexpected: it appears to be a simple predicate function,
> > > if not for the big warning in the comment above about the order in which
> > > it must be executed.
> > >
> > > This change makes it so that we separate the state mutations from the
> > > actual protection checks, which makes it more obvious where we need to be
> > > careful mutating internal state, and where we are simply checking and
> > > don't need to worry about that.
> >
> > This patch triggers the oom-killer while running mkfs -t ext4 on an i386
> > kernel running on an x86_64 machine, with linux-next 5.7.0-rc6-next-20200521.
> >
>
> Hi Naresh,
>
> Thanks for your report.
> My suggestion for the issue you found is to revert this bad commit.

Thanks for giving details on this problem.
I am not sure who will propose reverting this patch on the linux-next tree.
Please add Reported-by if it is sane.

>
> As I explained earlier in another mail thread [1], the usage of
> memcg->{emin, elow} is very buggy.
> We shouldn't use memcg->{emin, elow} directly in the reclaim context,
> because these two values can be modified by many reclaimers. The
> right approach is to store the protection value in the scan_control.
> IOW, different reclaimers have different protection.
> But unfortunately my suggestion was ignored.
>
> We can set them to 0 before using them to work around the issue you
> found, but the fact is that we would introduce a new issue while fixing
> an old one.
>
> [1]. https://lore.kernel.org/linux-mm/20200425152418.28388-1-laoar.shao@xxxxxxxxx/
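
For readers following the thread, the concern about shared
memcg->{emin, elow} can be illustrated with a minimal userspace sketch
in C. This is not kernel code: struct toy_memcg, struct toy_scan_control
and all toy_* functions are hypothetical names invented for this
illustration, and the protection rule (the reclaim target itself gets no
protection, everything else is protected up to min(low, usage)) is a
simplifying assumption. It only shows why a value stored in the cgroup
can be overwritten by another reclaimer, while a value stored in the
reclaimer's own context cannot.

```c
#include <assert.h>

/* Hypothetical stand-in for a memory cgroup; NOT the kernel's struct mem_cgroup. */
struct toy_memcg {
    unsigned long low;    /* configured low protection */
    unsigned long usage;  /* current usage */
    unsigned long elow;   /* shared "effective low", written by any reclaimer */
};

/* Hypothetical per-reclaimer context, loosely modelled on the idea of scan_control. */
struct toy_scan_control {
    const struct toy_memcg *target; /* root of this reclaimer's reclaim */
    unsigned long elow;             /* this reclaimer's private protection value */
};

/*
 * Toy rule (an assumption for illustration): the reclaim target itself is
 * not protected; any other cgroup is protected up to min(low, usage).
 */
static unsigned long toy_effective_low(const struct toy_memcg *target,
                                       const struct toy_memcg *memcg)
{
    if (memcg == target)
        return 0;
    return memcg->usage < memcg->low ? memcg->usage : memcg->low;
}

/* Shared-state variant: the result lands in the cgroup, visible to every reclaimer. */
static void toy_calc_shared(const struct toy_memcg *target, struct toy_memcg *memcg)
{
    memcg->elow = toy_effective_low(target, memcg);
}

/* Per-reclaimer variant: the result lands in the caller's own context. */
static void toy_calc_private(struct toy_scan_control *sc, const struct toy_memcg *memcg)
{
    sc->elow = toy_effective_low(sc->target, memcg);
}

/* Pure check: no state is mutated here. */
static int toy_is_low_protected(unsigned long elow, const struct toy_memcg *memcg)
{
    return elow > 0 && memcg->usage <= elow;
}
```

In this sketch, if one reclaimer targets cgroup A (so A should be
unprotected for it) and a second reclaimer with a different target runs
toy_calc_shared() on A in between, the first reclaimer then reads the
second one's value from A and treats A as protected. With
toy_calc_private(), each reclaimer only ever reads the value it computed
itself.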


- Naresh