Re: [PATCH 1/7] mm: memcontrol: fix cpuhotplug statistics flushing

From: Roman Gushchin
Date: Tue Feb 02 2021 - 21:29:55 EST


On Tue, Feb 02, 2021 at 03:07:47PM -0800, Roman Gushchin wrote:
> On Tue, Feb 02, 2021 at 01:47:40PM -0500, Johannes Weiner wrote:
> > The memcg hotunplug callback erroneously flushes counts on the local
> > CPU, not the counts of the CPU going away; those counts will be lost.
> >
> > Flush the CPU that is actually going away.
> >
> > Also simplify the code a bit by using mod_memcg_state() and
> > count_memcg_events() instead of open-coding the upward flush - this is
> > comparable to how vmstat.c handles hotunplug flushing.
>
> To the whole series: it's really nice to have accurate stats at
> non-leaf levels. Just as an illustration: if there are 32 CPUs and
> 1000 sub-cgroups (which is an absolutely realistic number, because
> often there are many dying generations of each cgroup), the error
> margin is 3.9GB. It makes all numbers pretty much random and all
> possible tests extremely flaky.
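
For anyone wondering where the 3.9GB comes from, here is a rough sketch of
the arithmetic. It assumes each CPU can hold up to one unflushed batch of
32 pages per cgroup (MEMCG_CHARGE_BATCH, as far as I remember) and 4 KiB
pages; neither value is spelled out above, and the function name is made up
for illustration.

```python
# Back-of-the-envelope estimate of the worst-case stats error at a
# non-leaf cgroup. Assumptions (not stated in the thread):
#   - each CPU may hold up to one unflushed batch per cgroup
#   - the batch is 32 pages (believed to match MEMCG_CHARGE_BATCH)
#   - pages are 4 KiB


def worst_case_error_bytes(nr_cpus, nr_cgroups, batch_pages=32, page_size=4096):
    """Upper bound on bytes that may sit unflushed in per-CPU counters."""
    return nr_cpus * nr_cgroups * batch_pages * page_size


if __name__ == "__main__":
    err = worst_case_error_bytes(nr_cpus=32, nr_cgroups=1000)
    # Report in GiB, which is the unit the "3.9GB" figure lines up with.
    print(f"{err} bytes, about {err / 2**30:.1f} GiB")
```

With 32 CPUs and 1000 sub-cgroups this gives 4194304000 bytes, i.e. roughly
3.9 GiB under the assumptions above.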

Btw, I was just looking into kmem kselftest failures/flakiness,
which are caused by exactly this problem: without waiting for
reclaim of dying cgroups to finish, we can't make any reliable
assumptions about what to expect from memcg stats.

So looking forward to having this patchset merged!