Re: [rfc patch 2/6] vmscan: make distinction between memcg reclaim and LRU list selection

From: Johannes Weiner
Date: Tue May 17 2011 - 04:25:54 EST


On Mon, May 16, 2011 at 11:38:07PM -0700, Ying Han wrote:
> On Thu, May 12, 2011 at 9:03 AM, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> > On Thu, May 12, 2011 at 11:33:13AM -0400, Rik van Riel wrote:
> >> On 05/12/2011 10:53 AM, Johannes Weiner wrote:
> >> >The reclaim code has a single predicate for whether it currently
> >> >reclaims on behalf of a memory cgroup, as well as whether it is
> >> >reclaiming from the global LRU list or a memory cgroup LRU list.
> >> >
> >> >Up to now, both cases always coincide, but subsequent patches will
> >> >change things such that global reclaim will scan memory cgroup lists.
> >> >
> >> >This patch adds a new predicate that tells global reclaim from memory
> >> >cgroup reclaim, and then changes all callsites that are actually about
> >> >global reclaim heuristics rather than strict LRU list selection.
> >> >
> >> >Signed-off-by: Johannes Weiner<hannes@xxxxxxxxxxx>
> >> >---
> >> >  mm/vmscan.c |   96 ++++++++++++++++++++++++++++++++++------------------------
> >> >  1 files changed, 56 insertions(+), 40 deletions(-)
> >> >
> >> >diff --git a/mm/vmscan.c b/mm/vmscan.c
> >> >index f6b435c..ceeb2a5 100644
> >> >--- a/mm/vmscan.c
> >> >+++ b/mm/vmscan.c
> >> >@@ -104,8 +104,12 @@ struct scan_control {
> >> >      */
> >> >     reclaim_mode_t reclaim_mode;
> >> >
> >> >-    /* Which cgroup do we reclaim from */
> >> >-    struct mem_cgroup *mem_cgroup;
> >> >+    /*
> >> >+     * The memory cgroup we reclaim on behalf of, and the one we
> >> >+     * are currently reclaiming from.
> >> >+     */
> >> >+    struct mem_cgroup *memcg;
> >> >+    struct mem_cgroup *current_memcg;
> >>
> >> I can't say I'm fond of these names.  I had to read the
> >> rest of the patch to figure out that the old mem_cgroup
> >> got renamed to current_memcg.
> >
> > To clarify: sc->memcg will be the memcg that hit the hard limit and is
> > the main target of this reclaim invocation.  current_memcg is the
> > iterator over the hierarchy below the target.
>
> I would assume the new variable memcg is a renaming of
> "mem_cgroup", which indicates which cgroup we reclaim on behalf of.

The thing is, mem_cgroup would mean both the group we are reclaiming
on behalf of AND the group we are currently reclaiming from. Because
the hierarchy walk was implemented in memcontrol.c, vmscan.c only ever
saw one cgroup at a time.

> About "current_memcg", I couldn't find where it is set to the
> current cgroup in the hierarchy below "memcg".

It's codified in shrink_zone().

    for each child of sc->memcg:
            sc->current_memcg = child
            reclaim(sc)
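A compilable model of that loop, as a sketch only: the structures are hypothetical stand-ins carrying just the fields needed here, a recursive walk replaces whatever iterator the real code uses, and it assumes the walk visits the target itself before its descendants. The point it shows is that sc->memcg stays fixed for the whole invocation while sc->current_memcg moves.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct mem_cgroup {
	const char *name;
	unsigned long nr_pages;            /* pages on this group's own LRU lists */
	struct mem_cgroup *children[MAX_CHILDREN];
	int nr_children;
};

struct scan_control {
	struct mem_cgroup *memcg;          /* target: the group that hit its limit */
	struct mem_cgroup *current_memcg;  /* position of the hierarchy walk */
	unsigned long nr_to_reclaim;
	unsigned long nr_reclaimed;
};

/* Reclaim from whichever group the walk currently points at. */
static void reclaim(struct scan_control *sc)
{
	struct mem_cgroup *victim = sc->current_memcg;
	unsigned long want = sc->nr_to_reclaim - sc->nr_reclaimed;
	unsigned long got = victim->nr_pages < want ? victim->nr_pages : want;

	victim->nr_pages -= got;
	sc->nr_reclaimed += got;
}

/* Pre-order walk: visit @memcg first, then every descendant. */
static void walk(struct scan_control *sc, struct mem_cgroup *memcg)
{
	if (sc->nr_reclaimed >= sc->nr_to_reclaim)
		return;
	sc->current_memcg = memcg;
	reclaim(sc);
	for (int i = 0; i < memcg->nr_children; i++)
		walk(sc, memcg->children[i]);
}

static void shrink_zone(struct scan_control *sc)
{
	assert(sc->memcg != NULL);
	/* Only the cursor moves; sc->memcg is never reassigned. */
	walk(sc, sc->memcg);
}
```

With a target of 3 pages and one child of 5 pages, asking for 6 pages empties the target, takes 3 from the child, and leaves the cursor on the child while sc->memcg still names the target.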

In the new version I named (and documented) them:

    sc->target_mem_cgroup: the entry point into the hierarchy, set
            by the functions that have the scan control structure on their
            stack.  That's the one hitting its hard limit.

    sc->mem_cgroup: the current position in the hierarchy below
            sc->target_mem_cgroup.  That's the one that actively gets its
            pages reclaimed.
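To make the split concrete, here is a simplified sketch of how the two pointers could back two separate predicates. The structure layout and the helper names global_reclaim() and scanning_global_lru() are illustrative assumptions, not quoted from the patch: one answers "on whose behalf are we reclaiming?", the other "which LRU lists are we scanning right now?". Once global reclaim walks memcg lists, the two answers diverge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mem_cgroup { int id; };  /* opaque stand-in */

/* Field names follow the new version above; everything else is simplified. */
struct scan_control {
	/* Entry point into the hierarchy: the group that hit its hard
	 * limit, or NULL when reclaim comes from global memory pressure. */
	struct mem_cgroup *target_mem_cgroup;
	/* Current position of the walk: the group whose LRU lists are
	 * being scanned, or NULL for the global LRU lists. */
	struct mem_cgroup *mem_cgroup;
};

/* On whose behalf?  Decides global reclaim heuristics. */
static bool global_reclaim(const struct scan_control *sc)
{
	return sc->target_mem_cgroup == NULL;
}

/* Which lists?  Decides strict LRU list selection. */
static bool scanning_global_lru(const struct scan_control *sc)
{
	return sc->mem_cgroup == NULL;
}

/* Under this sketch's assumptions, memcg reclaim only ever scans
 * memcg lists, while global reclaim may scan either kind. */
static void assert_sane(const struct scan_control *sc)
{
	assert(global_reclaim(sc) || !scanning_global_lru(sc));
}
```

The case the old single predicate could not express is the second one in the usage below: global reclaim (no target) scanning a memcg's lists.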

> Both mem_cgroup_shrink_node_zone() and try_to_free_mem_cgroup_pages()
> are called within mem_cgroup_hierarchical_reclaim(), and sc->memcg
> is initialized with the victim passed down, which is already the memcg
> in the hierarchy.

I changed mem_cgroup_shrink_node_zone() to use do_shrink_zone(), and
mem_cgroup_hierarchical_reclaim() no longer calls
try_to_free_mem_cgroup_pages().

So there is no hierarchy walk triggered from within a hierarchy walk.

I just noticed that there is, however, a bug in that
mem_cgroup_shrink_node_zone() does not initialize sc->current_memcg.
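A minimal sketch of the fix, under stated assumptions: shrink_single_memcg() is a hypothetical stand-in for mem_cgroup_shrink_node_zone() with a much simpler signature, and the do_shrink_zone() here is a toy that only reads sc->current_memcg. Because that path reclaims a single group without walking the hierarchy, both pointers get the same value. Omitted members of a designated initializer are zeroed, so forgetting current_memcg yields a NULL cursor rather than garbage; the assert() in the sketch makes that case visible.

```c
#include <assert.h>
#include <stddef.h>

struct mem_cgroup { unsigned long nr_pages; };  /* simplified stand-in */

struct scan_control {
	struct mem_cgroup *memcg;          /* reclaiming on behalf of */
	struct mem_cgroup *current_memcg;  /* reclaiming from */
	unsigned long nr_to_reclaim;
	unsigned long nr_reclaimed;
};

/* Reclaims from exactly one group: whatever current_memcg points at. */
static void do_shrink_zone(struct scan_control *sc)
{
	struct mem_cgroup *victim = sc->current_memcg;
	unsigned long got;

	assert(victim != NULL);  /* a zeroed cursor is the bug */
	got = victim->nr_pages < sc->nr_to_reclaim ?
	      victim->nr_pages : sc->nr_to_reclaim;
	victim->nr_pages -= got;
	sc->nr_reclaimed += got;
}

/* Hypothetical: no hierarchy walk, so target and cursor are the same group. */
static unsigned long shrink_single_memcg(struct mem_cgroup *memcg,
					 unsigned long nr)
{
	struct scan_control sc = {
		.memcg = memcg,
		.current_memcg = memcg,  /* the missing initialization */
		.nr_to_reclaim = nr,
	};

	do_shrink_zone(&sc);
	return sc.nr_reclaimed;
}
```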