Re: [PATCH 1/2] memcg: fix page_referencies cgroup filter on global reclaim

From: Konstantin Khlebnikov
Date: Tue Feb 21 2012 - 06:38:08 EST


Johannes Weiner wrote:
On Wed, Feb 15, 2012 at 08:28:30PM +0400, Konstantin Khlebnikov wrote:
The global memory reclaimer shouldn't skip references for any pages,
even if they are shared between different cgroups.

Agreed: if we reclaim from one memcg because of its limit, we want to
reclaim those pages that this group is not using. If it's used by
someone else, it should be evicted and refaulted by the group that
needs it.

If we reclaim globally, all references are "true" because we want to
evict those pages that are not used by any cgroup.

But if we reclaim a hierarchical subgroup, we don't want to evict
pages that are shared among this hierarchy, either, even if the memcg
that has the page charged to it is not using it. Bouncing the page
around the hierarchy is not sensible, because it does not solve the
problem of the parent hitting its limit when the sibling group will
refault it in a blink of an eye. It should only be evicted if the
memcg that's not using it nears its own limit, because only in that
case would reclaiming the page remedy the situation.

This patch adds scan_control->current_mem_cgroup, which points to the
sub-cgroup currently being shrunk in the hierarchy; for global reclaim it is
always NULL.

So to be consistent, I'm wondering if we should pass
sc->target_mem_cgroup - the limit-hitting hierarchy root - to
page_referenced() and then have mm_match_cgroup() do a
mem_cgroup_same_or_subtree() check to see if the vma is in the
hierarchy rooted at sc->target_mem_cgroup.

Global reclaim is handled automatically, because mm_match_cgroup() is
not checked when the passed memcg is NULL, which sc->target_mem_cgroup
is for global reclaim.
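
The check proposed above can be modeled in userspace C. This is a minimal sketch, not kernel code: `struct memcg_model`, `same_or_subtree()` and `match_cgroup()` are hypothetical stand-ins for a memcg, mem_cgroup_same_or_subtree() and mm_match_cgroup(), and the model assumes a group is described by nothing more than a parent pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a memory cgroup: only the parent link. */
struct memcg_model {
    const struct memcg_model *parent;   /* NULL at the top of the hierarchy */
};

/* True if memcg is root itself or one of root's descendants. */
static bool same_or_subtree(const struct memcg_model *root,
                            const struct memcg_model *memcg)
{
    assert(root != NULL);
    for (; memcg != NULL; memcg = memcg->parent)
        if (memcg == root)
            return true;
    return false;
}

/*
 * Should a reference from a vma owned by vma_memcg count?
 * target == NULL models global reclaim: no filtering, every reference counts.
 */
static bool match_cgroup(const struct memcg_model *vma_memcg,
                         const struct memcg_model *target)
{
    if (target == NULL)
        return true;
    return same_or_subtree(target, vma_memcg);
}
```

With `target` set to the limit-hitting parent, a reference from any group below it counts, so a page shared between siblings is kept. A reference from outside that hierarchy is ignored, and a NULL target counts everything.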

We could also try to recharge the page to another cgroup if rmap shows another
user of it outside the currently shrinking hierarchy. The page is already
isolated at that point, so at the end we would insert it directly into that
cgroup's LRU.

But for me the main purpose of this patch is killing the mz->mem_cgroup
dereference, because I plan to replace mz with a direct reference to lruvec,
which will be a memcg-free object.
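
To illustrate the direction, here is a rough userspace sketch, not the patch itself. The names `memcg_stub`, `lruvec_model`, `scan_control_model`, `begin_shrink()` and `reference_filter()` are all hypothetical, and the model assumes the group being shrunk is recorded in the scan control before its LRU lists are scanned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup. */
struct memcg_stub {
    int id;
};

/* Hypothetical memcg-free LRU container: it holds no memcg pointer. */
struct lruvec_model {
    unsigned long nr_pages;
};

struct scan_control_model {
    struct memcg_stub *target_mem_cgroup;   /* NULL for global reclaim */
    struct memcg_stub *current_mem_cgroup;  /* group being shrunk, NULL for global reclaim */
};

/* Record which group is being shrunk before scanning its lruvec. */
static void begin_shrink(struct scan_control_model *sc, struct memcg_stub *memcg)
{
    assert(sc != NULL);
    sc->current_mem_cgroup = sc->target_mem_cgroup ? memcg : NULL;
}

/* The reference filter comes from sc, so the lruvec is never asked for a memcg. */
static struct memcg_stub *reference_filter(const struct scan_control_model *sc,
                                           const struct lruvec_model *lruvec)
{
    (void)lruvec;   /* deliberately unused: the lruvec carries no memcg */
    return sc->current_mem_cgroup;
}
```

The point of the sketch is only that the filter travels with the scan control, so the LRU container does not need to know which memcg it belongs to.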