Re: [patch 03/11] mm: vmscan: distinguish between memcg triggeringreclaim and memcg being scanned

From: Johannes Weiner
Date: Tue Sep 20 2011 - 04:58:50 EST


On Mon, Sep 19, 2011 at 04:29:55PM +0200, Michal Hocko wrote:
> On Mon 12-09-11 12:57:20, Johannes Weiner wrote:
> > Memory cgroup hierarchies are currently handled completely outside of
> > the traditional reclaim code, which is invoked with a single memory
> > cgroup as an argument for the whole call stack.
> >
> > Subsequent patches will switch this code to do hierarchical reclaim,
> > so there needs to be a distinction between a) the memory cgroup that
> > is triggering reclaim due to hitting its limit and b) the memory
> > cgroup that is being scanned as a child of a).
> >
> > This patch introduces a struct mem_cgroup_zone that contains the
> > combination of the memory cgroup and the zone being scanned, which is
> > then passed down the stack instead of the zone argument.
> >
> > Signed-off-by: Johannes Weiner <jweiner@xxxxxxxxxx>
>
> Looks good to me. Some minor comments bellow
> Anyways:
> Reviewed-by: Michal Hocko <mhocko@xxxxxxx>

Thanks!

> > @@ -1853,13 +1865,13 @@ static int vmscan_swappiness(struct scan_control *sc)
> > *
> > * nr[0] = anon pages to scan; nr[1] = file pages to scan
> > */
> > -static void get_scan_count(struct zone *zone, struct scan_control *sc,
> > - unsigned long *nr, int priority)
> > +static void get_scan_count(struct mem_cgroup_zone *mz, struct scan_control *sc,
> > + unsigned long *nr, int priority)
> > {
> > unsigned long anon, file, free;
> > unsigned long anon_prio, file_prio;
> > unsigned long ap, fp;
> > - struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
> > + struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(mz);
> > u64 fraction[2], denominator;
> > enum lru_list l;
> > int noswap = 0;
>
> You can save some patch lines by:
> struct zone *zone = mz->zone;
> and not doing zone => mz->zone changes that follow.

Actually, I really hate that I had to do that local zone variable in
other places. I only did it where it's used so often that it would
have changed every other line. If you insist, I'll change it, but I
would prefer to avoid it when possible.

> > @@ -2390,6 +2413,18 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
> > }
> > #endif
> >
> > +static void age_active_anon(struct zone *zone, struct scan_control *sc,
> > + int priority)
> > +{
> > + struct mem_cgroup_zone mz = {
> > + .mem_cgroup = NULL,
> > + .zone = zone,
> > + };
> > +
> > + if (inactive_anon_is_low(&mz))
> > + shrink_active_list(SWAP_CLUSTER_MAX, &mz, sc, priority, 0);
> > +}
> > +
>
> I do not like this very much because we are using a similar construct in
> shrink_mem_cgroup_zone so we are duplicating that code.
> What about adding age_mem_cgroup_active_anon (something like shrink_zone).

I am not sure I follow and I don't see what could be shared between
the zone shrinking and this as there are different exit conditions to
the hierarchy walk. Can you elaborate?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/