Re: [PATCHv4] memcg: reclaim memory from node in round-robin

From: KAMEZAWA Hiroyuki
Date: Thu May 26 2011 - 20:01:33 EST


On Thu, 26 May 2011 12:52:07 -0700
Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Fri, 6 May 2011 15:13:02 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
>
> > > It would be much better to work out the optimum time at which to rotate
> > > the index via some deterministic means.
> > >
> > > If we can't think of a way of doing that then we should at least pace
> > > the rotation frequency via something saner than wall-time. Such as
> > > number-of-pages-scanned.
> > >
> >
> >
> > What I think now is using reclaim_stat, or using some fairness based on
> > the ratio of inactive file caches. We can calculate the total sum of
> > reclaim_stat, which gives us a scan_ratio for the whole memcg. And we can
> > calculate the LRU rotate/scan ratio per node. If the rotate/scan ratio is
> > small, that node will be a good candidate for reclaim. Hmm,
> >
> > - check which memory (anon or file) should be scanned.
> > (If file is too small, the rotate/scan ratio of file is meaningless.)
> > - check the rotate/scan ratio of each node.
> > - calculate weights for each node (by some logic?)
> > - give a fair scan w.r.t. each node's weight.
> >
> > Hmm, I'll study this.
>
> How's the study coming along ;)
>
> I'll send this in to Linus today, but I'll feel grumpy while doing so.
> We really should do something smarter here - the magic constant will
> basically always be suboptimal for everyone and we end up tweaking its
> value (if we don't, then the feature just wasn't valuable in the first
> place) and then we add a tunable and then people try to tweak the
> default setting of the tunable and then I deride them for not setting
> the tunable in initscripts and then we have to maintain the stupid
> tunable after we've changed the internal implementation and it's all
> basically screwed up.
>
> How do we automatically determine the optimum time at which to rotate,
> at runtime?
>

Ah, I think I should check this after dirty page accounting lands, because
the ratio of dirty pages is important information.

OK, what I think now is simply comparing the number of INACTIVE_FILE pages, or
the number of file cache pages, per node.

I think we can periodically update the per-node and total amounts of file
cache, and record the per-node ratio

    node-file-cache * 100 / total-file-cache

into the memcg's per-node structure.

Then I think we can do something like lottery scheduling: scan each node in
proportion to its share of the memcg's file cache. If it's better to check
INACTIVE_ANON as well, I think swappiness can be used in the above calculation.

But yes, I or someone may be able to think of something much better.

Thanks,
-Kame
