Re: [RFC v3 PATCH 0/5] mm: memcontrol: do memory reclaim when offlining

From: Yang Shi
Date: Wed Jan 09 2019 - 17:12:09 EST




On 1/9/19 1:23 PM, Johannes Weiner wrote:
On Wed, Jan 09, 2019 at 12:36:11PM -0800, Yang Shi wrote:
As I mentioned above, if we know some page caches from some memcgs
are referenced one-off and unlikely shared, why just keep them
around to increase memory pressure?
It's just not clear to me that your scenarios are generic enough to
justify adding two interfaces that we have to maintain forever, and
that they couldn't be solved with existing mechanisms.

Please explain:

- Unmapped clean page cache isn't expensive to reclaim, certainly
cheaper than the IO involved in new application startup. How could
recycling clean cache be a prohibitive part of workload warmup?

It is not about recycling. Those page cache pages might be referenced by the memcg just once, after which nobody touches them until memory pressure hits. And they might not be accessed again any time soon.


- Why you cannot temporarily raise the kswapd watermarks right before
an important application starts up (your answer was sorta handwavy)

We could, but the kswapd watermark is global. Boosting it may cause kswapd to reclaim memory from memcgs that we want to keep untouched. Although cgroup v2's memory.low/memory.min could provide some protection, reclaim from those memcgs is still not prohibited in general. And v1 doesn't have such protection at all.

force_empty or wipe_on_offline could be used to target specific memcgs for which we know exactly what they do, or know that it is safe to reclaim memory from them. IMHO, this may provide better isolation.


- Why you cannot use madvise/fadvise when an application whose cache
you won't reuse exits

Sure we can. But we can't guarantee that all applications use them properly.
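
For reference, the fadvise route would look roughly like the sketch below. It assumes the application knows which files it will not reuse. drop_file_cache() is a hypothetical helper name used only for illustration, not something from this patch series. POSIX_FADV_DONTNEED is only a hint, and dirty pages are not dropped until they are written back, hence the fsync().

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to drop the cached pages of one file.
 * Returns 0 on success, -1 if the file cannot be opened or synced, or the
 * positive error number returned by posix_fadvise().
 */
int drop_file_cache(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Dirty pages cannot be dropped, so write them back first. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }

    /* offset 0 and len 0 cover the whole file; this is advisory only. */
    int ret = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return ret;
}
```

Every application would have to make a call like this for each file on exit, which is exactly the part we cannot rely on.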


- Why you couldn't set memory.high or memory.max to 0 after the
application quits and before you call rmdir on the cgroup

I recall explaining this in the review email for the first version. Setting memory.high or memory.max to 0 would trigger direct reclaim, which may stall the offlining of the memcg. But we have "restarting the same name job" logic in our use case (I'm not quite sure why they do so). Basically, it means creating a memcg with the exact same name right after the old one is deleted, possibly with a different limit or other settings. The creation has to wait until rmdir is done.
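
To make the sequence concrete, the memory.max approach would be roughly the sketch below. The cgroup path in the comment is a made-up example and both helper names are hypothetical. As far as I understand, the write() to memory.max does not return until reclaim has run in the writer's context, so the rmdir (and the creation of the new memcg with the same name) has to wait behind it.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a value to a cgroup control file, e.g.
 * write_cgroup_knob("/sys/fs/cgroup/job", "memory.max", "0").
 * Returns 0 on success, -1 on failure.
 */
int write_cgroup_knob(const char *cgroup_dir, const char *knob,
                      const char *value)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/%s", cgroup_dir, knob);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    /* Control files already exist in cgroupfs, so no O_CREAT. */
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    size_t len = strlen(value);
    ssize_t written = write(fd, value, len);
    close(fd);
    return (written == (ssize_t)len) ? 0 : -1;
}

/* Hypothetical sequence: shrink the limit to 0, then remove the cgroup. */
int reclaim_and_remove(const char *cgroup_dir)
{
    /* This write blocks while the memcg is being reclaimed. */
    if (write_cgroup_knob(cgroup_dir, "memory.max", "0") != 0)
        return -1;
    return rmdir(cgroup_dir);
}
```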


Adding a permanent kernel interface is a serious measure. I think you
need to make a much better case for it, discuss why other options are
not practical, and show that this will be a generally useful thing for
cgroup users and not just a niche fix for very specific situations.

I do understand your concern about the maintenance cost of a permanent kernel interface. I'm not quite sure whether this is generic enough. However, Michal Hocko did mention "It seems we have several people asking for something like that already.", so at least it does not sound like "a niche fix for very specific situations".

In my first submission, I did reuse the force_empty interface to keep the change less intrusive; at least it did not add a new interface. Since we have several people asking for something like that already, Michal suggested a new knob instead of reusing force_empty.

Thanks,
Yang