Re: [PATCH RFC] xfs, memcg: Call xfs_fs_nr_cached_objects() only in case of global reclaim

From: Dave Chinner
Date: Thu Mar 22 2018 - 19:47:08 EST


On Thu, Mar 22, 2018 at 07:52:37PM +0300, Kirill Tkhai wrote:
> Here is the problem I'm solving: https://lkml.org/lkml/2018/3/21/365.

Oh, finally you tell me what the problem is that you're trying to
solve. I *asked this several times* and got no response. Thank you
for wasting so much of my time.

> Current shrinker is not scalable. Then there are many memcg and mounts,
> the time of iteration shrink_slab() in case of global reclaim can
> take much time. There is times of shrink_slab() by the link. A node
> with 200 containers may waste 4 seconds on global reclaim just to
> iterate over all shrinkers for all cgroups, call shrinker::count_objects()
> and receive 0 zero objects.

So, your problem is the way the memcgs were tacked onto the side
of the list_lru infrastructure and are iterated, which has basically
nothing to do with the way the low level XFS inode shrinker behaves.

/me looks at the patches

/me shudders at the thought of external "cache has freeable items"
control for the shrinking of vfs caches.

Biggest problem I see with this is the scope for coherency bugs ini
the "memcg shrinker has freeable items" tracking. If that happens,
there's no way of getting that memcg to run it's shrinker ever
again. That seems very, very fragile and likely to be an endless
source of OOM bugs. The whole point of the shrinker polling
infrastructure is that it is not susceptible to this sort of bug.

Really, the problem is that there's no separate list of memcg aware
shrinkers, so every registered shrinker has to be iterated just to
find the one shrinker that is memcg aware.

So why not just do the simple thing which is create a separate
"memcg-aware" shrinker list (i.e. create shrinker_list_memcg
alongside shrinker_list) and only iterate the shrinker_list_memcg
when a memcg is passed to shrink_slab()?

That means we'll only run 2 shrinkers per memcg at most (sueprblock
and working set) per memcg reclaim call. That's a simple 10-20 line
change, not a whole mess of new code and issues...

> Can't we call shrink of shared objects only for top memcg? Something like
> this:

What's a "shared object", and how is it any different to a normal
slab cache object?

CHeers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx