Re: [PATCH v2] mm: mlock: remove lru_add_drain_all()

From: Michal Hocko
Date: Sat Oct 21 2017 - 04:11:55 EST


On Sat 21-10-17 08:51:04, Balbir Singh wrote:
> On Fri, Oct 20, 2017 at 9:25 AM, Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:
> > lru_add_drain_all() is not required by mlock() and it will drain
> > everything that has been cached at the time mlock is called. And
> > that is not really related to the memory which will be faulted in
> > (and cached) and mlocked by the syscall itself.
> >
> > Without lru_add_drain_all() the mlocked pages can remain on pagevecs
> > and be moved to evictable LRUs. However they will eventually be moved
> > back to unevictable LRU by reclaim. So, we can safely remove
> > lru_add_drain_all() from mlock syscall. Also there is no need for
> > local lru_add_drain() as it will be called deep inside __mm_populate()
> > (in follow_page_pte()).
> >
> > On larger machines the overhead of lru_add_drain_all() in mlock() can
> > be significant when mlocking data already in memory. We have observed
> > high latency in mlock() due to lru_add_drain_all() when the users
> > were mlocking in memory tmpfs files.
> >
> > Signed-off-by: Shakeel Butt <shakeelb@xxxxxxxxxx>
> > ---
>
> I'm afraid I still don't fully understand the impact in terms of numbers and
> statistics as seen from inside a cgroup.

I really fail to see why there would be anything cgroup specific here.

> My understanding is that we'll slowly
> see the unreclaimable stats go up as we drain the pvec's across CPU's

Not really. Draining is a bit tricky. Anonymous page faults (gup) use
lru_cache_add_active_or_unevictable, so mlocked pages bypass the LRU
cache altogether. Filemap faults go via the cache, and
__pagevec_lru_add_fn, which flushes a full cache, is not mlock aware.
But the gup (follow_page_pte) path tries to move existing and mapped
pages to the unevictable LRU list. So yes, we can see lazily mlocked
pages on the evictable LRU, but reclaim will move them to the
unevictable list when needed. This should be mostly limited to file
mappings. But I haven't checked the code recently and mlock is quite
tricky, so I might misremember.
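
To make the above a bit more concrete, here is a toy userspace model of
the flow as I describe it. It is only a sketch, not kernel code: every
name in it (toy_page, toy_pagevec, toy_file_fault, toy_gup_mlock,
toy_reclaim_scan, the batch size of 4) is made up for illustration, and
it assumes that a page still sitting in the per-CPU batch cannot be
moved by the gup path because it is not on any LRU list yet.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code. All names are hypothetical. */

#define PAGEVEC_SIZE 4  /* small batch for illustration only */

enum lru_list { LRU_NONE, LRU_EVICTABLE, LRU_UNEVICTABLE };

struct toy_page {
	bool mlocked;       /* page belongs to an mlocked mapping */
	enum lru_list lru;  /* LRU_NONE while still in the batch */
};

struct toy_pagevec {
	struct toy_page *pages[PAGEVEC_SIZE];
	size_t nr;
};

/* Flush the batch. Not mlock aware: everything lands on the evictable list. */
void toy_pagevec_flush(struct toy_pagevec *pvec)
{
	for (size_t i = 0; i < pvec->nr; i++)
		pvec->pages[i]->lru = LRU_EVICTABLE;
	pvec->nr = 0;
}

/* File fault: the page goes through the batch, which is flushed when full. */
void toy_file_fault(struct toy_pagevec *pvec, struct toy_page *page)
{
	pvec->pages[pvec->nr++] = page;
	if (pvec->nr == PAGEVEC_SIZE)
		toy_pagevec_flush(pvec);
}

/* Anonymous fault: bypass the batch and pick the right list directly. */
void toy_anon_fault(struct toy_page *page)
{
	page->lru = page->mlocked ? LRU_UNEVICTABLE : LRU_EVICTABLE;
}

/* gup-like step: can only move an mlocked page that is already on an LRU. */
bool toy_gup_mlock(struct toy_page *page)
{
	if (!page->mlocked || page->lru != LRU_EVICTABLE)
		return false;
	page->lru = LRU_UNEVICTABLE;
	return true;
}

/* Reclaim scan: move mlocked pages found on the evictable list. */
size_t toy_reclaim_scan(struct toy_page *pages, size_t n)
{
	size_t moved = 0;

	for (size_t i = 0; i < n; i++) {
		if (pages[i].lru == LRU_EVICTABLE && pages[i].mlocked) {
			pages[i].lru = LRU_UNEVICTABLE;
			moved++;
		}
	}
	return moved;
}

/* Three mlocked file pages plus one unrelated page that fills the batch. */
size_t toy_lazy_mlock_demo(void)
{
	struct toy_page pages[4] = {
		{ true, LRU_NONE }, { true, LRU_NONE }, { true, LRU_NONE },
		{ false, LRU_NONE },
	};
	struct toy_pagevec pvec = { .nr = 0 };

	for (size_t i = 0; i < 3; i++) {
		toy_file_fault(&pvec, &pages[i]);
		toy_gup_mlock(&pages[i]);  /* fails: page is still in the batch */
	}
	toy_file_fault(&pvec, &pages[3]);  /* fills the batch, triggers the flush */
	return toy_reclaim_scan(pages, 4); /* reclaim fixes up the lazy pages */
}
```

In this model the three mlocked file pages end up on the evictable list
after the flush, and toy_lazy_mlock_demo() reports that the reclaim scan
moved all three of them. Nothing in that sequence depends on a global
drain having happened beforehand.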

In any case lru_add_drain_all is quite tangential to all this AFAICS.

> I understand the optimization and I can see why lru_add_drain_all() is
> expensive.

Not only is it expensive, it is also paying the price for previous
caching, which might not be directly related to the mlock syscall.

> Acked-by: Balbir Singh <bsingharora@xxxxxxxxx>
>
> Balbir Singh.

--
Michal Hocko
SUSE Labs