Re: [PATCH v6 8/9] mm: multigenerational lru: user interface

From: Yu Zhao
Date: Wed Jan 12 2022 - 03:36:01 EST


On Mon, Jan 10, 2022 at 12:27:19PM +0200, Mike Rapoport wrote:
> Hi,
>
> On Tue, Jan 04, 2022 at 01:22:27PM -0700, Yu Zhao wrote:
> > Add /sys/kernel/mm/lru_gen/enabled as a runtime kill switch.
> >
> > Add /sys/kernel/mm/lru_gen/min_ttl_ms for thrashing prevention.
> > Compared with the size-based approach, e.g., [1], this time-based
> > approach has the following advantages:
> > 1) It's easier to configure because it's agnostic to applications and
> > memory sizes.
> > 2) It's more reliable because it's directly wired to the OOM killer.
> >
> > Add /sys/kernel/debug/lru_gen for working set estimation and proactive
> > reclaim. Compared with the page table-based approach and the PFN-based
> > approach, e.g., mm/damon/[vp]addr.c, this lruvec-based approach has
> > the following advantages:
> > 1) It offers better choices because it's aware of memcgs, NUMA nodes,
> > shared mappings and unmapped page cache.
> > 2) It's more scalable because it's O(nr_hot_evictable_pages), whereas
> > the PFN-based approach is O(nr_total_pages).
> >
> > Add /sys/kernel/debug/lru_gen_full for debugging.
> >
> > [1] https://lore.kernel.org/lkml/20211130201652.2218636d@xxxxxxxxxxxxx/
> >
> > Signed-off-by: Yu Zhao <yuzhao@xxxxxxxxxx>
> > Tested-by: Konstantin Kharlamov <Hi-Angel@xxxxxxxxx>
> > ---
> > Documentation/vm/index.rst | 1 +
> > Documentation/vm/multigen_lru.rst | 62 +++++
>
> The description of user visible interfaces should go to
> Documentation/admin-guide/mm
>
> Documentation/vm/multigen_lru.rst should contain the design description
> and the implementation details, and it would be great to actually have such
> a document.

Will do, thanks.

> > include/linux/nodemask.h | 1 +
> > mm/vmscan.c | 415 ++++++++++++++++++++++++++++++
> > 4 files changed, 479 insertions(+)
> > create mode 100644 Documentation/vm/multigen_lru.rst
> >
> > diff --git a/Documentation/vm/index.rst b/Documentation/vm/index.rst
> > index 6f5ffef4b716..f25e755b4ff4 100644
> > --- a/Documentation/vm/index.rst
> > +++ b/Documentation/vm/index.rst
> > @@ -38,3 +38,4 @@ algorithms. If you are looking for advice on simply allocating memory, see the
> > unevictable-lru
> > z3fold
> > zsmalloc
> > + multigen_lru
> > diff --git a/Documentation/vm/multigen_lru.rst b/Documentation/vm/multigen_lru.rst
> > new file mode 100644
> > index 000000000000..6f9e0181348b
> > --- /dev/null
> > +++ b/Documentation/vm/multigen_lru.rst
> > @@ -0,0 +1,62 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +=====================
> > +Multigenerational LRU
> > +=====================
> > +
> > +Quick start
> > +===========
> > +Runtime configurations
> > +----------------------
> > +:Required: Write ``1`` to ``/sys/kernel/mm/lru_gen/enable`` if the
> > + feature wasn't enabled by default.
>
> Required for what? This sentence seems to lack context. Maybe add an
> overview of what Multigenerational LRU is so that users will have an idea
> what these knobs control.

Apparently I left an important part of this quick start in the next
patch, where Kconfig options are added. I'm wondering whether I should
squash the next patch into this one.

I always separate Kconfig changes and leave them in the last patch
because it gives me peace of mind knowing it'll never give any auto
bisectors a hard time.

But I've seen people put both in the same patch, and I'm tempted to do
the same. Can anybody remind me whether it's considered bad practice to
have code changes and Kconfig changes in the same patch?

> > +
> > +Recipes
> > +=======
>
> Some more context here would also be helpful.

Will do.

> > +Personal computers
> > +------------------
> > +:Thrashing prevention: Write ``N`` to
> > + ``/sys/kernel/mm/lru_gen/min_ttl_ms`` to prevent the working set of
> > + ``N`` milliseconds from getting evicted. The OOM killer is invoked if
> > + this working set can't be kept in memory. Based on the average human
> > + detectable lag (~100ms), ``N=1000`` usually eliminates intolerable
> > + lags due to thrashing. Larger values like ``N=3000`` make lags less
> > + noticeable at the cost of more OOM kills.
> > +
> > +Data centers
> > +------------
> > +:Debugfs interface: ``/sys/kernel/debug/lru_gen`` has the following
> > + format:
> > + ::
> > +
> > + memcg memcg_id memcg_path
> > + node node_id
> > + min_gen birth_time anon_size file_size
> > + ...
> > + max_gen birth_time anon_size file_size
> > +
> > + ``min_gen`` is the oldest generation number and ``max_gen`` is the
> > + youngest generation number. ``birth_time`` is in milliseconds.
> > + ``anon_size`` and ``file_size`` are in pages.
>
> And what do the oldest and youngest generations mean from the user
> perspective?

Good question. Will add more details in the next spin.
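
For illustration only, here is a minimal sketch of how a userspace script
might parse the debugfs format quoted above. It assumes that each memcg
line starts with the literal word "memcg", that each node line starts with
"node", and that every generation line consists of exactly four
whitespace-separated integers (generation number, birth_time, anon_size,
file_size). The names parse_lru_gen, NodeEntry, Generation and SAMPLE are
hypothetical, and the numbers in SAMPLE are made up rather than taken from
a real system.

```python
from dataclasses import dataclass, field


@dataclass
class Generation:
    gen: int            # generation number (min_gen .. max_gen)
    birth_time_ms: int  # birth_time, in milliseconds per the doc
    anon_pages: int     # anon_size, in pages
    file_pages: int     # file_size, in pages


@dataclass
class NodeEntry:
    memcg_id: int
    memcg_path: str
    node_id: int
    generations: list = field(default_factory=list)


def parse_lru_gen(text):
    """Parse text in the quoted lru_gen layout into NodeEntry objects.

    Assumed layout (not verified against a running kernel):
      memcg <memcg_id> <memcg_path>
       node <node_id>
        <gen> <birth_time> <anon_size> <file_size>
    """
    entries = []
    memcg_id, memcg_path = None, ""
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "memcg":
            memcg_id = int(fields[1])
            # The path may be missing; fall back to an empty string.
            memcg_path = fields[2] if len(fields) > 2 else ""
            current = None
        elif fields[0] == "node":
            current = NodeEntry(memcg_id, memcg_path, int(fields[1]))
            entries.append(current)
        elif current is not None and len(fields) == 4:
            current.generations.append(Generation(*map(int, fields)))
    return entries


# Made-up sample input, only to exercise the parser.
SAMPLE = """\
memcg 1 /
 node 0
  0 1500 10 20
  1 900 30 40
  2 100 50 60
memcg 5 /foo
 node 0
  3 700 1 2
 node 1
  4 50 3 4
"""

if __name__ == "__main__":
    for entry in parse_lru_gen(SAMPLE):
        gens = [g.gen for g in entry.generations]
        print(entry.memcg_path, entry.node_id, min(gens), max(gens))
```

On a real system the input would come from reading
/sys/kernel/debug/lru_gen, which normally requires root and a mounted
debugfs; the sketch deliberately works on a string so that it does not
depend on either.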