Re: [PATCH] memcg: provide reclaim stats via 'memory.reclaim'

From: Yosry Ahmed
Date: Wed May 18 2022 - 18:47:08 EST


On Wed, May 18, 2022 at 3:38 PM Vaibhav Jain <vaibhav@xxxxxxxxxxxxx> wrote:
>
> [1] Provides a way for user-space to trigger proactive reclaim by introducing
> a write-only memcg file 'memory.reclaim'. However, reclaim stats such as the
> number of pages scanned and reclaimed are still not directly available to
> user-space.
>
> This patch proposes to extend [1] to make the memcg file 'memory.reclaim'
> readable which returns the number of pages scanned / reclaimed during the
> reclaim process from 'struct vmpressure' associated with each memcg. This should
> let user-space assess how successful the proactive reclaim triggered via memcg
> 'memory.reclaim' was.

Isn't this a racy read? struct vmpressure can be changed between the
write and read by other reclaim operations, right?

I was actually planning to send a patch that does not update
vmpressure for user-controlled reclaim, similar to how PSI is handled.

The interface currently returns -EAGAIN if the entire amount was not
reclaimed, so isn't this enough to figure out if it was successful or
not? If not, we can store the scanned / reclaimed counts of the last
memory.reclaim invocation for the sole purpose of memory.reclaim
reads. Maybe it is actually more intuitive to users to just read the
amount of memory reclaimed? In a format that is similar to the one written?

i.e.:
echo "10M" > memory.reclaim
cat memory.reclaim
9M
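
For reference, a minimal user-space sketch of the "check the write's
return value" approach mentioned above. This is only an illustration:
try_reclaim() is a made-up helper name, the path is supplied by the
caller, and the only kernel behavior it relies on is that a short
reclaim makes the write fail with EAGAIN.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any kernel or libc API).
 * Writes `amount` (e.g. "10M") to the memory.reclaim file at `path`.
 *
 * Returns:
 *   0  the write succeeded (the requested amount was reclaimed)
 *   1  the write failed with EAGAIN (less than requested was reclaimed)
 *  -1  any other failure (open error, EINTR, EINVAL, ...)
 */
static int try_reclaim(const char *path, const char *amount)
{
	ssize_t ret;
	int saved_errno;
	int fd;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	ret = write(fd, amount, strlen(amount));
	saved_errno = errno;	/* close() may clobber errno */
	close(fd);

	if (ret >= 0)
		return 0;

	return saved_errno == EAGAIN ? 1 : -1;
}
```

A caller would pass something like "/sys/fs/cgroup/<group>/memory.reclaim"
(an example path, adjust for the actual hierarchy) and treat a return of 1
as "reclaim fell short". Note this only says whether the target was met,
not how far short it fell, which is the gap the readable file would fill.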

>
> With the patch, the following command flow is expected:
>
> # echo "1M" > memory.reclaim
>
> # cat memory.reclaim
> scanned 76
> reclaimed 32
>
> [1]: https://lore.kernel.org/r/20220425190040.2475377-1-yosryahmed@xxxxxxxxxx
>
> Cc: Shakeel Butt <shakeelb@xxxxxxxxxx>
> Cc: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> Signed-off-by: Vaibhav Jain <vaibhav@xxxxxxxxxxxxx>
> ---
> Documentation/admin-guide/cgroup-v2.rst | 15 ++++++++++++---
> mm/memcontrol.c | 14 ++++++++++++++
> 2 files changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 27ebef2485a3..44610165261d 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1209,18 +1209,27 @@ PAGE_SIZE multiple when read back.
> utility is limited to providing the final safety net.
>
> memory.reclaim
> - A write-only nested-keyed file which exists for all cgroups.
> + A nested-keyed file which exists for all cgroups.
>
> - This is a simple interface to trigger memory reclaim in the
> - target cgroup.
> + This is a simple interface to trigger memory reclaim and retrieve
> + reclaim stats in the target cgroup.
>
> This file accepts a single key, the number of bytes to reclaim.
> No nested keys are currently supported.
>
> + Reading the file returns number of pages scanned and number of
> + pages reclaimed from the memcg. This information fetched from
> + vmpressure info associated with each cgroup.
> +
> Example::
>
> echo "1G" > memory.reclaim
>
> + cat memory.reclaim
> +
> + scanned 78
> + reclaimed 30
> +
> The interface can be later extended with nested keys to
> configure the reclaim behavior. For example, specify the
> type of memory to reclaim from (anon, file, ..).
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 2e2bfbed4717..9e43580a8726 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6423,6 +6423,19 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of,
> 	return nbytes;
> }
>
> +static int memory_reclaim_show(struct seq_file *m, void *v)
> +{
> +	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
> +	struct vmpressure *vmpr = memcg_to_vmpressure(memcg);
> +
> +	spin_lock(&vmpr->sr_lock);
> +	seq_printf(m, "scanned %lu\nreclaimed %lu\n",
> +		   vmpr->scanned, vmpr->reclaimed);
> +	spin_unlock(&vmpr->sr_lock);
> +
> +	return 0;
> +}
> +
> static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
> 			      size_t nbytes, loff_t off)
> {
> @@ -6525,6 +6538,7 @@ static struct cftype memory_files[] = {
> 		.name = "reclaim",
> 		.flags = CFTYPE_NS_DELEGATABLE,
> 		.write = memory_reclaim,
> +		.seq_show = memory_reclaim_show,
> 	},
> 	{ }	/* terminate */
> };
> --
> 2.35.1
>