Re: [PATCH] memcg, kmem: do not fail __GFP_NOFAIL charges

From: Shakeel Butt
Date: Thu Sep 12 2019 - 22:46:40 EST


On Wed, Sep 11, 2019 at 8:16 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> On Wed 11-09-19 07:37:40, Andrew Morton wrote:
> > On Wed, 11 Sep 2019 14:00:02 +0200 Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> >
> > > On Mon 09-09-19 13:22:45, Michal Hocko wrote:
> > > > On Fri 06-09-19 11:24:55, Shakeel Butt wrote:
> > > [...]
> > > > > I wonder what has changed since
> > > > > <http://lkml.kernel.org/r/20180525185501.82098-1-shakeelb@xxxxxxxxxx/>.
> > > >
> > > > I have completely forgot about that one. It seems that we have just
> > > > repeated the same discussion again. This time we have a poor user who
> > > > actually enabled the kmem limit.
> > > >
> > > > I guess there was no real objection to the change back then. The primary
> > > > discussion revolved around the fact that the accounting will stay broken
> > > > even when this particular part was fixed. Considering this leads to easy
> > > > to trigger crash (with the limit enabled) then I guess we should just
> > > > make it less broken and backport to stable trees and have a serious
> > > > discussion about discontinuing of the limit. Start by simply failing to
> > > > set any limit in the current upstream kernels.
> > >
> > > Any more concerns/objections to the patch? I can add a reference to your
> > > earlier post Shakeel if you want or to credit you the way you prefer.
> > >
> > > Also are there any objections to start deprecating process of kmem
> > > limit? I would see it in two stages
> > > - 1st warn in the kernel log
> > > pr_warn("kmem.limit_in_bytes is deprecated and will be removed.
> > > "Please report your usecase to linux-mm@xxxxxxxxx if you "
> > > "depend on this functionality."
> >
> > pr_warn_once() :)
> >
> > > - 2nd fail any write to kmem.limit_in_bytes
> > > - 3rd remove the control file completely
> >
> > Sounds good to me.
>
> Here we go
>
> From 512822e551fe2960040c23b12c7b27a5fdab9013 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@xxxxxxxx>
> Date: Wed, 11 Sep 2019 17:02:33 +0200
> Subject: [PATCH] memcg, kmem: deprecate kmem.limit_in_bytes
>
> Cgroup v1 memcg controller has exposed a dedicated kmem limit to users
> which turned out to be really a bad idea because there are paths which
> cannot shrink the kernel memory usage enough to get below the limit
> (e.g. because the accounted memory is not reclaimable). There are cases
> when the failure is even not allowed (e.g. __GFP_NOFAIL). This means
> that the kmem limit is in excess to the hard limit without any way to
> shrink and thus completely useless. OOM killer cannot be invoked to
> handle the situation because that would lead to a premature oom killing.
>
> As a result many places might see ENOMEM returning from kmalloc and
> result in unexpected errors. E.g. a global OOM killer when there is a
> lot of free memory because ENOMEM is translated into VM_FAULT_OOM in #PF
> path and therefore pagefault_out_of_memory would result in OOM killer.
>
> Please note that the kernel memory is still accounted to the overall
> limit along with the user memory so removing the kmem specific limit
> should still allow to contain kernel memory consumption. Unlike the kmem
> one, though, it invokes memory reclaim and targeted memcg oom killing if
> necessary.
>
> Start the deprecation process by crying to the kernel log. Let's see
> whether there are relevant usecases and simply return to EINVAL in the
> second stage if nobody complains in few releases.
>
> Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>

Reviewed-by: Shakeel Butt <shakeelb@xxxxxxxxxx>

> ---
> Documentation/admin-guide/cgroup-v1/memory.rst | 3 +++
> mm/memcontrol.c | 3 +++
> 2 files changed, 6 insertions(+)
>
> diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
> index 41bdc038dad9..e53fc2f31549 100644
> --- a/Documentation/admin-guide/cgroup-v1/memory.rst
> +++ b/Documentation/admin-guide/cgroup-v1/memory.rst
> @@ -87,6 +87,9 @@ Brief summary of control files.
> node
>
> memory.kmem.limit_in_bytes set/show hard limit for kernel memory
> + This knob is deprecated it shouldn't be
> + used. It is planned to be removed in
> + a foreseeable future.
> memory.kmem.usage_in_bytes show current kernel memory allocation
> memory.kmem.failcnt show the number of kernel memory usage
> hits limits
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index e18108b2b786..113969bc57e8 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3518,6 +3518,9 @@ static ssize_t mem_cgroup_write(struct kernfs_open_file *of,
> ret = mem_cgroup_resize_max(memcg, nr_pages, true);
> break;
> case _KMEM:
> + pr_warn_once("kmem.limit_in_bytes is deprecated and will be removed. "
> + "Please report your usecase to linux-mm@xxxxxxxxx if you "
> + "depend on this functionality.\n");
> ret = memcg_update_kmem_max(memcg, nr_pages);
> break;
> case _TCP:
> --
> 2.20.1
>
>
> --
> Michal Hocko
> SUSE Labs