Re: [PATCH 4/4] memcg: synchronously enforce memory.high

From: Roman Gushchin
Date: Thu Feb 10 2022 - 18:29:41 EST


On Thu, Feb 10, 2022 at 02:22:36PM -0800, Shakeel Butt wrote:
> On Thu, Feb 10, 2022 at 12:15 PM Roman Gushchin <guro@xxxxxx> wrote:
> >
> [...]
> >
> > Has this approach been extensively tested in the production?
> >
> > Injecting sleeps at return-to-userspace moment is safe in terms of priority
> > inversions: a slowed down task will unlikely affect the rest of the system.
> >
> > It way less predictable for a random allocation in the kernel mode, what if
> > the task is already holding a system-wide resource?
> >
> > Someone might argue that it's not better than a system-wide memory shortage
> > and the same allocation might go into a direct reclaim anyway, but with
> > the way how memory.high is used it will happen way more often.
> >
>
> Thanks for the review.
>
> This patchset is tested in the test environment for now and I do plan
> to test this in production but that is a slow process and will take
> some time.
>
> Let me answer the main concern you have raised i.e. the safety of
> throttling a task synchronously in the charge code path. Please note
> that synchronous memory reclaim and oom-killing can already cause the
> priority inversion issues you have mentioned. The way we usually
> tackle such issues are through userspace controllers. For example oomd
> is the userspace solution for catering such issues related to
> oom-killing. Here we have a similar userspace daemon monitoring the
> workload and deciding if it should let the workload grow or kill it.
>
> Now should we keep the current high limit enforcement implementation
> and let it be ineffective for some real workloads or should we make
> the enforcement more robust and let the userspace tackle some corner
> case priority inversion issues. I think we should follow the second
> option as we already have precedence of doing the same for reclaim and
> oom-killing.

Well, in a theory it sounds good and I have no intention to oppose the
idea. However in practice we might easily get quite serious problems.
So I think we should be extra careful here. In the end we don't want to
pull and then revert this patch.

The difference between the system-wide direct reclaim and this case is that
usually kswapd is doing a good job of refilling the empty buffer, so we don't
usually work in the circumstances of the global memory shortage. And when we do,
often it's not working out quite well, this is why oomd and other similar
solutions are required.

Another option is to use your approach only for special cases (e.g. huge
allocations) and keep the existing approach for most other allocations.

Thanks!