Re: mm: 5.16 regression: reclaim_throttle leads to stall in near-OOM conditions

From: Mel Gorman
Date: Wed Nov 24 2021 - 05:36:02 EST


On Wed, Nov 24, 2021 at 01:19:54AM +0900, Alexey Avramov wrote:
> I found stalls in near-OOM conditions with Linux 5.16. This is not the
> hang-up that was reported by Artem S. Tashkinov in 2019 [1]. It's a *new*
> regression. I will demonstrate this with one simple experiment, which I
> will reproduce with different kernels or settings.
>
> With older versions of the kernel, running the `tail /dev/zero` command
> usually leads quickly to an OOM condition.
>
> I will run the command `for i in {1..3}; do tail /dev/zero; done` and log
> PSI metrics (using psi2log script from nohang v0.2.0 [2]) and some values
> from `/proc/meminfo` (using mem2log v0.1.0 [3]) while this command is
> running. During the experiment a browser with a single tab will be kept
> open, playing a video.
>
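
For anyone wanting to watch memory pressure during the reproducer without
psi2log, the following is a minimal sketch and not the psi2log script
itself. It assumes a kernel built with CONFIG_PSI and PSI enabled, so that
/proc/pressure/memory exists. The names parse_psi and sample are made up
for illustration.

```python
#!/usr/bin/env python3
"""Minimal PSI memory-pressure sampler (illustrative sketch, not psi2log)."""
import time

# Assumption: PSI is available (CONFIG_PSI=y and not disabled at boot).
PSI_MEMORY = "/proc/pressure/memory"


def parse_psi(text):
    """Parse PSI file contents into {"some": {...}, "full": {...}}.

    Each line looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        # Every field is key=value; store all values as floats.
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def sample(path=PSI_MEMORY, interval=1.0, count=5):
    """Print the 10-second 'some' and 'full' averages `count` times."""
    for _ in range(count):
        with open(path) as f:
            psi = parse_psi(f.read())
        some = psi.get("some", {}).get("avg10")
        full = psi.get("full", {}).get("avg10")
        print(f"some avg10={some} full avg10={full}")
        time.sleep(interval)


if __name__ == "__main__":
    sample()
```

Run it in a second terminal while the `tail /dev/zero` loop is active. The
"full" line is the share of time in which all non-idle tasks were stalled
on memory, so it is the more direct indicator of a stall.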

Ok, I can reproduce this. The task does eventually get OOM-killed, so
the system makes progress, but maybe the throttling should be for very
short intervals when reclaim is failing to make progress and there have
been multiple reclaim failures recently. Disabling the throttling
entirely just results in cases where 100% CPU is used spinning through
the LRU lists.

Thanks for the report.

--
Mel Gorman
SUSE Labs