Re: [PATCH v4] mm/page_alloc: bail out on fatal signal during reclaim/compaction retry attempt

From: Michal Hocko
Date: Mon May 31 2021 - 09:29:45 EST


On Mon 31-05-21 13:35:31, Vlastimil Babka wrote:
> On 5/31/21 1:33 PM, Michal Hocko wrote:
> > On Thu 20-05-21 15:29:01, Aaron Tomlin wrote:
> >> A customer experienced a low-memory situation and decided to issue a
> >> SIGKILL (i.e. a fatal signal). Instead of promptly terminating as one
> >> would expect, the aforementioned task remained unresponsive.
> >>
> >> Further investigation indicated that the task was "stuck" in the
> >> reclaim/compaction retry loop. Now, it does not make sense to retry
> >> compaction when a fatal signal is pending.
> >
> > Is this really true in general? The memory reclaim is retried even when
> > fatal signals are pending. Why should compaction be different? I do
> > agree that retrying way too much is bad but is there any reason why this
> > special case doesn't follow the max retry logic?
>
> Compaction doesn't do anything if a fatal signal is pending; it bails out
> immediately and the checks are rather frequent. So why retry?

OK, I was not aware of that and it would be helpful to have that
mentioned in the changelog.
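
For illustration only, here is a rough userspace model of the point above.
It is not the patch and not kernel code: struct task_model,
MODEL_MAX_RETRIES, try_compaction() and alloc_retry_loop() are all
hypothetical names, and the flag merely stands in for what
fatal_signal_pending(current) would report. It assumes, as Vlastimil
describes, that a compaction attempt makes no progress once a fatal
signal is pending, so every retry after that point is wasted.

```c
#include <stdbool.h>

/* Hypothetical model of the allocating task; not a kernel structure. */
struct task_model {
    bool fatal_signal_pending; /* stands in for fatal_signal_pending(current) */
    int compaction_calls;      /* number of compaction attempts made */
};

/* Illustrative retry cap for this model only. */
#define MODEL_MAX_RETRIES 16

/*
 * Models one compaction attempt. With a fatal signal pending it bails
 * out immediately. In this low-memory model it never makes progress
 * otherwise either, so it always returns false.
 */
static bool try_compaction(struct task_model *t)
{
    t->compaction_calls++;
    if (t->fatal_signal_pending)
        return false; /* bail out without doing any work */
    return false;     /* no progress in this model */
}

/*
 * Models the retry loop. When check_signal is true, the loop stops as
 * soon as a fatal signal is pending instead of burning the remaining
 * retries. Returns the number of retries performed.
 */
static int alloc_retry_loop(struct task_model *t, bool check_signal)
{
    int retries = 0;

    while (retries < MODEL_MAX_RETRIES) {
        if (check_signal && t->fatal_signal_pending)
            break; /* retrying cannot help a task that is being killed */
        if (try_compaction(t))
            break; /* progress was made, stop retrying */
        retries++;
    }
    return retries;
}
```

In this model, a task with the signal pending performs zero compaction
attempts when the check is in place, and runs through all
MODEL_MAX_RETRIES attempts without it.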

--
Michal Hocko
SUSE Labs