Re: [PATCH v8] oom_kill.c: futex: Don't OOM reap the VMA containing the robust_list_head

From: Nico Pache
Date: Fri Apr 08 2022 - 04:52:46 EST

On 4/8/22 04:37, Thomas Gleixner wrote:
> On Fri, Apr 08 2022 at 10:15, Peter Zijlstra wrote:
>> On Thu, Apr 07, 2022 at 11:28:09PM -0400, Nico Pache wrote:
>>> Theoretically a failure can still occur if there are locks mapped as
>>> PRIVATE|ANON; however, the robust futexes are a best-effort approach.
>>> This patch only strengthens that best-effort.
>>>
>>> The following case can still fail:
>>> robust head (skipped) -> private lock (reaped) -> shared lock
>>> (skipped)
>>
>> This is still all sorts of confused.. it's a list head, the entries can
>> be in any random other VMA. You must not remove *any* user memory before
>> doing the robust thing. Not removing the VMA that contains the head is
>> pointless in the extreme.
>>
>> Did you not read the previous discussion?
>
> Aside of that we all agreed that giving a oom-killed task time to
> cleanup itself instead of brute force cleaning it up immediately, which
> is the real problem here. Can we fix that first before adding broken
> heuristics?

We've tried multiple approaches to reproduce the case you are talking about, with
no success...

Why make a change for something that we can't reproduce, when we are sure this
works for all the cases we've attempted?
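
To make the disputed case concrete, here is a minimal, hypothetical user-space model of the exit-time robust-list walk. It is not the kernel's exit_robust_list(). It ignores list_op_pending, the PI flag carried in bit 0 of each next pointer, and the ROBUST_LIST_LIMIT cap. It also models the OOM reaper crudely, as marking whole regions unreadable. The helper names (struct region, addr_readable, model_exit_robust_list, struct demo_lock) are invented for illustration. Only struct robust_list, struct robust_list_head, FUTEX_TID_MASK and FUTEX_OWNER_DIED come from <linux/futex.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/futex.h> /* struct robust_list(_head), FUTEX_TID_MASK, FUTEX_OWNER_DIED */

/* Hypothetical stand-in for a VMA: reaped != 0 means the reaper got it. */
struct region {
    uintptr_t start, end;
    int reaped;
};

/* Hypothetical lock layout: list node followed by the futex word. */
struct demo_lock {
    struct robust_list node;
    uint32_t futex;
};

/* True if p lies in a known region that has not been reaped. */
static int addr_readable(const struct region *r, size_t n, const void *p)
{
    uintptr_t a = (uintptr_t)p;
    for (size_t i = 0; i < n; i++)
        if (a >= r[i].start && a < r[i].end)
            return !r[i].reaped;
    return 0; /* not mapped at all */
}

/*
 * Simplified model of the exit-time walk: follow head->list.next and set
 * FUTEX_OWNER_DIED on every futex word owned by tid. The walk gives up at
 * the first entry (or futex word) it cannot read, so everything after a
 * reaped entry is unreachable even if it lives in a preserved region.
 * Returns the number of futexes marked.
 */
static int model_exit_robust_list(struct robust_list_head *head, uint32_t tid,
                                  const struct region *r, size_t n)
{
    int marked = 0;

    if (!addr_readable(r, n, head))
        return 0;

    struct robust_list *entry = head->list.next;
    while (entry != &head->list) {
        uint32_t *uaddr = (uint32_t *)((char *)entry + head->futex_offset);

        if (!addr_readable(r, n, entry) || !addr_readable(r, n, uaddr))
            break; /* chain broken: later entries are never visited */
        if ((*uaddr & FUTEX_TID_MASK) == tid) {
            *uaddr |= FUTEX_OWNER_DIED;
            marked++;
        }
        entry = entry->next;
    }
    return marked;
}
```

Consider the chain robust head (skipped) -> private lock (reaped) -> shared lock (skipped), which is the case listed in the patch description. Here the model stops at the private lock and marks nothing, not even the shared lock whose region was preserved. It reaches both locks only when no region on the chain has been reaped. Skipping the VMA that holds the head therefore does not protect entries that live in other VMAs.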

I also don't see how this is a broken heuristic... If anything, adding a delay is
what's broken. How do we ensure the delay is long enough for the exit path to clean
up the futexes? On a heavily contended CPU under high memory pressure, the delay
may also cause other processes to be OOM-killed unnecessarily.

Cheers,
-- Nico
