Re: [RFC PATCH] mm, oom: disable dump_tasks by default

From: Tetsuo Handa
Date: Thu Sep 05 2019 - 09:40:14 EST


On 2019/09/05 5:04, David Rientjes wrote:
> On Wed, 4 Sep 2019, Michal Hocko wrote:
>
>>>> Its primary purpose is
>>>> to help analyse oom victim selection decision.
>>>
>>> I disagree, for I use the process list for understanding what / how many
>>> processes are consuming what kind of memory (without crashing the system)
>>> for anomaly detection purpose. Although we can't dump memory consumed by
>>> e.g. file descriptors, disabling dump_tasks() loses that clue, and is
>>> problematic for me.
>>
>> Does anything really prevent you from enabling this by sysctl though? Or
>> do you claim that this is a general usage pattern and therefore the
>> default change is not acceptable or do you want a changelog to be
>> updated?
>>
>
> I think the motivation is that users don't want to need to reproduce an
> oom kill to figure out why: they want to be able to figure out which
> process had higher than normal memory usage.

Right. Users can't reproduce an OOM kill to figure out why. Those who do
failure analysis for users (e.g. technical staff at a support center) need
to figure out the system's state when an OOM kill happened. And installing
dynamic hooks like SystemTap / eBPF is hardly acceptable for users.

What I don't like is that Michal refuses to solve the OOM stalling problem,
does not try to understand how difficult it is to avoid problems caused by
thoughtless printk(), and instead recommends disabling oom_dump_tasks.

There is nothing that prevents users from enabling oom_dump_tasks by sysctl.
But that requires a solution for the OOM stalling problem. Since I know how
difficult it is to avoid problems caused by printk() flooding, I insist that
we need the "mm,oom: Defer dump_tasks() output." patch.