Re: [PATCH v7] mm: Add memory allocation watchdog kernel thread.

From: Tetsuo Handa
Date: Fri Mar 10 2017 - 20:47:17 EST


Michal Hocko wrote:
> So, we have means to debug these issues. Some of them are rather coarse
> and your watchdog can collect much more and maybe give us a clue much
> quicker but we still have to judge whether all this is really needed
> because it doesn't come for free. Have you considered this aspect?

Sigh... You are ultimately ignoring the reality. Educating everybody to master
debugging tools does not come for free. If I liken your argumentation to
security modules, it looks like the following.

"There is already SELinux. SELinux can do everything. Thus, AppArmor is not needed.
I don't care about users/customers who cannot administrate SELinux."

The reality is different. We need tools which users/customers can afford using.
You had better getting away from existing debug tools which kernel developers
are using.

First of all, SysRq is an emergency tool and therefore it requires administrator's
intervention. Your argumentation sounds to me that "Give up debugging unless you
can sit on in front of console of Linux systems 24-7" which is already impossible.

SysRq-t cannot print seq= and delay= fields because information of in-flight allocation
request is not accessible from "struct task_struct", making extremely difficult to
judge whether progress is made when several SysRq-t snapshots are taken.

Also, year by year it is getting difficult to use vmcore for analysis because vmcore
might include sensitive data (even after filtering out user pages). I saw cases where
vmcore cannot be sent to support centers due to e.g. organization's information
control rules. Sometimes we have to analyze from only kernel messages. Some pieces of
information extracted by running scripts against /usr/bin/crash on cutomer's side
might be available, but in general we can't assume that the whole memory image which
includes whatever information is available.

In most cases, administrators can't capture even SysRq-t; let alone vmcore.
Therefore, automatic watchdog is highly appreciated. Have you considered this aspect?