Re: [patch] mm, hugetlb_cgroup: suppress SIGBUS when hugetlb_cgroup charge fails

From: Michal Hocko
Date: Mon May 28 2018 - 11:59:08 EST
On Fri 25-05-18 15:18:11, David Rientjes wrote:
[...]
> Let's see what Mike and Aneesh say, since they may object to using
> VM_FAULT_OOM: there's no way to guarantee that we'll come under the
> limit of hugetlb_cgroup as a result of the oom. My assumption is that we
> use VM_FAULT_SIGBUS because oom killing will not guarantee that the
> allocation can succeed.

Yes. And the lack of hugetlb awareness in the oom killer is another
reason. There is absolutely no reason to kill a task when somebody has
misconfigured the hugetlb pool.

> But now a process can get a SIGBUS if its hugetlb
> pages are not allocatable or it's under a limit imposed by hugetlb_cgroup
> that it's not aware of. Faulting hugetlb pages is certainly risky
> business these days...

It always has been, and I am afraid it always will be, unless somebody
simply reimplements the current code to be NUMA aware, for example (it is
just too easy to drain the per-node reserves...).

> Perhaps the optimal solution for reaching hugetlb_cgroup limits is to
> induce an oom kill from within the hugetlb_cgroup itself? Otherwise the
> unlucky process that faults its hugetlb pages last gets SIGBUS.

Hmm, so you expect that the killed task would simply return pages to the
pool? Wouldn't that require a hugetlb cgroup OOM killer that would only
care about the hugetlb reservations of tasks? Is that worth all the
effort and the additional code?
--
Michal Hocko
SUSE Labs