Re: [v8 0/4] cgroup-aware OOM killer

From: Roman Gushchin
Date: Wed Sep 13 2017 - 17:57:00 EST


On Wed, Sep 13, 2017 at 02:29:14PM +0200, Michal Hocko wrote:
> On Mon 11-09-17 13:44:39, David Rientjes wrote:
> > On Mon, 11 Sep 2017, Roman Gushchin wrote:
> >
> > > This patchset makes the OOM killer cgroup-aware.
> > >
> > > v8:
> > > - Do not kill tasks with OOM_SCORE_ADJ -1000
> > > - Make the whole thing opt-in with cgroup mount option control
> > > - Drop oom_priority for further discussions
> >
> > Nack, we specifically require oom_priority for this to function correctly,
> > otherwise we cannot prefer to kill from low priority leaf memcgs as
> > required.
>
> While I understand that your usecase might require priorities I do not
> think this part missing is a reason to nack the cgroup based selection
> and kill-all parts. This can be done on top. The only important part
> right now is the current selection semantic - only leaf memcgs vs. size
> of the hierarchy.

I agree.

> I strongly believe that comparing only leaf memcgs
> is more straightforward and it doesn't lead to unexpected results as
> mentioned before (kill a small memcg which is a part of the larger
> sub-hierarchy).

One of the two main goals of this patchset is to introduce cgroup-level
fairness: bigger cgroups should be affected more than smaller ones,
regardless of the size of the tasks inside. I believe the same principle
should be used for cgroups in a hierarchy.

Also, the opposite would make oom_semantics weirder: it would mean both
"kill all tasks" and "treat this memcg as a leaf cgroup".
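
To make the two selection semantics concrete, here is a toy model in
Python. It is only a sketch, not kernel code: Memcg, pick_leaf_only() and
pick_hierarchical() are made-up names, and the usage numbers are
arbitrary. It assumes that "size of the hierarchy" means descending from
the root into the largest child at each level.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Memcg:
    """Toy stand-in for a memory cgroup (hypothetical, not a kernel struct)."""
    name: str
    own_usage: int = 0  # memory charged directly here, arbitrary units
    children: List["Memcg"] = field(default_factory=list)

    def usage(self) -> int:
        """Hierarchical usage: own charge plus everything below."""
        return self.own_usage + sum(c.usage() for c in self.children)


def leaves(cg: Memcg) -> List[Memcg]:
    """Collect all leaf cgroups under cg (cg itself if it has no children)."""
    if not cg.children:
        return [cg]
    out: List[Memcg] = []
    for child in cg.children:
        out.extend(leaves(child))
    return out


def pick_leaf_only(root: Memcg) -> Memcg:
    """Compare all leaf memcgs directly and pick the largest one."""
    return max(leaves(root), key=lambda cg: cg.usage())


def pick_hierarchical(root: Memcg) -> Memcg:
    """At each level, descend into the child with the largest hierarchical usage."""
    cg = root
    while cg.children:
        cg = max(cg.children, key=lambda c: c.usage())
    return cg


# A holds five small leaves (50 units in total); B is a single 30-unit leaf.
root = Memcg("root", children=[
    Memcg("A", children=[Memcg("A/%d" % i, own_usage=10) for i in range(5)]),
    Memcg("B", own_usage=30),
])

print(pick_leaf_only(root).name)     # largest single leaf
print(pick_hierarchical(root).name)  # a leaf inside the largest subtree
```

With this layout the leaf-only comparison selects B, the largest single
leaf, while the hierarchical walk selects a 10-unit leaf under A, because
A as a whole is bigger than B. That is the fairness argument above, and
also the "small memcg in a larger sub-hierarchy" case from the quoted
text.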

>
> I didn't get to read the new version of this series yet and hope to get
> to it soon.

Thanks!