Re: [v8 3/4] mm, oom: add cgroup v2 mount option for cgroup-aware OOM killer

From: Michal Hocko
Date: Wed Sep 13 2017 - 08:23:21 EST


On Tue 12-09-17 21:01:15, Roman Gushchin wrote:
> On Mon, Sep 11, 2017 at 01:48:39PM -0700, David Rientjes wrote:
> > On Mon, 11 Sep 2017, Roman Gushchin wrote:
> >
> > > Add a "groupoom" cgroup v2 mount option to enable the cgroup-aware
> > > OOM killer. If not set, the OOM selection is performed in
> > > a "traditional" per-process way.
> > >
> > > The behavior can be changed dynamically by remounting the cgroupfs.
> >
> > I can't imagine that Tejun would be happy with a new mount option,
> > especially when it's not required.
> >
> > OOM behavior does not need to be defined at mount time and for the entire
> > hierarchy. It's possible to very easily implement a tunable as part of
> > mem cgroup that is propagated to descendants and controls the oom scoring
> > behavior for that hierarchy. It does not need to be system wide and
> > affect scoring of all processes based on which mem cgroup they are
> > attached to at any given time.
>
> No, I don't think that mixing per-cgroup and per-process OOM selection
> algorithms is a good idea.
>
> So, there are 3 reasonable options:
> 1) boot option
> 2) sysctl
> 3) cgroup mount option
>
> I believe, 3) is better, because it allows changing the behavior dynamically,
> and explicitly depends on v2 (what sysctl lacks).

I see your argument here. I would just be worried that we end up really
needing more oom strategies in the future and those wouldn't fit into the
memcg mount option scope. So 1) or 2) sounds more extensible to me long
term. Boot time would be easier because we do not have to bother with
dynamic selection in that case.

> So, the only question is should it be opt-in or opt-out option.
> Personally, I would prefer opt-out, but Michal has a very strong opinion here.

Yes, I still strongly believe this has to be opt-in.
--
Michal Hocko
SUSE Labs