Re: [v8 0/4] cgroup-aware OOM killer

From: Johannes Weiner
Date: Tue Sep 26 2017 - 13:26:28 EST


On Tue, Sep 26, 2017 at 03:30:40PM +0200, Michal Hocko wrote:
> On Tue 26-09-17 13:13:00, Roman Gushchin wrote:
> > On Tue, Sep 26, 2017 at 01:21:34PM +0200, Michal Hocko wrote:
> > > On Tue 26-09-17 11:59:25, Roman Gushchin wrote:
> > > > On Mon, Sep 25, 2017 at 10:25:21PM +0200, Michal Hocko wrote:
> > > > > On Mon 25-09-17 19:15:33, Roman Gushchin wrote:
> > > > > [...]
> > > > > > I'm not against this model, as I've said before. It feels logical,
> > > > > > and will work fine in most cases.
> > > > > >
> > > > > > In this case we can drop any mount/boot options, because it preserves
> > > > > > the existing behavior in the default configuration. A big advantage.
> > > > >
> > > > > I am not sure about this. We still need an opt-in, regardless, because
> > > > > selecting the largest process from the largest memcg != selecting the
> > > > > largest task (just consider the example of memcgs with many processes).
> > > >
> > > > As I understand Johannes, he suggested comparing individual processes with
> > > > group_oom mem cgroups. In other words, always select a killable entity with
> > > > the biggest memory footprint.
> > > >
> > > > This is slightly different from my v8 approach, where I treat leaf memcgs
> > > > as indivisible memory consumers independent of the group_oom setting, so
> > > > by default I'm selecting the biggest task in the biggest memcg.
> > >
> > > My reading is that he is actually proposing the same thing I've been
> > > mentioning. Simply select the biggest killable entity (leaf memcg or
> > > group_oom hierarchy) and either kill the largest task in that entity
> > > (for !group_oom) or the whole memcg/hierarchy otherwise.
> >
> > He wrote the following:
> > "So I'm leaning toward the second model: compare all oomgroups and
> > standalone tasks in the system with each other, independent of the
> > failed hierarchical control structure. Then kill the biggest of them."
>
> I will let Johannes comment, but I believe this is just a
> misunderstanding. If we compared only the biggest task from each memcg,
> we would basically lose our fairness objective, wouldn't we?

Sorry about the confusion.

Yeah, I was making the case for what Michal proposed: kill the
biggest terminal consumer, which is either a task or an oomgroup.

You'd basically iterate through all the tasks and cgroups in the
system, find the biggest task that isn't in an oomgroup and the biggest
oomgroup, and kill whichever of the two is bigger.

Yeah, you'd have to compare the memory footprints of tasks with the
memory footprints of cgroups. These aren't defined identically, and
tasks don't get attributed every type of allocation that a cgroup
would. But it should get us in the ballpark, and I cannot picture a
scenario where this would lead to a completely undesirable outcome.