Re: [RFC] [PATCH] Cgroup based OOM killer controller

From: David Rientjes
Date: Tue Jan 27 2009 - 05:53:59 EST


On Tue, 27 Jan 2009, Nikanth Karthikesan wrote:

> > As previously stated, I think the heuristic to penalize tasks for not
> > having an intersection with the set of allowable nodes of the oom
> > triggering task could be made slightly more severe. That's irrelevant to
> > your patch, though.
> >
>
> But the heuristic makes it non-deterministic, unlike memcg case. And this
> mandates special handling for cpuset constrained OOM conditions in this patch.
>

Dividing a badness score by 8 if a task's set of allowable nodes does not
intersect with the oom triggering task's set does not make an otherwise
deterministic algorithm non-deterministic.
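
To make that concrete, here is a minimal userspace sketch of the adjustment,
not the kernel code itself. The nodemask_t typedef (a plain bitmask, one bit
per node) and the names nodes_intersect() and adjust_badness() are
assumptions made for illustration only. The same inputs always produce the
same score, so the division by 8 adds no randomness.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a nodemask: one bit per memory node. */
typedef uint64_t nodemask_t;

/* True if the two sets of allowable nodes share at least one node. */
static bool nodes_intersect(nodemask_t a, nodemask_t b)
{
	return (a & b) != 0;
}

/*
 * Apply the penalty described above: divide the candidate's score by 8
 * when its allowable nodes do not intersect the triggering task's nodes.
 * This is a pure function of its arguments.
 */
static unsigned long adjust_badness(unsigned long points,
				    nodemask_t candidate_nodes,
				    nodemask_t trigger_nodes)
{
	if (!nodes_intersect(candidate_nodes, trigger_nodes))
		points /= 8;
	return points;
}
```

A more severe penalty would only change the divisor; the result would still
be fully determined by the score and the two node sets.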

I don't understand what you're arguing for here. Are you suggesting that
we should not prefer tasks that intersect the set of allowable nodes?
That makes no sense if the goal is to allow for future memory freeing.

> > We also talked about a cgroup /dev/mem_notify device file that you can
> > poll() and learn of low memory situations so that appropriate action can
> > be taken even in lowmem situations as opposed to simply oom conditions.
> >
>
> Userspace also needs to handle the cpuset constrained _almost-oom_'s
> specially? I wonder how easily userspace can handle that.
>

It handles it very well: cpusets are a client of cgroups and the
/dev/mem_notify extension would be as well. So to handle lowmem
notifications for your cpuset, you would mount both cgroup subsystems at
the same time and then poll() on the mem_notify file. That file would
report on the aggregate of tasks that the cgroup represents.
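
A rough sketch of what the userspace side could look like, assuming the
proposed mem_notify file becomes readable when memory is low. That
behavior, the function name wait_for_lowmem(), and any path you pass to it
are assumptions; only open(), poll() and close() are real interfaces here.

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait for a (hypothetical) lowmem notification on the file at 'path'.
 * Returns 1 if the file became readable, 0 on timeout, -1 on error.
 */
static int wait_for_lowmem(const char *path, int timeout_ms)
{
	struct pollfd pfd;
	int fd, ret;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;

	/* Block until the file is readable or the timeout expires. */
	ret = poll(&pfd, 1, timeout_ms);
	close(fd);

	if (ret < 0)
		return -1;
	if (ret == 0)
		return 0;
	return (pfd.revents & POLLIN) ? 1 : -1;
}
```

A policy daemon would call this in a loop on the mem_notify file inside the
mounted cgroup directory and take its action (freeing caches, killing a
chosen task, and so on) whenever it returns 1.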

If lowmem notifications are implemented in the reclaim path, this is much
easier for cpusets than for the memory controller, actually, since we
already collect per-node ZVC information.

> > These types of policy decisions belong in userspace.
>
> Yes, policy decisions will be made in user-space using this oom-controller.
> This is just a framework/means to enforce policies. We do not make any
> decisions inside the kernel.
>
> But yes, the badness calculation by the oom killer implements some kind of
> policy inside the kernel, but I guess it can stay, as this oom-controller lets
> user policy over-ride kernel policy. ;-)
>

The goal should not be to override the kernel's choice, because that
decision depends heavily on the type of oom and the state of the machine
at the time. Appropriate changes to the oom killer's heuristics are
always welcome; in my opinion, we should probably penalize tasks that do
not intersect the triggering task's set of allowable nodes more than we
currently do.

I think you'll find that your goals can be accomplished with a mem_notify
cgroup and that it is a much more powerful interface so that your
userspace policy can be better informed, especially if it is aware of
lowmem situations where oom conditions are imminent.