Re: user defined OOM policies

From: Vladimir Murzin
Date: Wed Nov 20 2013 - 12:22:01 EST

Next message: Thomas Gleixner: "Re: [PATCH 3/7] idle, thermal, acpi: Remove home grown idle implementations"
Previous message: Jacob Pan: "Re: [PATCH 0/7] Cure some vaux idle wrackage"
In reply to: Vladimir Murzin: "Re: user defined OOM policies"
Next in thread: Michal Hocko: "Re: user defined OOM policies"

On Tue, Nov 19, 2013 at 02:40:07PM +0100, Michal Hocko wrote:
Hi Michal
> On Tue 19-11-13 14:14:00, Michal Hocko wrote:
> [...]
> > We have basically ended up with 3 options AFAIR:
> > 1) allow memcg approach (memcg.oom_control) on the root level
> > for both OOM notification and blocking OOM killer and handle
> > the situation from the userspace same as we can for other
> > memcgs.
>
> This looks like a straightforward approach, as a similar thing is done
> on the local (memcg) level. There are several problems though.
> Running userspace from within OOM context is terribly hard to do
> right. This is true even in the memcg case and we strongly discourage
> users from doing that. In the global case nothing is outside of the OOM
> context, though, so any hang would block the whole machine. Even
> if the oom killer is careful and locks in all the resources, it would
> have a hard time querying the current system state (existing processes
> and their states) without any allocation. There are certain ways to
> work around these issues - e.g. give the killer access to memory reserves
> - but this all looks scary and fragile.
>
> > 2) allow modules to hook into OOM killer path and take the
> > appropriate action.
>
> This actually exists already. There is the oom_notify_list call chain and
> {un}register_oom_notifier that allow modules to hook into oom and
> skip the global OOM if some memory is freed. There are currently only
> s390 and powerpc which seem to abuse it for something that looks like a
> shrinker except it is done in OOM path...
>
> I think the interface should be changed if something like this were to be
> used in practice. A lot of information is lost on the way. I would
> basically expect to get everything that out_of_memory gets.

Some time ago I tried to hook the OOM path with a custom module-based policy. I
needed to select a process based on its USS/PSS values, which required page
walking (yes, I know it is extremely expensive, but sometimes I'd pay the bill).
The lesson learned is quite simple: it is harmful to expose (all?) internal
functions and locking to modules. The result is going to be a completely
unreliable and unpredictable mess unless a well-defined interface and
helpers are established.
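
To make the USS/PSS selection concrete, here is a rough userspace sketch. It is
not the module mentioned above and it does no page walking itself; it assumes
the "Pss:", "Private_Clean:" and "Private_Dirty:" fields of /proc/<pid>/smaps
(reported in kB), and the helper names smaps_usage, read_usage and pick_victim
are made up for illustration.

```python
import re

# Fields of interest in /proc/<pid>/smaps; the kernel reports them in kB.
# Anchoring at line start keeps "SwapPss:" and "Pss_Anon:" from matching.
_FIELD = re.compile(r"^(Pss|Private_Clean|Private_Dirty):\s+(\d+)\s+kB$", re.M)


def smaps_usage(smaps_text):
    """Sum PSS and USS (private clean + private dirty) over all mappings."""
    totals = {"Pss": 0, "Private_Clean": 0, "Private_Dirty": 0}
    for name, value in _FIELD.findall(smaps_text):
        totals[name] += int(value)
    return {
        "pss_kb": totals["Pss"],
        "uss_kb": totals["Private_Clean"] + totals["Private_Dirty"],
    }


def read_usage(pid):
    """Read and summarise smaps for one pid (needs permission to read it)."""
    with open(f"/proc/{pid}/smaps") as f:
        return smaps_usage(f.read())


def pick_victim(usage_by_pid, key="pss_kb"):
    """Return the pid with the largest value for `key`, or None if empty."""
    if not usage_by_pid:
        return None
    return max(usage_by_pid, key=lambda pid: usage_by_pid[pid][key])
```

Reading smaps allocates memory, so this only illustrates the selection
criterion; it does not address the problem of running such code while the
system is already out of memory.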

>
> > 3) create a generic filtering mechanism which could be
> > controlled from the userspace by a set of rules (e.g.
> > something analogous to packet filtering).
>
> This looks generic enough but I have no idea about the complexity.

I never thought about it, but I wonder what input and output this filtering
mechanism is supposed to have?
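
One purely hypothetical reading, by analogy with first-match packet filter
rules: the input is a snapshot of per-task attributes plus an ordered rule
list, and the output is a victim (or nothing, meaning fall back to the default
policy). No such interface has been defined in this thread; Task, Rule, the
"protect"/"prefer" verdicts and select_victim are invented names, and ranking
by RSS is an arbitrary choice for the sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Task:
    # Hypothetical input record: what the filter would see per process.
    pid: int
    comm: str
    uid: int
    rss_kb: int


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule: a predicate plus a verdict, "protect" or "prefer".
    match: Callable[[Task], bool]
    verdict: str


def select_victim(tasks, rules):
    """First matching rule decides each task's verdict (as in a packet
    filter). Protected tasks are never chosen; preferred tasks are chosen
    before unmatched ones. Within a group the largest RSS wins."""
    preferred, neutral = [], []
    for task in tasks:
        verdict = next((r.verdict for r in rules if r.match(task)), "neutral")
        if verdict == "protect":
            continue
        (preferred if verdict == "prefer" else neutral).append(task)
    pool = preferred or neutral
    # None means "no candidate": the caller falls back to the default policy.
    return max(pool, key=lambda t: t.rss_kb, default=None)


# Usage: protect root-owned tasks, prefer anything named "batch*".
rules = [
    Rule(lambda t: t.uid == 0, "protect"),
    Rule(lambda t: t.comm.startswith("batch"), "prefer"),
]
tasks = [
    Task(1, "init", 0, 5000),
    Task(100, "batch-job", 1000, 2000),
    Task(200, "browser", 1000, 9000),
]
victim = select_victim(tasks, rules)
```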

Vladimir
> --
> Michal Hocko
> SUSE Labs
