Re: [RFC] oom-kill: give the dying task a higher priority

From: KAMEZAWA Hiroyuki
Date: Mon May 31 2010 - 03:30:26 EST


On Mon, 31 May 2010 16:05:48 +0900
Minchan Kim <minchan.kim@xxxxxxxxx> wrote:

> On Mon, May 31, 2010 at 3:35 PM, KOSAKI Motohiro
> <kosaki.motohiro@xxxxxxxxxxxxxx> wrote:
> > Hi
> >
> >> Hi, Kosaki.
> >>
> >> On Sat, May 29, 2010 at 12:59 PM, KOSAKI Motohiro
> >> <kosaki.motohiro@xxxxxxxxxxxxxx> wrote:
> >> > Hi
> >> >
> >> >> oom-killer: give the dying task rt priority (v3)
> >> >>
> >> >> Give the dying task RT priority so that it can be scheduled quickly and die,
> >> >> freeing needed memory.
> >> >>
> >> >> Signed-off-by: Luis Claudio R. Gonçalves <lgoncalv@xxxxxxxxxx>
> >> >
> >> > Almostly acceptable to me. but I have two requests,
> >> >
> >> > - need 1) force_sig() 2)sched_setscheduler() order as Oleg mentioned
> >> > - don't boost priority if it's in mem_cgroup_out_of_memory()
> >>
> >> Why do you want to not boost priority if it's path of memcontrol?
> >>
> >> If it's path of memcontrol and CONFIG_CGROUP_MEM_RES_CTLR is enabled,
> >> mem_cgroup_out_of_memory will select victim task in memcg.
> >> So __oom_kill_task's target task would be in memcg, I think.
> >
> > Yep.
> > But priority boost naturally makes CPU starvation for out of the group
> > processes.
> > It seems to break cgroup's isolation concept.
> >
> >> As you and memcg guys don't complain this, I would be missing something.
> >> Could you explain it? :)
> >
> > So, My points are,
> >
> > 1) Usually priority boost is wrong idea. It have various side effect, but
> >    system wide OOM is one of exception. In such case, all tasks aren't
> >    runnable, then, the downside is acceptable.
> > 2) memcg have OOM notification mechanism. If the admin need priority boost,
> >    they can do it by their OOM-daemon.
>
> Is it possible kill the hogging task immediately when the daemon send
> kill signal?
> I mean we can make OOM daemon higher priority than others and it can
> send signal to normal process. but when is normal process exited after
> receiving kill signal from OOM daemon? Maybe it's when killed task is
> executed by scheduler. It's same problem again, I think.
>
> Kame, Do you have an idea?
>
This is just an idea and I have no implementation yet.

With memcg, an OOM situation can be recovered from by temporarily enlarging
the limit. Then, what the daemon has to do is:

1. Send a signal (kill, or another signal that aborts with a coredump).
2. Move the problematic task to a jail if necessary.
3. Enlarge the limit to indicate "Go".
4. After stabilization, reduce the limit.

This is the fastest way. The admin has to plan for extra room or jails, and
the daemon has to be clever enough, but in most cases I think this works well.
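
The four steps above could be sketched roughly as below. This is only an
illustration, not an existing daemon: the function name, its parameters, and
the directory arguments are hypothetical, and it assumes a cgroup v1 style
memcg layout where each group directory has "memory.limit_in_bytes" and
"tasks" files. The signal sender and the "wait until stable" step are passed
in as callables so the policy can be swapped or tried without root.

```python
import os
import signal
import time


def write_value(path, value):
    """Write a single value to a cgroup control file."""
    with open(path, "w") as f:
        f.write(str(value))


def read_int(path):
    """Read an integer from a cgroup control file."""
    with open(path) as f:
        return int(f.read().strip())


def recover_from_memcg_oom(pid, group_dir, jail_dir=None,
                           extra_bytes=64 << 20,
                           sig=signal.SIGKILL,
                           send_signal=os.kill,
                           wait_until_stable=lambda: time.sleep(1.0)):
    """Hypothetical helper following the four steps; returns the old limit."""
    limit_file = os.path.join(group_dir, "memory.limit_in_bytes")
    old_limit = read_int(limit_file)

    # 1. Send the signal (kill by default, or one that dumps core).
    send_signal(pid, sig)

    # 2. Move the problematic task to a jail group if one was given.
    if jail_dir is not None:
        write_value(os.path.join(jail_dir, "tasks"), pid)

    # 3. Enlarge the limit so the group can make progress ("Go").
    write_value(limit_file, old_limit + extra_bytes)

    # 4. After stabilization, reduce the limit to its old value.
    wait_until_stable()
    write_value(limit_file, old_limit)
    return old_limit
```

How much extra room to give and how to decide that the group is stable
again are left to the admin; the one-second sleep is only a placeholder.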

Thanks,
-Kame
