Re: [RFC] oom-kill: give the dying task a higher priority

From: Minchan Kim
Date: Mon May 31 2010 - 05:30:56 EST


On Mon, May 31, 2010 at 4:25 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> On Mon, 31 May 2010 16:05:48 +0900
> Minchan Kim <minchan.kim@xxxxxxxxx> wrote:
>
>> On Mon, May 31, 2010 at 3:35 PM, KOSAKI Motohiro
>> <kosaki.motohiro@xxxxxxxxxxxxxx> wrote:
>> > Hi
>> >
>> >> Hi, Kosaki.
>> >>
>> >> On Sat, May 29, 2010 at 12:59 PM, KOSAKI Motohiro
>> >> <kosaki.motohiro@xxxxxxxxxxxxxx> wrote:
>> >> > Hi
>> >> >
>> >> >> oom-killer: give the dying task rt priority (v3)
>> >> >>
>> >> >> Give the dying task RT priority so that it can be scheduled quickly and die,
>> >> >> freeing needed memory.
>> >> >>
>> >> >> Signed-off-by: Luis Claudio R. Gonçalves <lgoncalv@xxxxxxxxxx>
>> >> >
>> >> > Almost acceptable to me, but I have two requests:
>> >> >
>> >> > - call 1) force_sig() and then 2) sched_setscheduler(), in that order, as Oleg mentioned
>> >> > - don't boost the priority when called from mem_cgroup_out_of_memory()
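The two requests can be illustrated with a small userspace analogue. This is a sketch only, not code from the patch: kill(2) and sched_setscheduler(2) stand in for the kernel-side force_sig() and sched_setscheduler(), and the helper names and the `memcg_oom` flag are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical userspace analogue of the two requests:
 *  1) send the kill signal first, and only then raise the priority;
 *  2) skip the boost when the OOM is confined to a memory cgroup, so
 *     tasks outside the group are not starved of CPU.
 * Returns -1 if the signal could not be sent, 0 if no boost was
 * attempted, 1 if the boost succeeded, and 2 if it failed (for example
 * EPERM without CAP_SYS_NICE, or ESRCH once the victim is gone).
 */
static int kill_then_maybe_boost(pid_t victim, bool memcg_oom)
{
    struct sched_param param = { 0 };

    if (victim <= 0)                    /* never signal a group or everyone */
        return -1;
    if (kill(victim, SIGKILL) != 0)     /* 1) signal first */
        return -1;
    if (memcg_oom)                      /* 2) keep cgroup isolation */
        return 0;

    /* System-wide OOM: let the victim be scheduled soon so it can exit. */
    param.sched_priority = sched_get_priority_max(SCHED_FIFO);
    if (sched_setscheduler(victim, SCHED_FIFO, &param) != 0)
        return 2;
    return 1;
}

/* Reap a child victim; true if it was terminated by SIGKILL. */
static bool reaped_by_sigkill(pid_t child)
{
    int status = 0;

    if (waitpid(child, &status, 0) != child)
        return false;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
}
```

One plausible benefit of this ordering is that the victim already has SIGKILL pending when it is boosted, so the extra CPU time goes to exiting rather than to its normal work.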
>> >>
>> >> Why don't you want to boost the priority on the memcontrol path?
>> >>
>> >> If we are on the memcontrol path and CONFIG_CGROUP_MEM_RES_CTLR is enabled,
>> >> mem_cgroup_out_of_memory() selects the victim task from within the memcg.
>> >> So __oom_kill_task()'s target task would be in the memcg, I think.
>> >
>> > Yep.
>> > But a priority boost naturally causes CPU starvation for processes
>> > outside the group.
>> > That seems to break the cgroup isolation concept.
>> >
>> >> Since you and the memcg folks don't object to this, I may be missing something.
>> >> Could you explain it? :)
>> >
>> > So, my points are:
>> >
>> > 1) A priority boost is usually a bad idea. It has various side effects, but
>> >    a system-wide OOM is an exception. In that case no task can run normally
>> >    anyway, so the downside is acceptable.
>> > 2) memcg has an OOM notification mechanism. If the admin needs a priority
>> >    boost, they can do it from their own OOM daemon.
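Point 2) refers to the memcg OOM notifier. Below is a minimal sketch of how a userspace daemon might subscribe to it, assuming the cgroup-v1 memory controller interface, where an eventfd(2) is registered by writing "<event_fd> <fd of memory.oom_control>" to cgroup.event_control. The function names and the example cgroup path are hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Register for OOM events on a cgroup-v1 memory cgroup directory, e.g. a
 * hypothetical "/sys/fs/cgroup/memory/mygroup". On success returns an
 * eventfd that becomes readable on each OOM event and stores the open
 * memory.oom_control fd in *oom_fd (kept open by the listener).
 * Returns -1 on failure.
 */
static int register_oom_notifier(const char *cgdir, int *oom_fd)
{
    char path[4096], line[64];
    int efd, ofd = -1, cfd = -1, len;

    efd = eventfd(0, 0);
    if (efd < 0)
        return -1;

    snprintf(path, sizeof(path), "%s/memory.oom_control", cgdir);
    ofd = open(path, O_RDONLY);
    if (ofd < 0)
        goto fail;

    snprintf(path, sizeof(path), "%s/cgroup.event_control", cgdir);
    cfd = open(path, O_WRONLY);
    if (cfd < 0)
        goto fail;

    /* Registration string: "<event_fd> <fd of memory.oom_control>" */
    len = snprintf(line, sizeof(line), "%d %d", efd, ofd);
    if (len < 0 || write(cfd, line, (size_t)len) != len)
        goto fail;

    close(cfd);
    *oom_fd = ofd;
    return efd;

fail:
    if (cfd >= 0)
        close(cfd);
    if (ofd >= 0)
        close(ofd);
    close(efd);
    return -1;
}

/* Block until the next OOM event; returns the event count, or -1. */
static int64_t wait_for_oom(int efd)
{
    uint64_t count = 0;

    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return (int64_t)count;
}
```

What the daemon does once wait_for_oom() returns (signal the hog, raise its priority, or adjust the limit) is up to the admin's policy.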
>>
>> Is it possible to kill the hogging task immediately when the daemon sends
>> the kill signal?
>> I mean, we can give the OOM daemon a higher priority than the others so it
>> can send the signal to a normal process, but when does that process actually
>> exit after receiving the kill signal from the OOM daemon? Presumably only when
>> the scheduler runs the killed task. That's the same problem again, I think.
>>
>> Kame, do you have any ideas?
>>
> This is just an idea; I have no implementation yet.
>
> With memcg, an OOM situation can be recovered from by "temporarily enlarging the limit".
> Then, what the daemon has to do is:
>
>  1. send a signal (SIGKILL, or another signal that aborts the task with a core dump).
>  2. move the problematic task to a jail if necessary.
>  3. enlarge the limit to signal "Go".
>  4. after things stabilize, reduce the limit.
>
> This is the fastest approach. The admin has to plan for the extra room or jails,
> and the daemon has to be clever enough. But in most cases, I think this works well.
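The four steps can be sketched as a small C helper. This assumes the cgroup-v1 memory controller files `tasks` and `memory.limit_in_bytes`; `write_knob()`, `recover_memcg_oom()`, the jail group and all sizes are hypothetical, and how large `extra` should be is left open.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/* Write one value to a cgroup-v1 control file such as "tasks" or
 * "memory.limit_in_bytes". Returns 0 on success, -1 on failure. */
static int write_knob(const char *cgdir, const char *knob,
                      unsigned long long val)
{
    char path[4096];
    FILE *f;
    int rc;

    snprintf(path, sizeof(path), "%s/%s", cgdir, knob);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    rc = fprintf(f, "%llu\n", val) < 0 ? -1 : 0;
    if (fclose(f) != 0)          /* the write is flushed here */
        rc = -1;
    return rc;
}

/* Hypothetical daemon reaction covering steps 1-3. */
static int recover_memcg_oom(pid_t victim, int sig, const char *memcg,
                             const char *jail, unsigned long long limit,
                             unsigned long long extra)
{
    if (victim <= 0)             /* never signal a group or everyone */
        return -1;
    /* 1. SIGKILL, or e.g. SIGABRT to get a core dump */
    if (kill(victim, sig) != 0)
        return -1;
    /* 2. optionally move the task into a "jail" memcg */
    if (jail != NULL &&
        write_knob(jail, "tasks", (unsigned long long)victim) != 0)
        return -1;
    /* 3. temporarily enlarge the limit: the "Go" */
    return write_knob(memcg, "memory.limit_in_bytes", limit + extra);
}
```

Step 4 would be a later `write_knob(memcg, "memory.limit_in_bytes", limit)` once the group has stabilized.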

I think it is very hard to know how much extra room we need, since
we can't predict how many tasks will be stuck trying to allocate memory.
But I tend to agree that the system-wide OOM problem is more important
than the memcg one.
And the memcg folks don't seem to have any problem with it, so I am no
longer against this patch.

Thanks, Kosaki and Kame.

> Thanks,
> -Kame
>
>



--
Kind regards,
Minchan Kim