Re: [patch -mmotm] mm: invoke oom killer for __GFP_NOFAIL

From: Mel Gorman
Date: Mon May 11 2009 - 17:32:37 EST


On Mon, May 11, 2009 at 11:00:44PM +0900, Minchan Kim wrote:
> Hi, Mel.
>
> On Mon, May 11, 2009 at 10:38 PM, Mel Gorman <mel@xxxxxxxxx> wrote:
> > On Mon, May 11, 2009 at 08:21:21PM +0900, Minchan Kim wrote:
> >> On Mon, May 11, 2009 at 6:12 PM, Minchan Kim <minchan.kim@xxxxxxxxx> wrote:
> >> > On Mon, May 11, 2009 at 5:40 PM, David Rientjes <rientjes@xxxxxxxxxx> wrote:
> >> >> On Mon, 11 May 2009, Minchan Kim wrote:
> >> >>
> >> >>> Hmm.. if __alloc_pages_may_oom fail to allocate free page due to order > PAGE_ALLOC_COSTLY_ORDER,
> >> >>>
> >> >>> It will go to nopage label in __alloc_pages_slowpath.
> >> >>> Then it will show the page allocation failure warning and will return.
> >> >>> Retrying depends on caller.
> >> >>>
> >> >>
> >> >> Correct.
> >> >>
> >> >>> So, I think it won't loop forever.
> >> >>> Do I miss something ?
> >> >>>
> >> >>
> >> >> __GFP_NOFAIL allocations shouldn't fail, that's the point of the gfp flag.
> >> >> So failing without attempting to free some memory is the wrong thing to
> >> >> do.
> >> >
> >> > Thanks for quick reply.
> >> > I was confused by your description.
> >> > I thought you suggested we have to prevent loop forever.
> >> >
> >> >>
> >> >>> In addition, the OOM killer can help for getting the high order pages ?
> >> >>>
> >> >>
> >> >> Sure, if it selects a task that will free a lot of memory, which is its
> >> >> goal.
> >> >>
> >> >
> >> > How do we know any task have a lot of memory ?
> >> > If we select wrong task and kill one ?
> >> >
> >> > I have a concern about innocent task.
> >>
> >> Now, I look over __out_of_memory.
> >> For selecting better tasks in case of PAGE_ALLOC_COSTLY_ORDER, How
> >> about increasing score of task which have VM_HUGETLB vma in badness ?
> >>
> >
> > That is unjustified. It penalises a process even if it only allocated one
> > hugepage and it is not a reflection of how much memory the process is using
> > or how badly behaved it is.
> > Even worse, if the huge page was allocated from the static hugepage pool then
> > the hugepages are freed to the hugepage pool and not the page allocator when
> > the process is killed. This means that killing a process using hugepages
> > does not necessarily help applications requiring more memory unless they
> > also want hugepages. However, a hugepage allocation will not trigger the
> > OOM killer so killing processes using hugepages still does not help.
>
> Thanks for pointing me.
> In fact, I expect your great answer. :)
>
> So, how do we prevent innocent task killing for allocation of high order page ?

Not by targeting users of hugepages anyway, that's for sure. When a
high-order allocation fails, I would normally expect the caller to recover
from the situation gracefully. If it can't, the caller is running a major
risk and I would question why it's __GFP_NOFAIL.

I recognise that this is not much of an answer. I haven't read all the
related threads so I don't know what application is depending so heavily on
high-order allocations succeeding that it warranted __GFP_NOFAIL and couldn't
be addressed in some other fashion like vmalloc().

Killing a process allocating hugepages will only help another process
requiring hugepages. Unless dynamic hugepage pool resizing is in use, the
pages freed are not usable for normal high-order allocations so teaching the
OOM killer to target those processes is unlikely to help solve whatever
problem is being addressed.
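
To make that concrete, here is a rough userspace sketch of the accounting I
have in mind. It is not kernel code: struct sketch_mem,
sketch_release_hugepages() and SKETCH_PAGES_PER_HUGEPAGE are names made up
for illustration, and the 512 figure assumes 2MB huge pages on 4KB base
pages.

```c
#include <stdbool.h>

/* Assumption: 2MB huge pages built from 4KB base pages. */
#define SKETCH_PAGES_PER_HUGEPAGE 512UL

/* Hypothetical, heavily simplified view of where free memory lives. */
struct sketch_mem {
    unsigned long buddy_free_pages; /* base pages the page allocator can hand out */
    unsigned long hugepool_free;    /* free huge pages held in the hugepage pool */
};

/*
 * Model a killed process giving back nr huge pages.
 * With a static pool the pages return to the pool and stay invisible to
 * normal allocations. Only when the pool is resized dynamically do they
 * go back to the page allocator.
 */
static void sketch_release_hugepages(struct sketch_mem *m, unsigned long nr,
                                     bool dynamic_resize)
{
    if (dynamic_resize)
        m->buddy_free_pages += nr * SKETCH_PAGES_PER_HUGEPAGE;
    else
        m->hugepool_free += nr;
}
```

In this model, releasing huge pages with dynamic_resize false leaves
buddy_free_pages untouched, which is why a normal high-order allocation
gains nothing from the kill.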

> I think it is trade off. but at least, we have been prevent it until now.
>
> But this patch increases the probability of innocent task killing.

I think any increase in probability is minimal. When it gets down to it, there
should be zero costly-high-order allocations that are also __GFP_NOFAIL. If
anything, the patch would show up as an OOM-kill report pointing out which
caller needs to be fixed, as opposed to having apparently infinite loops in
the page allocator.
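
As I understand the intent of the patch, the decision boils down to something
like the sketch below. This is a userspace illustration only, not the actual
slow path: sketch_should_oom_kill() and SKETCH_GFP_NOFAIL are hypothetical
names, and it ignores the other conditions the real allocator checks.

```c
#include <stdbool.h>

#define PAGE_ALLOC_COSTLY_ORDER 3    /* same value as the kernel constant */
#define SKETCH_GFP_NOFAIL       0x1u /* hypothetical stand-in for __GFP_NOFAIL */

/*
 * Decide whether a stalled allocation should invoke the OOM killer.
 * Costly high-order requests normally skip it because killing a task is
 * unlikely to produce contiguous memory. A request that is not allowed
 * to fail is the exception: it gets an OOM kill instead of a silent retry.
 */
static bool sketch_should_oom_kill(unsigned int order, unsigned int gfp_flags)
{
    if (order <= PAGE_ALLOC_COSTLY_ORDER)
        return true;
    return (gfp_flags & SKETCH_GFP_NOFAIL) != 0;
}
```

The only case that changes is order > PAGE_ALLOC_COSTLY_ORDER with the
no-fail flag set, which is exactly the combination that should not exist in
the first place.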

> Is GFP_NOFAIL's early bailout more important than killing of innocent task ?
>

In my opinion, yes, in the sense that an OOM-kill report is easier to diagnose
than an infinite loop.

> I am not sure.
>
> > --
> > Mel Gorman
> > Part-time Phd Student                          Linux Technology Center
> > University of Limerick                         IBM Dublin Software Lab
> >
>
>
>
> --
> Kind regards,
> Minchan Kim
>

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab