Re: upcoming kerneloops.org item: get_page_from_freelist

From: Linus Torvalds
Date: Wed Jun 24 2009 - 15:46:38 EST




On Wed, 24 Jun 2009, Andrew Morton wrote:

> On Wed, 24 Jun 2009 12:16:20 -0700 (PDT)
> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> > Lookie here. This is 2.6.0:mm/page_alloc.c:
> >
> > 	do_retry = 0;
> > 	if (!(gfp_mask & __GFP_NORETRY)) {
> > 		if ((order <= 3) || (gfp_mask & __GFP_REPEAT))
> > 			do_retry = 1;
> > 		if (gfp_mask & __GFP_NOFAIL)
> > 			do_retry = 1;
> > 	}
> > 	if (do_retry) {
> > 		blk_congestion_wait(WRITE, HZ/50);
> > 		goto rebalance;
> > 	}
>
> rebalance:
> 	if ((p->flags & (PF_MEMALLOC | PF_MEMDIE)) && !in_interrupt()) {
> 		/* go through the zonelist yet again, ignoring mins */
> 		for (i = 0; zones[i] != NULL; i++) {
> 			struct zone *z = zones[i];
>
> 			page = buffered_rmqueue(z, order, cold);
> 			if (page)
> 				goto got_pg;
> 		}
> 		goto nopage;
> 	}

Your point?

That's the recursive allocation or oom case. Not the normal case at all.

The _normal_ case is to do the whole "try_to_free_pages()" case and try
and try again. Forever.

IOW, we have traditionally never failed small kernel allocations. It makes
perfect sense that people _depend_ on that.
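
To make the "never failed small allocations" point concrete, here is a minimal
userspace sketch of the do_retry decision from the 2.6.0 snippet quoted above.
The SK_* flag values, SK_MAX_ORDER and the sk_should_retry() name are
assumptions made up for illustration; they are not the kernel's definitions,
and the sketch models only the decision, not the reclaim or the wait.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration only; NOT the kernel's real values. */
#define SK_GFP_REPEAT   0x1u	/* stands in for __GFP_REPEAT */
#define SK_GFP_NOFAIL   0x2u	/* stands in for __GFP_NOFAIL */
#define SK_GFP_NORETRY  0x4u	/* stands in for __GFP_NORETRY */

#define SK_MAX_ORDER 10u	/* assumed upper bound, used only as a sanity check */

/*
 * Mirrors the do_retry decision in the quoted 2.6.0 snippet: returns true
 * when the allocator would wait and jump back to "rebalance".
 */
static bool sk_should_retry(unsigned int order, unsigned int gfp_mask)
{
	bool do_retry = false;

	assert(order <= SK_MAX_ORDER);

	if (!(gfp_mask & SK_GFP_NORETRY)) {
		/* Small orders (<= 3) retry even with no flags set at all. */
		if (order <= 3 || (gfp_mask & SK_GFP_REPEAT))
			do_retry = true;
		if (gfp_mask & SK_GFP_NOFAIL)
			do_retry = true;
	}
	return do_retry;
}
```

With no flags at all, sk_should_retry(0, 0) through sk_should_retry(3, 0) are
true, so under this logic a plain small allocation keeps looping. An order-4
request retries only if the REPEAT or NOFAIL stand-in is set, and the NORETRY
stand-in switches retrying off regardless of the other bits.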

Now, we have since relaxed that (a lot). And in answer to that, people
have added more __GFP_NOFAIL flags, I bet. It's all very natural. Claiming
that this is some "new error" and that we should warn about NOFAIL
allocations with big orders is just silly and simply not true.

Linus

