Re: [PATCH] 2.6.26-rc: x86: pci-dma.c: use __GFP_NO_OOM instead of __GFP_NORETRY

From: Andrew Morton
Date: Wed May 28 2008 - 04:41:17 EST


On Wed, 28 May 2008 10:31:25 +0200 Miquel van Smoorenburg <mikevs@xxxxxxxxxx> wrote:

> > When the 16MB zone overflows (which can be common in some workloads)
> > calling the OOM killer is pretty useless because it has barely any
> > real user data [only exception would be the "only 16MB" case Alan
> > mentioned]. Killing random processes in this case is bad.
> >
> > I think for 16MB __GFP_NORETRY is ok because there should be
> > nothing freeable in there so looping is useless. Only exception would be the
> > "only 16MB total" case again but I'm not sure 2.6 supports that at all
> > on x86.
> >
> > On the other hand d_a_c() does more allocations than just 16MB, especially
> > on 64bit and the other zones need different strategies.
>
> Okay, so how about this then ?
>
> --- linux-2.6.26-rc4.orig/arch/x86/kernel/pci-dma.c 2008-05-26 20:08:11.000000000 +0200
> +++ linux-2.6.26-rc4/arch/x86/kernel/pci-dma.c 2008-05-28 10:27:41.000000000 +0200
> @@ -397,9 +397,6 @@
> if (dev->dma_mask == NULL)
> return NULL;
>
> - /* Don't invoke OOM killer */
> - gfp |= __GFP_NORETRY;
> -
> #ifdef CONFIG_X86_64
> /* Why <=? Even when the mask is smaller than 4GB it is often
> larger than 16MB and in this case we have a chance of
> @@ -410,7 +407,9 @@
> #endif
>
> again:
> - page = dma_alloc_pages(dev, gfp, get_order(size));
> + /* Don't invoke OOM killer or retry in lower 16MB DMA zone */
> + page = dma_alloc_pages(dev,
> + (gfp & GFP_DMA) ? gfp | __GFP_NORETRY : gfp, get_order(size));
> if (page == NULL)
> return NULL;

I guess that's more specifically solving that-which-we-wish-to-solve.

Formally we should be testing __GFP_DMA here, not GFP_DMA - just the
zone selector field. They're presently equal, but someone could
legitimately come along and do

#define GFP_DMA (__GFP_DMA|__GFP_HIGH)

or similar.
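
To make the distinction concrete, here is a minimal userspace sketch. The
bit values are made up for illustration (they are not the kernel's actual
values), gfp_t is a plain unsigned stand-in, and the helpers mask_loose()
and mask_strict() are hypothetical names, not anything in pci-dma.c. With
GFP_DMA redefined as above, a non-DMA allocation that happens to carry
__GFP_HIGH would pick up __GFP_NORETRY under the GFP_DMA test, while the
__GFP_DMA test leaves it alone.

```c
#include <assert.h>

/* Stand-in type and illustrative bit values; NOT the kernel's definitions. */
typedef unsigned int gfp_t;

#define __GFP_DMA      0x01u    /* zone selector bit */
#define __GFP_HIGH     0x20u    /* unrelated modifier bit */
#define __GFP_NORETRY  0x1000u  /* fail rather than retry or OOM-kill */

/* The hypothetical future redefinition discussed above. */
#define GFP_DMA (__GFP_DMA | __GFP_HIGH)

/* Tests the composite mask: any bit of GFP_DMA triggers NORETRY. */
static gfp_t mask_loose(gfp_t gfp)
{
	return (gfp & GFP_DMA) ? gfp | __GFP_NORETRY : gfp;
}

/* Tests only the zone selector bit. */
static gfp_t mask_strict(gfp_t gfp)
{
	return (gfp & __GFP_DMA) ? gfp | __GFP_NORETRY : gfp;
}

/* Small self-check of the two behaviours. */
static void check_masks(void)
{
	/* Non-DMA request with __GFP_HIGH set: loose test misfires. */
	assert(mask_loose(__GFP_HIGH) == (__GFP_HIGH | __GFP_NORETRY));
	assert(mask_strict(__GFP_HIGH) == __GFP_HIGH);

	/* Genuine DMA-zone request: both add __GFP_NORETRY. */
	assert(mask_strict(__GFP_DMA) == (__GFP_DMA | __GFP_NORETRY));
	assert(mask_loose(__GFP_DMA) == (__GFP_DMA | __GFP_NORETRY));
}
```

As long as GFP_DMA and __GFP_DMA stay equal the two helpers behave the
same; they only diverge once GFP_DMA grows extra bits.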