Re: oom-killer: gfp_mask=0xd1 with 2.6.12 on EM64T

From: Andrew Morton
Date: Sat Mar 04 2006 - 19:12:00 EST


J M Cerqueira Esteves <jmce@xxxxxxxxxxxxxxxx> wrote:
>

argh. Please always do reply-to-all. I almost missed this one.

> Andrew Morton wrote:
> > That's quite an old kernel. If this is the notorious bio-uses-GFP_DMA bug
> > then I'd have expected this kernel to be useless from day one. Did you
> > install it recently?
>
> On this double Xeon, yes. I had no problems before with 2.6.12 and the
> same "heavy" software on dual Opteron and dual dual core Opteron
> machines, and this is my first installation on a EM64T.
> At first it seemed everything was ok with 2.6.12 here too, but in a
> couple of days we started gettings some of those oom killings when
> running some Gaussian jobs. In at least a pair of cases the system froze
> completely.
>
> > If you're feeling keen you could add this patch which would confirm it:
>
> Added it and already got output for a similar "killing". Since I'm not
> sure what could be most relevant among those messages, I refrained from
> attaching them all here, and instead put them at
> http://jmce.artenumerica.org/tmp/linux-2.6.12-oom_killings/EM64T-kern.log

Those x86_64 backtraces are quite hard to follow. They get much better if
you enable CONFIG_FRAME_POINTER, and that makes very little difference to
code quality.

> > And if it's that bug then I'm afraid you'll have to sit tight until 2.6.16.
> > We shouldn't release 2.6.16 until this thing is fixed.
>
> Do those call traces suggest that uncorrected bug you mention?

It's hard to say what happened there. I _think_ it went oom in
get_sectorsize()'s GFP_KERNEL|GFP_DMA allocation. (Jens, do we really need
GFP_DMA in there?)

But that's only a 512-byte allocation. Something else must have used up
all the DMA zone.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/