Re: oom-killer: gfp_mask=0xd1 with 2.6.15.4 on EM64T [previously 2.6.12]

From: J M Cerqueira Esteves
Date: Fri Mar 17 2006 - 04:45:49 EST


J M Cerqueira Esteves wrote:
> Andrew Morton wrote:
>>We have a candidate fix at
>>ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.16-rc5/2.6.16-rc5-mm2/broken-out/x86_64-mm-blk-bounce.patch.
>> [...] The patch is against 2.6.16-rc5.
>
> Testing that kernel now, with good news: the machine has been apparently
> stable, running Gaussian processes for the last 20 hours, with no
> oom-killer messages.

... and after 11 days on that same 2.6.16-rc5 with the suggested patch,
doing a lot of number-crunching with Gaussian and other programs the
whole time, we have had no more oom-killings or other noticeable
instabilities.

I did take the opportunity to configure the kernel with CONFIG_EDAC,
CONFIG_EDAC_MM_EDAC and CONFIG_EDAC_E752X, and during this period (11
days) got about 20 messages like these:

Mar 7 15:25:08 localhost kernel: [182069.699544] Non-Fatal Error DRAM Controler
Mar 7 15:25:08 localhost kernel: [182069.699559] EDAC MC0: CE page 0x9c334, offset 0x0, grain 4096, syndrome 0x2510, row 2, channel 1, label "": e752x CE

always with the same page, offset, grain, syndrome, row, and channel
values. A defective DIMM?
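
One way to check whether every corrected error (CE) really points at the same location is to tally the EDAC lines from the syslog. The following is only a minimal sketch, not part of the original report: it assumes each message sits on a single unwrapped line in the format shown above, the names `CE_RE`, `tally_ce` and `physical_address` are made up for illustration, and the 4096-byte page size is an assumption.

```python
import re
from collections import Counter

# Matches the EDAC corrected-error line in the format quoted above.
CE_RE = re.compile(
    r"EDAC MC(?P<mc>\d+): CE page (?P<page>0x[0-9a-fA-F]+), "
    r"offset (?P<offset>0x[0-9a-fA-F]+), grain (?P<grain>\d+), "
    r"syndrome (?P<syndrome>0x[0-9a-fA-F]+), row (?P<row>\d+), "
    r"channel (?P<channel>\d+)"
)


def tally_ce(lines):
    """Count CE messages per (mc, row, channel, page, syndrome)."""
    counts = Counter()
    for line in lines:
        m = CE_RE.search(line)
        if m is None:
            continue  # not an EDAC CE line
        key = (
            int(m["mc"]),
            int(m["row"]),
            int(m["channel"]),
            int(m["page"], 16),
            int(m["syndrome"], 16),
        )
        counts[key] += 1
    return counts


def physical_address(page, offset, page_size=4096):
    """Byte address of the error, assuming 'page' is a page frame number
    and pages are page_size bytes (an assumption, not from the log)."""
    return page * page_size + offset


# Sample line copied from the log above, joined onto one line.
SAMPLE = (
    'Mar 7 15:25:08 localhost kernel: [182069.699559] EDAC MC0: CE page '
    '0x9c334, offset 0x0, grain 4096, syndrome 0x2510, row 2, channel 1, '
    'label "": e752x CE'
)

if __name__ == "__main__":
    for key, n in tally_ce([SAMPLE]).items():
        mc, row, channel, page, syndrome = key
        print(f"MC{mc} row {row} channel {channel} "
              f"page {page:#x} syndrome {syndrome:#x}: {n} CE(s)")
```

If the tally collapses to a single key, as described here, the errors are confined to one row and channel, which is consistent with (though not proof of) a single bad DIMM or a single bad cell.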

Best regards
J Esteves

--
+351 939838775 Skype:jmcerqueira http://del.icio.us/jmce
