Re: discriminate single bit error hardware failure from slabcorruption.

From: Randy.Dunlap
Date: Thu Feb 02 2006 - 14:26:17 EST


On Thu, 2 Feb 2006, Dave Jones wrote:

> In the case where we detect a single bit has been flipped, we spew
> the usual slab corruption message, which users instantly think
> is a kernel bug. In a lot of cases, single bit errors are
> down to bad memory, or other hardware failure.
>
> This patch adds an extra line to the slab debug messages in those
> cases, in the hope that users will try memtest before they report a bug.
>
> 000: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> Single bit error detected. Possibly bad RAM. Please run memtest86.

does memtest86 run on all $ARCHes ?
or this is good for <large percentage>, so it's Good. :)
Just checking; it is a good idea.

> Signed-off-by: Dave Jones <davej@xxxxxxxxxx>
>
> --- linux-2.6.15/mm/slab.c~ 2006-01-09 13:25:17.000000000 -0500
> +++ linux-2.6.15/mm/slab.c 2006-01-09 13:26:01.000000000 -0500
> @@ -1313,8 +1313,11 @@ static void poison_obj(kmem_cache_t *cac
> static void dump_line(char *data, int offset, int limit)
> {
> int i;
> + unsigned char total=0;
> printk(KERN_ERR "%03x:", offset);
> for (i = 0; i < limit; i++) {
> + if (data[offset+i] != POISON_FREE)
> + total += data[offset+i];
> printk(" %02x", (unsigned char)data[offset + i]);
> }
> printk("\n");
> @@ -1019,6 +1023,18 @@ static void dump_line(char *data, int of
> }
> }
> printk("\n");
> + switch (total) {
> + case 0x36:
> + case 0x6a:
> + case 0x6f:
> + case 0x81:
> + case 0xac:
> + case 0xd3:
> + case 0xd5:
> + case 0xea:
> + printk (KERN_ERR "Single bit error detected. Possibly bad RAM. Please run memtest86.\n");
> + return;
> + }
> }
> #endif
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

--
~Randy
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/