Re: Segfaults in mkdir under high load. Software or hardware?

From: Bongani Hlope
Date: Mon Sep 19 2005 - 14:24:59 EST


On Monday 19 September 2005 03:03, Maurice Volaski wrote:
> At 6:00 AM -0500 9/18/05, linux-kernel-daily-digest-request@xxxxxxxxxxxxxxxxx wrote:
> >> I have been seeing a similar thing:
> >>
> >> ./current:Sep 17 18:00:01 [kernel] mkdir[7696]: segfault at
> >> 0000000000000000 rip 000000000040184d rsp 00007fffff826350 error 4
> >>
> >> I'm using the plain 2.6.13 (from gentoo vanilla sources), though it
> >> was compiled with
> >> gcc version 3.4.4 (Gentoo 3.4.4-r1, ssp-3.4.4-1.0, pie-8.7.8)
> >
> >x86_64 ? If so see http://bugzilla.kernel.org/show_bug.cgi?id=4851
>
> Dual Opteron, and this looks like my issue. It recommends echo 0 >
> /proc/sys/kernel/randomize_va_space but that has not stopped it from
> happening, so I'll probably wait for the patch to get merged.
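
For reference, the "error 4" in the quoted log line is the x86 page-fault error code. A minimal sketch of how to decode it follows; the helper name describe_pf_error() and the PF_* macro names are made up for illustration and are not kernel identifiers. Only the three low bits are handled.

```c
#include <stdio.h>

/* Low bits of the x86 page-fault error code, as printed after "error". */
#define PF_PROT  (1UL << 0)  /* 0: page not present, 1: protection violation */
#define PF_WRITE (1UL << 1)  /* 0: read access,      1: write access */
#define PF_USER  (1UL << 2)  /* 0: kernel mode,      1: user mode */

/*
 * Hypothetical helper: write a human-readable description of the
 * error code into buf. Returns what snprintf() returns.
 */
static int describe_pf_error(unsigned long code, char *buf, size_t len)
{
	return snprintf(buf, len, "%s-mode %s, %s",
			(code & PF_USER) ? "user" : "kernel",
			(code & PF_WRITE) ? "write" : "read",
			(code & PF_PROT) ? "protection violation"
					 : "page not present");
}
```

Passing 4 describes a user-mode read of a page that is not present, which together with the fault address of 0 in the log is what a NULL pointer dereference in mkdir would look like.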

Linus has a patch for that, which you might try. Look at
http://bugzilla.kernel.org/show_bug.cgi?id=4851 for more details on this bug.

--- arch/x86_64/kernel/setup.c.orig 2005-09-18 07:34:36.000000000 +0200
+++ arch/x86_64/kernel/setup.c 2005-09-18 07:37:25.000000000 +0200
@@ -793,10 +793,23 @@ static void __init amd_detect_cmp(struct
 #endif
 }
 
+#define HWCR 0xc0010015
+
 static int __init init_amd(struct cpuinfo_x86 *c)
 {
 	int r;
 	int level;
+#if CONFIG_SMP
+	unsigned long value;
+	// Disable TLB flush filter by setting HWCR.FFDIS:
+	// bit 6 of msr C001_0015
+	//
+	// Errata 63 for SH-B3 steppings
+	// Errata 122 for all(?) steppings
+	rdmsrl(HWCR, value);
+	value |= 1 << 6;
+	wrmsrl(HWCR, value);
+#endif
 
 	/* Bit 31 in normal CPUID used for nonstandard 3DNow ID;
 	   3DNow is IDd by bit 31 in extended CPUID (1*32+31) anyway */
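
To check from userspace whether the bit is already set on a given CPU, something along these lines might work. It is only a sketch: it assumes the msr driver is loaded so that /dev/cpu/<n>/msr exists, and that you run it as root. The names MSR_HWCR, HWCR_FFDIS, hwcr_ffdis_is_set(), hwcr_set_ffdis() and read_msr() are invented for this example; only the MSR number and the bit position come from the patch above.

```c
#define _FILE_OFFSET_BITS 64   /* MSR number is used as a file offset */
#define _XOPEN_SOURCE 700      /* for pread() */

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define MSR_HWCR   0xc0010015u         /* same number as HWCR in the patch */
#define HWCR_FFDIS (UINT64_C(1) << 6)  /* bit 6: TLB flush filter disable */

/* Return 1 if the FFDIS bit is set in a HWCR value, 0 otherwise. */
static int hwcr_ffdis_is_set(uint64_t hwcr)
{
	return (hwcr & HWCR_FFDIS) != 0;
}

/* Return the HWCR value with FFDIS set, as the patch does in the kernel. */
static uint64_t hwcr_set_ffdis(uint64_t hwcr)
{
	return hwcr | HWCR_FFDIS;
}

/*
 * Read one MSR through the msr character device. The file offset
 * selects the MSR and each read transfers 8 bytes.
 * Returns 0 on success, -1 if the device is missing or the read fails.
 */
static int read_msr(int cpu, uint32_t reg, uint64_t *value)
{
	char path[64];
	ssize_t n;
	int fd;

	snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	n = pread(fd, value, sizeof(*value), (off_t)reg);
	close(fd);
	return n == (ssize_t)sizeof(*value) ? 0 : -1;
}
```

A caller would loop over the CPUs, call read_msr(cpu, MSR_HWCR, &v) and report hwcr_ffdis_is_set(v) for each one. The sketch only reads the register; writing it from userspace is deliberately left out, since the patch does that at boot in init_amd().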
