Re: [patch] reproducible athlon mce fix

From: Geoffrey Lee
Date: Sun Nov 09 2003 - 01:24:56 EST


On Sun, Nov 02, 2003 at 12:52:03PM +0000, Dave Jones wrote:
> On Sun, Nov 02, 2003 at 01:57:48PM +0800, Geoffrey Lee wrote:
>
> > preempt_disable();
> > +#if CONFIG_MK7
> > + for (i=1; i<nr_mce_banks; i++) {
> > +#else
> > for (i=0; i<nr_mce_banks; i++) {
> > +#endif
> > rdmsr (MSR_IA32_MC0_STATUS+i*4, low, high);
>
> This needs to be a runtime check. In 2.6, a K7 can boot
> a P4 kernel, and vice versa.
>

Yes, of course.

Would it be sufficient to check for the boot_cpu_data.x86 == 6 and
boot_cpu_data.x86_vendor == X86_VENDOR_AMD? This would seem to do
the right thing. Updated patch attached.


- g.
--- linux-2.6.0-test9/arch/i386/kernel/cpu/mcheck/non-fatal.c.orig 2003-11-02 13:31:43.000000000 +0800
+++ linux-2.6.0-test9/arch/i386/kernel/cpu/mcheck/non-fatal.c 2003-11-02 21:50:36.000000000 +0800
@@ -21,6 +21,7 @@

static struct timer_list mce_timer;
static int timerset;
+static int startbank;

#define MCE_RATE 15*HZ /* timer rate is 15s */

@@ -30,7 +31,7 @@
int i;

preempt_disable();
- for (i=0; i<nr_mce_banks; i++) {
+ for (i=startbank; i<nr_mce_banks; i++) {
rdmsr (MSR_IA32_MC0_STATUS+i*4, low, high);

if (high & (1<<31)) {
@@ -68,6 +69,19 @@

static int __init init_nonfatal_mce_checker(void)
{
+ /*
+ Certain Athlons would cause spurious MCE non-fatal
+ exception check to be reported, if we poke at bank 0.
+ Avoid this if the running machine is an Athlon.
+
+ -- geoff.
+ */
+ if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
+ boot_cpu_data.x86 == 6)
+ startbank = 1;
+ else
+ startbank = 0;
+
if (timerset == 0) {
/* Set the timer to check for non-fatal
errors every MCE_RATE seconds */