Re: 2.6.0-test4 and hardware reports a non fatal incident

From: Matt Gibson
Date: Sat Aug 30 2003 - 08:22:00 EST


On Saturday 30 Aug 2003 11:49, Matt Gibson wrote:
> On Thursday 28 Aug 2003 23:17, Randy.Dunlap wrote:
> > Yes, the kernel has decided that your processor only has 1 Bank of
> > MCE register data to report. I don't know how/why. Sorry.
>
> Could it be something to do with this (in
> arch/i386/kernel/cpu/mcheck/k7.c)?
>
> if (l & (1<<8)) /* Control register present ? */
> wrmsr (MSR_IA32_MCG_CTL, 0xffffffff, 0xffffffff);
> nr_mce_banks = l & 0xff;
>
> for (i=1; i<nr_mce_banks; i++) {
>
> Check out the "for". Or am I reading this wrong?

Having checked back, this was changed between test-2 and test-3. The
checking code in k7_machine_check() still loops from 0 rather than 1. I
think this may be leading to false reporting of problems, which may be why I
and Tomasz are seeing these MCE messages on our Athlons.

Anyone who knows more about this stuff care to comment? Is someone looking
after MCE at the moment? I couldn't find out much info on it.

Thanks,

Matt

--
"It's the small gaps between the rain that count,
and learning how to live amongst them."
-- Jeff Noon
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/