Re: K8 ECC error with linux-2.6.32

From: Borislav Petkov
Date: Mon Dec 14 2009 - 17:23:43 EST


On Mon, Dec 14, 2009 at 02:26:45PM +0100, Johannes Hirte wrote:
> Northbridge Error, node 0, core: -1
> K8 ECC error.
> Northbridge Error, node 0, core: -1
> K8 ECC error.
> Northbridge Error, node 0, core: -1
> K8 ECC error.
> Northbridge Error, node 0, core: -1
> K8 ECC error.
> Northbridge Error, node 0, core: -1
> K8 ECC error.
> Northbridge Error, node 0, core: -1
> K8 ECC error.
> Northbridge Error, node 0, core: -1
> K8 ECC error.
> Northbridge Error, node 0, core: -1
> K8 ECC error.

Ok, let's see what kind of errors your machine reports. It looks
like benign GART TLB walk errors, but let's verify that first. Can you
apply the following patchlet and re-trigger the problem?

--
diff --git a/drivers/edac/edac_mce_amd.c b/drivers/edac/edac_mce_amd.c
index 713ed7d..fc4a68e 100644
--- a/drivers/edac/edac_mce_amd.c
+++ b/drivers/edac/edac_mce_amd.c
@@ -311,9 +311,12 @@ void amd_decode_nb_mce(int node_id, struct err_regs *regs, int handle_errors)
 		if (regs->nbsh & K8_NBSH_ERR_CPU_VAL)
 			pr_cont(", core: %u\n", (u8)(regs->nbsh & 0xf));
 	} else {
-		pr_cont(", core: %d\n", ilog2((regs->nbsh & 0xf)));
+		pr_cont(", core: %d\n", fls(regs->nbsh & 0xf) - 1);
 	}

+	pr_err("%s: NBSL: 0x%08x, NBSH: 0x%08x\n",
+	       __func__, regs->nbsl, regs->nbsh);
+

 	pr_emerg("%s.\n", EXT_ERR_MSG(xec));
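
Once the two status words show up in dmesg, they can be picked apart by
hand. Below is a rough userspace sketch of that decoding, not the
driver's actual code. All helper names are made up for illustration.
The field positions are assumptions: error code in NBSL[15:0], extended
error code in NBSL[20:16], and the architectural TLB error format
0000 0000 0001 TTLL. fls_sketch() only mimics the kernel's fls(). It
returns 0 when no bit is set, so an empty ErrCpu field yields the
"core: -1" seen in the log.

```c
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's fls(): 1-based position of the
 * most significant set bit, or 0 if no bit is set.
 */
static inline int fls_sketch(uint32_t x)
{
	int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/* Core number as the patched line computes it: fls(nbsh & 0xf) - 1. */
static inline int nb_core(uint32_t nbsh)
{
	return fls_sketch(nbsh & 0xf) - 1;
}

/* Assumed layout: MCA error code lives in the low 16 bits of NBSL. */
static inline unsigned int nb_error_code(uint32_t nbsl)
{
	return nbsl & 0xffff;
}

/* Assumed layout: extended error code lives in NBSL[20:16]. */
static inline unsigned int nb_ext_error_code(uint32_t nbsl)
{
	return (nbsl >> 16) & 0x1f;
}

/* TLB errors use the error code format 0000 0000 0001 TTLL. */
static inline int nb_is_tlb_error(unsigned int ec)
{
	return (ec & 0xfff0) == 0x0010;
}
```

For instance, a made-up NBSL value of 0x00050011 would decode to error
code 0x0011 (TLB format) with extended error code 5, and an NBSH whose
low nibble is zero gives core -1.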


Thanks.

--
Regards/Gruss,
Boris.