Re: [RFC EDAC/GHES] edac: lock module owner to avoid error report conflicts

From: Tony Luck
Date: Thu Nov 01 2012 - 13:25:17 EST


On Thu, Nov 1, 2012 at 4:47 AM, Mauro Carvalho Chehab
<mchehab@xxxxxxxxxx> wrote:
> Take a look at arch/x86/kernel/cpu/mcheck/mce-apei.c:
>
> void apei_mce_report_mem_error(int corrected, struct cper_sec_mem_err *mem_err)
> {
> 	struct mce m;
>
> 	/* Only corrected MC is reported */
> 	if (!corrected || !(mem_err->validation_bits &
> 				CPER_MEM_VALID_PHYSICAL_ADDRESS))
> 		return;
>
> 	mce_setup(&m);
> 	m.bank = 1;
> 	/* Fake a memory read corrected error with unknown channel */
> 	m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | 0x9f;
> 	m.addr = mem_err->physical_addr;
> 	mce_log(&m);
> 	mce_notify_irq();
> }
>
> Bank information there is fake; status is fake. Only addr is actually filled
> in, and it works only for corrected errors.

This went in like this to help out the Westmere-EX processors, which
didn't fill in MCi_ADDR for corrected errors. APEI could get the
address from some platform CSRs ... and report it via /dev/mcelog
so that predictive analysis in mcelog(8) would work on these machines.

I don't think we can rip it out yet ... not until those machines are
shuffled off to recycle heaven.

But perhaps we should get smarter about which machines we enable
APEI on? If we get everything we need from the machine check banks,
then the detour via the BIOS to report the same thing again isn't helpful.

-Tony