Re: [PATCH] RAS: Add a tracepoint for reporting memory controller events

From: Borislav Petkov
Date: Fri Jun 01 2012 - 18:59:30 EST


On Fri, Jun 01, 2012 at 06:21:29PM +0000, Luck, Tony wrote:
> But we need to make sure that user space can actually run. That's the
> motivation behind the CMCI disable patches. Since Intel broadcasts
> CMCI to all cpus on a socket - a CMCI storm on a single socket machine
> will stop any user code from running.

Uuh, that doesn't sound good. Can't you guys make the CMCI run on one
CPU only? I mean, it is a single CECC, no need to stop all cores on the
socket for it, right?

Arguably, it'll be best if the core that sees the CECC fires the CMCI
too and the others continue on their merry way.

> I'd make one small change to what you said:
>
> The kernel's job is to report enough error information that user space
> can make an accurate assessment of the source of the error.
>
> I.e. "enough" is less than "as many errors as it possibly can".

Ok, I see what you mean.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551