Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error traceevent

From: Mauro Carvalho Chehab
Date: Thu Aug 15 2013 - 09:26:22 EST


On Thu, 15 Aug 2013 11:38:31 +0200
Borislav Petkov <bp@xxxxxxxxx> wrote:

> On Wed, Aug 14, 2013 at 09:22:11PM -0300, Mauro Carvalho Chehab wrote:
> > 1) EDAC core needs to know that it should reject "hardware first"
> > drivers.
>
> -ENOPARSE. What do you mean?

I mean that the EDAC core needs to know when, on a given system, the
BIOS is accessing the hardware registers and sending the data via ghes_edac.

In that case, it should reject the driver that reads such data directly
from the hardware, as having both active causes inconsistent error reports
(I got a few BZ reports about that).
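
To illustrate the kind of arbitration meant here, below is a minimal
user-space sketch. It is not actual EDAC code: the enum, the variable and
the mc_claim() helper are all hypothetical names, and it simply assumes
that the first error source to register wins and that any other source is
refused afterwards.

```c
#include <errno.h>

/* Hypothetical: the possible sources of memory error reports. */
enum mc_source {
	MC_SRC_NONE,
	MC_SRC_FIRMWARE_FIRST,	/* BIOS reads the registers, reports via GHES */
	MC_SRC_HARDWARE,	/* driver reads the registers directly */
};

/* Hypothetical: the source currently owning error reporting. */
static enum mc_source active_source = MC_SRC_NONE;

/*
 * Claim error reporting for one source.
 * Returns 0 on success, or -EBUSY if a different source already owns it.
 * Claiming again with the same source is allowed.
 */
static int mc_claim(enum mc_source src)
{
	if (active_source != MC_SRC_NONE && active_source != src)
		return -EBUSY;

	active_source = src;
	return 0;
}
```

With something along those lines, once the firmware-first path has claimed
ownership, a later attempt by a hardware driver to register would fail
instead of producing a second, conflicting set of reports.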

> > 3) If BIOS vendors add later some solution to enumerate the DIMMS
> > per memory controller, channel, socket with APEI, the addition to the
> > existing driver would be trivial.
>
> Actually, with BIOS vendors wanting to do firmware-first strategy with
> DRAM errors, they must have a pretty good and intimate picture of the
> memory topology at hand. So it is only a logical consequence for them,
> when reporting a memory error to the OS to tell us the silkscreen label
> too, while at it.
>
> And if they do that, we don't need the additional layer - just a
> tracepoint from APEI and a userspace script.

No. As we want fatal errors to be properly reported as well, the
kernel will still need to know the memory layout.

OK, such information can come from userspace, just like we do with the
other EDAC drivers, but we'll need to allow the memory layout to be
created dynamically via sysfs (or use some other interface for loading
that data).
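
As a rough idea of what loading the layout from userspace could look like,
here is a small user-space sketch. The "mc channel slot label" line format,
the table and both helpers are assumptions made up for illustration; they
are not an existing sysfs ABI or EDAC interface.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_DIMMS 64
#define LABEL_LEN 32

/* Hypothetical: one DIMM position and its silkscreen label. */
struct dimm_entry {
	unsigned int mc, channel, slot;
	char label[LABEL_LEN];
};

static struct dimm_entry layout[MAX_DIMMS];
static size_t layout_count;

/*
 * Parse one "mc channel slot label" line and append it to the table.
 * Returns 0 on success, -1 if the table is full or the line is malformed.
 */
static int layout_add(const char *line)
{
	struct dimm_entry e;

	if (layout_count >= MAX_DIMMS)
		return -1;
	/* %31s leaves room for the terminating NUL in label[LABEL_LEN]. */
	if (sscanf(line, "%u %u %u %31s",
		   &e.mc, &e.channel, &e.slot, e.label) != 4)
		return -1;

	layout[layout_count++] = e;
	return 0;
}

/* Look up the label for a DIMM position; returns NULL if unknown. */
static const char *layout_label(unsigned int mc, unsigned int channel,
				unsigned int slot)
{
	for (size_t i = 0; i < layout_count; i++) {
		if (layout[i].mc == mc && layout[i].channel == channel &&
		    layout[i].slot == slot)
			return layout[i].label;
	}
	return NULL;
}
```

The point is only that the table is filled at runtime, one entry at a time,
so that the label is available for lookup when an error (fatal or not) has
to be reported.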

> It's a whole another question if they don't do that.

--

Cheers,
Mauro