Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error traceevent

From: Naveen N. Rao
Date: Tue Aug 13 2013 - 13:32:27 EST


On 08/13/2013 06:12 PM, Borislav Petkov wrote:
> On Tue, Aug 13, 2013 at 04:51:33PM +0530, Naveen N. Rao wrote:
>> You're right - my tracepoint makes all the data provided by APEI
>> available as-is to userspace. However, ghes_edac seems to squash some
>> of this data into a string when reporting through mc_event.

> Right, for systems which don't need EDAC to decode to the DIMM or for
> which there are no EDAC drivers written, they could use a tracepoint
> which carries APEI info as-is. Others, which need EDAC, should probably
> use trace_mc_event and disable the APEI tracepoint.

If I'm not mistaken, even on systems that have EDAC drivers, EDAC can't really decode down to the DIMM given what the BIOS currently provides in the APEI report. If and when ghes_edac gains this capability, users will have a choice between raw APEI reports and EDAC-processed ones.


> I think this should address Tony's concerns...

> Btw, you could call your TP something simpler like
> trace_ghes_memory_event or so.

I started out with a simpler name, but eventually decided to use the name from the CPER record so that it is clear what this event carries. I think this will work better when adding further GHES events for, say, processor generic, PCIe and other error types.


> Btw 2, if GHES can report other types of errors (I'm pretty sure it can)
> maybe we can use a single tracepoint called trace_ghes_event for all
> types of errors coming out of it...

Two problems with this:
- First, the record size will be really big, since the CPER record for each type of error is large.
- Second, it may be better to filter events based on the type of error (memory, processor, PCIe, ...) rather than subscribing to all GHES error reports.
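A minimal userspace sketch of the second point, assuming hypothetical names (`enum ghes_err_type`, `ghes_event_enable()`, `ghes_report()`, `ghes_emitted()`) that are not existing kernel interfaces: with one event per error type, each type carries its own enable flag, so a consumer can subscribe to memory errors alone. It models only the filtering, not the CPER record layout or the real tracepoint machinery.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical error categories, loosely modelled on CPER section types. */
enum ghes_err_type {
	GHES_ERR_MEM,
	GHES_ERR_PROC_GENERIC,
	GHES_ERR_PCIE,
	GHES_ERR_NR_TYPES
};

/* One enable flag per event, as separate per-type tracepoints would provide. */
static bool event_enabled[GHES_ERR_NR_TYPES];

/* Number of records delivered to each event. */
static unsigned int emitted[GHES_ERR_NR_TYPES];

void ghes_event_enable(enum ghes_err_type type, bool on)
{
	assert(type < GHES_ERR_NR_TYPES);
	event_enabled[type] = on;
}

/*
 * Deliver a record to the event for its type. A consumer that only cares
 * about memory errors enables that one event and never sees the others.
 * Returns true if the record was delivered.
 */
bool ghes_report(enum ghes_err_type type)
{
	assert(type < GHES_ERR_NR_TYPES);
	if (!event_enabled[type])
		return false;
	emitted[type]++;
	return true;
}

unsigned int ghes_emitted(enum ghes_err_type type)
{
	assert(type < GHES_ERR_NR_TYPES);
	return emitted[type];
}
```

For example, after ghes_event_enable(GHES_ERR_MEM, true), reporting a PCIe error returns false and only memory records are counted. A single catch-all event would instead force every subscriber to receive, and pay the buffer cost of, every error type.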


> Oh, and while at it, we probably need to start thinking of a mechanism
> to disable all the error printing, i.e. cper_print_mem() and such,
> if a userspace agent is listening in on the tracepoint and the error
> information is carried through it to userspace.

Do you mean conditionally printing the CPER records depending on whether the tracepoint is enabled? Wouldn't that be confusing for someone who is also monitoring dmesg?
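To make the question concrete, a minimal userspace sketch of the proposed behaviour, assuming a hypothetical `mem_tp_enabled` flag that stands in for "a listener has enabled the tracepoint". `log_mem_error()` and `trace_mem_error()` are illustrative stand-ins for cper_print_mem() and the tracepoint respectively, not kernel APIs.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for "a userspace agent has enabled the tracepoint". */
static bool mem_tp_enabled;

void set_mem_tp_enabled(bool on)
{
	mem_tp_enabled = on;
}

/* Stand-in for cper_print_mem(): writes a line to the log (stdout here). */
static void log_mem_error(unsigned long long phys_addr)
{
	printf("[Hardware Error]: memory error at 0x%llx\n", phys_addr);
}

/* Stand-in for the tracepoint; a real one would copy the record to the ring buffer. */
static void trace_mem_error(unsigned long long phys_addr)
{
	(void)phys_addr;
}

/*
 * Report a memory error. Printing is suppressed while a listener is
 * attached. Returns true if the error was also written to the log.
 */
bool report_mem_error(unsigned long long phys_addr)
{
	if (mem_tp_enabled) {
		trace_mem_error(phys_addr);
		return false;
	}
	log_mem_error(phys_addr);
	return true;
}
```

With the flag set, nothing reaches the log at all, which is exactly the case where someone watching only dmesg would silently miss the error.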


Thanks,
Naveen
