Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error traceevent

From: Mauro Carvalho Chehab
Date: Wed Aug 14 2013 - 19:56:52 EST


On Tue, 13 Aug 2013 22:47:36 +0530
"Naveen N. Rao" <naveen.n.rao@xxxxxxxxxxxxxxxxxx> wrote:

> On 08/13/2013 06:11 PM, Mauro Carvalho Chehab wrote:
> > On Tue, 13 Aug 2013 17:11:18 +0530
> > "Naveen N. Rao" <naveen.n.rao@xxxxxxxxxxxxxxxxxx> wrote:
> >
> >> On 08/12/2013 08:14 PM, Mauro Carvalho Chehab wrote:
> >>>> But, this only seems to expose the APEI data as a string
> >>>> and doesn't look to really make all the fields available to user-space
> >>>> in a raw manner. Not sure how well this can be utilised by a user-space
> >>>> tool. Do you have suggestions on how we can do this?
> >>>
> >>> There's already a userspace tool that handles it:
> >>> https://git.fedorahosted.org/cgit/rasdaemon.git/
> >>>
> >>> What is missing in the current version are the bits that would
> >>> translate the APEI way of reporting an error (memory node, card, module,
> >>> bank, device) into a DIMM label[1].
> >>
> >> If I'm reading this right, all APEI data seems to be squashed into a
> >> string in mc_event.
> >
> > Yes. We had lots of discussion about how to map memory errors over the
> > last couple of years. Basically, it was decided that the information that
> > could be decoded into a DIMM would be mapped as integers, and all other
> > driver-specific data would be added as strings.
> >
> > In the tests I did, different machines/vendors fill the APEI data in
> > different ways, which makes it harder to associate them with a DIMM.
>
> Ok, so it looks like ghes_edac isn't quite useful yet.
>
> In the meantime, like Boris suggests, I think we can have a different
> trace event for raw APEI reports - userspace can use it as it pleases.

"In the meantime" is something that worries me the most. Kernel APIs should
be stable. We cannot randomly change them with each new kernel version.

It is better to spend a little more time discussing than to implement a new
trace event that will be removed in the near future.
>
> Once ghes_edac gets better, users can decide whether they want raw APEI
> reports or the EDAC-processed version and choose one or the other trace
> event.
>
> Regards,
> Naveen
>


--

Cheers,
Mauro