Re: [PATCH 3/3] mce: acpi/apei: trace: Enable ghes memory error traceevent

From: Naveen N. Rao
Date: Tue Aug 13 2013 - 12:56:26 EST


On 08/13/2013 05:51 PM, Mauro Carvalho Chehab wrote:
On Tue, 13 Aug 2013 17:06:14 +0530
"Naveen N. Rao" <naveen.n.rao@xxxxxxxxxxxxxxxxxx> wrote:

On 08/12/2013 11:26 PM, Borislav Petkov wrote:
On Mon, Aug 12, 2013 at 02:25:57PM -0300, Mauro Carvalho Chehab wrote:
Userspace still needs the EDAC sysfs, in order to identify how the
memory is organized, and do the proper memory labels association.

What edac_ghes does is to fill those sysfs nodes, and to call the
existing tracing to report errors.

I suppose you're referring to the entries under /sys/devices/system/edac/mc?

Yes.


I'm not sure I understand how this helps. ghes_edac seems to just be
populating this based on dmi, which if I'm not mistaken, can be obtained
in userspace (mcelog as an example).

Also, on my system, all DIMMs are being reported under mc0. I doubt if
the labels there are accurate.
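
For reference, a minimal userspace sketch of how the per-DIMM entries under
/sys/devices/system/edac/mc can be listed, to see which controller each DIMM
is reported under and what label it carries. The helper name
list_edac_dimms is hypothetical, and the attribute file names dimm_label and
dimm_location are an assumption about the per-DIMM EDAC sysfs layout; older
kernels that only expose csrow entries will simply yield an empty list.

```python
import os


def list_edac_dimms(root="/sys/devices/system/edac/mc"):
    """Return (controller, dimm, label, location) tuples found under root.

    Assumes a layout of root/mcN/dimmM/{dimm_label,dimm_location}.
    A missing or unreadable attribute is reported as None.
    """

    def read(path):
        try:
            with open(path) as f:
                return f.read().strip()
        except OSError:
            return None

    found = []
    if not os.path.isdir(root):
        return found

    # Names are sorted lexically, so dimm10 sorts before dimm2.
    for mc in sorted(os.listdir(root)):
        mc_dir = os.path.join(root, mc)
        if not (mc.startswith("mc") and os.path.isdir(mc_dir)):
            continue
        for dimm in sorted(os.listdir(mc_dir)):
            dimm_dir = os.path.join(mc_dir, dimm)
            if not (dimm.startswith("dimm") and os.path.isdir(dimm_dir)):
                continue
            found.append((
                mc,
                dimm,
                read(os.path.join(dimm_dir, "dimm_label")),
                read(os.path.join(dimm_dir, "dimm_location")),
            ))
    return found


if __name__ == "__main__":
    for mc, dimm, label, location in list_edac_dimms():
        print(mc, dimm, label, location)
```

If every tuple comes back with the same controller name, that matches the
"all DIMMs under mc0" situation described above.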

Yes, this is the current status of ghes_edac, where BIOS doesn't provide any
reliable way to associate a given APEI report to a physical DIMM slot label.

The plan is to add more logic there as BIOSes start to provide some reliable
way to do such association. I discussed this subject with a few vendors
while I was working at Red Hat.

Hmm... is there anything specific in the APEI report that could help?
More importantly, is there a need to do this in-kernel rather than in
user-space?

Thanks,
Naveen
