Re: Hardware Error Kernel Mini-Summit

From: Mauro Carvalho Chehab
Date: Tue May 18 2010 - 13:35:39 EST


Hidetoshi Seto wrote:
> (2010/05/18 3:23), Mauro Carvalho Chehab wrote:
>> During the last LF Collaboration Summit, we held a mini-summit [1]
>> intended to improve hardware error detection in the kernel, currently
>> provided by the MCE and EDAC subsystems.
>>
>> The idea for this mini-summit came up after Thomas Gleixner and Ingo
>> Molnar suggested that edac and mce should converge into an error
>> subsystem.
>>
>> I'm enclosing the minutes of the meeting so that they can be
>> reviewed by other kernel hackers who are interested in the topic but
>> unfortunately couldn't come to the meeting.
>>
>> Btw, during the meeting, it was decided that the EDAC ML would work
>> better if moved to vger, so I'm copying both the old and the new edac
>> mailing lists here.
>>
>> [1] http://events.linuxfoundation.org/lfcs2010/edac
>>
>> ---
>
> Thank you very much for providing this report.
>
> I agree that we should have a well organized error subsystem that
> covers all error sources in the system and that provides a
> sufficiently simple and powerful API for users. As one of the
> interested absentees, I think I could be of some help to you
> (e.g. x86 low level).

Thank you for your offer. Any help is welcome.
>
> It might be off-topic here, but I'd like to point out that you missed
> the presence of the PCIe AER subsystem, which handles hardware errors
> on PCIe devices nowadays (it works well on ppc, x86 and so on).
> Given that APEI also covers PCIe errors and that some systems can
> map MC registers to PCI configuration space, I think there is no
> way for the new error subsystem to ignore I/O device errors while
> it cares about errors on CPU/memory and cooperates with APEI.

Yes, it makes sense to integrate the PCIe AER subsystem as well. IMO,
the first step is to provide an error core integrated with perf, and
then start integrating the various error subsystems around it.

--

Cheers,
Mauro