Re: x86/mce merge, integration hickup + crash, design thoughts

From: Huang Ying
Date: Tue Jan 13 2009 - 21:02:35 EST


On Wed, 2009-01-14 at 01:45 +0800, Ingo Molnar wrote:
> * Andi Kleen <ak@xxxxxxxxxxxxxxx> wrote:
>
> > Ingo Molnar wrote:
> >
> >>>> A far more useful design for handling MCE events would be to feed
> >>>> them into printk logging.
> >>> If there's ASCII logging it should be separate from normal printk.
> >>
> >> Well, why?
> >
> > Mostly because the problem is not a kernel issue. Especially large
> > systems with a lot of memory can generate a lot of corrected events (one
> > bit flips in DIMMs are not that uncommon) and it's not good to mix that
> > all up into other kernel messages. It also makes it more clear that it's
> > not a kernel problem, but a hardware problem. I've gotten feedback over the
> > years that confirms this view.
>
> Is your argument that syslog is not suitable for the logging of hw events?
>
> If that is your argument then the answer is to extend syslog with those
> aspects, instead of widening the quirky /dev based mce ABIs to achieve
> something similar.

In the current /dev-based mce ABI implementation, syslog is used for logging
hw events, not through printk but through /dev/mcelog and /sbin/mcelog.

Uncorrected MCEs should be logged via printk. But for corrected MCEs,
there could be thousands or millions of them (imagine a DIMM with one
corrupted data pin). I don't think it's a good idea to blend these
hardware events with other kernel software events in printk.
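
The split described above could be sketched roughly as follows. This is
a simplified userspace illustration, not the kernel's actual mce code:
struct hw_event, log_hw_event(), the counters, and the buffer size are all
hypothetical names chosen for the example, fprintf(stderr, ...) stands in
for printk, and the fixed-size array stands in for a separate channel such
as /dev/mcelog.

```c
#include <stdio.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified event record; not the kernel's struct mce. */
struct hw_event {
    unsigned int bank;          /* reporting bank number */
    unsigned long long status;  /* raw status value */
    bool uncorrected;           /* true if hardware could not correct it */
};

#define CORRECTED_LOG_LEN 32

/* Fixed-size buffer standing in for a separate channel like /dev/mcelog. */
static struct hw_event corrected_log[CORRECTED_LOG_LEN];
static size_t corrected_count;   /* entries currently stored */
static size_t corrected_dropped; /* events lost because the buffer was full */
static size_t printed_count;     /* messages sent to the main log */

static void log_hw_event(const struct hw_event *ev)
{
    if (ev->uncorrected) {
        /* Rare and serious: goes to the main log (printk in the kernel). */
        fprintf(stderr, "MCE: uncorrected error, bank %u status 0x%llx\n",
                ev->bank, ev->status);
        printed_count++;
        return;
    }

    /* Potentially very frequent: keep it out of the main log. */
    if (corrected_count < CORRECTED_LOG_LEN)
        corrected_log[corrected_count++] = *ev;
    else
        corrected_dropped++;
}
```

With this shape, a DIMM producing a flood of corrected events only fills
(and then overflows) the separate buffer, while the main log sees just the
uncorrected ones.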

Best Regards,
Huang Ying
