Re: [PATCH 4/4] x86, mce: Have MCE persistent event off by default for now

From: Ingo Molnar
Date: Wed May 04 2011 - 02:59:11 EST



* Luck, Tony <tony.luck@xxxxxxxxx> wrote:

> > Ok, the problem I see with it is that people without a RAS daemon
> > running will have the mechanism collecting MCEs in the background, using
> > up resources (4 pages per CPU is the buffer) and not doing anything (in
> > the best case that is, when we're not broken otherwise).
>
> Can the kernel detect whether anyone is listening to the
> persistent MCE event? If so, then the kernel could printk()
> something to let the user with no RAS daemon (or a dead
> daemon) know that stuff is happening that they might like to
> know about.
>
> It probably makes some sense to delay such a message (so that in
> the boot case we give the daemon a chance to get started before
> complaining that it hasn't shown up for work).

Yes, I definitely think a gateway to printk would be useful, so that the system
can log MCE events the syslog way as well. This probably makes sense for
persistent events in general, not just MCE events.

Btw., as a sidenote, the much more interesting direction is the reverse
one: we want a gateway from printk into the RAS daemon as well, in the form of
special 'printk events' that contain:

- the log level of the kernel when the message was generated
- the log level of the message
- the printk timestamp
- plus the printk message itself, as a free-form string

This would allow RAS functionality to dispatch off printk events immediately
and transparently, without having to separately worry about how to talk to
syslogd/klogd or how to get its logs ...
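
To make the four fields above concrete, here is a rough user-space sketch of
what such a record might look like. Everything in it is hypothetical: the
struct name, the field names, the fixed 256-byte message buffer and both
helpers are made up for illustration and are not an existing kernel ABI.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical layout of a 'printk event'; not an existing kernel ABI. */
struct printk_event {
	int      console_loglevel; /* log level of the kernel when the message was generated */
	int      msg_loglevel;     /* log level of the message itself */
	uint64_t ts_nsec;          /* printk timestamp, assumed to be in nanoseconds */
	char     msg[256];         /* the printk message, as a free-form string */
};

/*
 * Render the event in a dmesg-like "<level>[secs.usecs] text" form.
 * Returns whatever snprintf() returns.
 */
static int printk_event_format(const struct printk_event *ev,
			       char *buf, size_t len)
{
	unsigned long long secs  = ev->ts_nsec / 1000000000ULL;
	unsigned long long usecs = (ev->ts_nsec % 1000000000ULL) / 1000ULL;

	return snprintf(buf, len, "<%d>[%5llu.%06llu] %s",
			ev->msg_loglevel, secs, usecs, ev->msg);
}

/*
 * Carrying both log levels lets a consumer tell whether the message
 * would have been shown on the console: lower numbers are more severe,
 * and a message is shown when its level is below the console level.
 */
static int printk_event_on_console(const struct printk_event *ev)
{
	return ev->msg_loglevel < ev->console_loglevel;
}
```

A daemon receiving such records could filter on msg_loglevel directly instead
of parsing syslog output.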

printk itself could become a persistent event. (Transparently and without
breaking compatibility with existing syslogd/klogd functionality.)

This would also allow the RAS daemon to log printk messages around suspicious
MCE events, in a time-serialized way via a single event channel - so post
mortem can be done using a single facility.

There's ongoing work to timestamp perf events with GTOD timestamps - that way
global log analysis becomes possible as well.
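
As a rough illustration of why comparable timestamps matter: once MCE records
and printk records carry timestamps from the same clock, interleaving them
into one time-ordered stream is a plain two-way merge. The sketch below is
user-space C with made-up names (ras_record, ras_merge, the RAS_SRC_* tags);
it assumes each input is already sorted by timestamp and is not taken from
any existing code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tags for where a record came from. */
enum ras_source { RAS_SRC_MCE, RAS_SRC_PRINTK };

/* Hypothetical minimal record: a timestamp plus its origin. */
struct ras_record {
	uint64_t        ts_nsec; /* timestamp on a clock shared by both sources */
	enum ras_source src;
};

/*
 * Merge two arrays, each already sorted by ts_nsec, into 'out'.
 * 'out' must have room for na + nb records. On equal timestamps the
 * record from 'a' is emitted first. Returns the number of records written.
 */
static size_t ras_merge(const struct ras_record *a, size_t na,
			const struct ras_record *b, size_t nb,
			struct ras_record *out)
{
	size_t i = 0, j = 0, k = 0;

	while (i < na && j < nb)
		out[k++] = (a[i].ts_nsec <= b[j].ts_nsec) ? a[i++] : b[j++];
	while (i < na)
		out[k++] = a[i++];
	while (j < nb)
		out[k++] = b[j++];

	return k;
}
```

With a single event channel the daemon would not need to do this itself; the
merge only shows the ordering that a shared timestamp makes possible.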

Thanks,

Ingo