Re: [PATCH 4/4] x86, mce: Have MCE persistent event off by default for now

From: Borislav Petkov
Date: Tue May 03 2011 - 15:52:40 EST


On Tue, May 03, 2011 at 01:17:50PM -0400, Luck, Tony wrote:
> > Ok, the problem I see with it is that people without a RAS daemon
> > running will have the mechanism collecting MCEs in the background, using
> > up resources (the buffer is 4 pages per CPU) and not doing anything (in
> > the best case, that is, when nothing else is broken).
>
> Can the kernel detect whether anyone is listening to the
> persistent MCE event? If so, then the kernel could printk()
> something to tell a user with no RAS daemon (or a dead
> daemon) that events are happening that they might like to
> know about.

Right, so I have a primitive way to do that when you enable RAS on the
command line, i.e. boot with "ras=on". But that doesn't help in cases
where the daemon dies for some reason.

Maybe the decoding path should check whether the event descriptor is
still mmapped or whether the event is still enabled; let me think about
it a bit longer. Good point, btw!

> It probably makes sense to delay such a message (so that in
> the boot case we give the daemon a chance to get started before
> complaining that it hasn't shown up for work).

Yep, that too. I also need to handle catching earlybird MCEs, i.e. those
raised before perf has been initialized. I'm thinking we could reuse the
mcelog buffer and feed those records into the RAS daemon after init.
Something like that.
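A rough userspace sketch of that buffer-then-replay idea, under stated assumptions: a fixed array stands in for the mcelog buffer, a registered callback stands in for the perf-backed RAS consumer, and `struct early_mce`, `EARLY_MCE_SLOTS`, `mce_log_event()`, `mce_register_consumer()` and `print_mce()` are all hypothetical names, not existing kernel or mcelog interfaces. Records logged before a consumer exists are held in arrival order and replayed once it registers; after that they go straight through.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical MCE record; fields loosely modeled on MCA bank registers. */
struct early_mce {
    uint64_t status;
    uint64_t addr;
    int bank;
};

#define EARLY_MCE_SLOTS 32 /* hypothetical capacity */

typedef void (*mce_consumer_fn)(const struct early_mce *m);

static struct early_mce early_buf[EARLY_MCE_SLOTS];
static size_t early_count;       /* records currently buffered */
static size_t early_dropped;     /* records lost because the buffer was full */
static mce_consumer_fn consumer; /* NULL until the consumer is initialized */

/* Called for every MCE, whether or not a consumer exists yet. */
void mce_log_event(const struct early_mce *m)
{
    if (consumer) {
        consumer(m);
        return;
    }
    /* Keep the oldest records; count (but do not store) overflow. */
    if (early_count < EARLY_MCE_SLOTS)
        early_buf[early_count++] = *m;
    else
        early_dropped++;
}

/* Called once the consumer is up: replay buffered records in arrival
 * order, empty the buffer, then switch to direct delivery. */
void mce_register_consumer(mce_consumer_fn fn)
{
    assert(fn != NULL);
    for (size_t i = 0; i < early_count; i++)
        fn(&early_buf[i]);
    early_count = 0;
    consumer = fn;
}

/* Example consumer: just prints each record. */
static void print_mce(const struct early_mce *m)
{
    printf("bank %d status 0x%llx addr 0x%llx\n", m->bank,
           (unsigned long long)m->status, (unsigned long long)m->addr);
}
```

Keeping the oldest records and counting the overflow (rather than overwriting) is just one possible choice here; the first error logged is often the most useful for diagnosis, and the drop counter at least tells the daemon that something was lost.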

Thanks.

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632