Re: Hardware Error Kernel Mini-Summit

From: Russ Anderson
Date: Mon May 24 2010 - 13:13:43 EST


On Wed, May 19, 2010 at 12:00:02AM +0200, Ingo Molnar wrote:
> * Tony Luck <tony.luck@xxxxxxxxx> wrote:
>
> > [...] Getting from a machine check handler through some
> > context switches (and page faults etc.) to a user level
> > daemon before the error gets recorded looks to be really
> > hard.
>
> As Boris mentioned it too, critical policy action can and
> will be done straight in the kernel.

That is how it is done on ia64. The MCA interrupt
handler does the low-level handling: it makes sure
all the CPUs have rendezvoused, looks at the MCA record
to determine what happened, and performs whatever
recovery steps are needed, such as killing the application.

It definitely needs to be handled in the kernel.
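A rough sketch of the decision step described above may help. It is written as plain C, not kernel code, and every name in it (mca_record, mca_decide, the severity and action enums) is hypothetical. It does not reflect the real ia64 SAL/MCA record layout. It assumes, as a simplification, that a missing CPU rendezvous or an uncorrected error taken in kernel context means the system cannot safely continue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of what a machine-check record reports.
 * Field names are illustrative only, not the ia64 SAL record format. */
enum mca_severity { MCA_CORRECTED, MCA_RECOVERABLE, MCA_FATAL };

struct mca_record {
    enum mca_severity severity;
    bool in_user_context;   /* error was consumed while running user code */
};

enum mca_action { ACTION_LOG_ONLY, ACTION_KILL_TASK, ACTION_PANIC };

/* Rendezvous check: every CPU must have checked in before recovery starts. */
bool all_cpus_rendezvoused(const bool *checked_in, size_t ncpus)
{
    for (size_t i = 0; i < ncpus; i++)
        if (!checked_in[i])
            return false;
    return true;
}

/* Decide the recovery action in the handler itself, without waiting
 * for any user-level daemon. */
enum mca_action mca_decide(const struct mca_record *rec,
                           const bool *checked_in, size_t ncpus)
{
    /* Assumption: without all CPUs quiesced, recovery is not attempted. */
    if (!all_cpus_rendezvoused(checked_in, ncpus))
        return ACTION_PANIC;

    switch (rec->severity) {
    case MCA_CORRECTED:
        return ACTION_LOG_ONLY;      /* just record it */
    case MCA_RECOVERABLE:
        /* Killing the affected application contains a user-context error;
         * in kernel context this sketch assumes no safe recovery. */
        return rec->in_user_context ? ACTION_KILL_TASK : ACTION_PANIC;
    case MCA_FATAL:
    default:
        return ACTION_PANIC;
    }
}
```

The point of the sketch is that the whole policy decision is a small, bounded computation on the record. That fits the constraints of an interrupt handler, where a round trip through context switches to a daemon would not.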

> Ingo

--
Russ Anderson, OS RAS/Partitioning Project Lead
SGI - Silicon Graphics Inc rja@xxxxxxx