Re: [RFC 0/9] mce recovery for Sandy Bridge server

From: Borislav Petkov
Date: Tue May 24 2011 - 17:04:36 EST


On Tue, May 24, 2011 at 10:56:26AM -0700, Tony Luck wrote:
> > Maybe something like
> >
> > set_current_state(TASK_UNINTERRUPTIBLE);
> >
> > finish work in NMI context
> >
> > do remaining work in process context like sending appropriate signals
> > etc; finally:
> >
> > set_task_state(tsk, TASK_RUNNING)
>
> That looks pretty easy - are there any weird side effects that I should
> be worried about? My perf/event can't really include the "task" pointer
> (that sounds way too internal) - but I can provide the process id, so
> the "RAS daemon" that sees this event can look up the task to do that
> final set_task_state(tsk, TASK_RUNNING).

Actually, I was thinking more in the direction of doing this in a kernel
thread or workqueue without going back to the RAS daemon. Then you would
only need to save the task_struct ptr.
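
A rough userspace model of that handoff, just to make the idea concrete.
This is not kernel code: struct task, struct mce_deferred and both
functions are made-up stand-ins, and the one-slot mailbox ignores the
NMI-safe queueing and locking a real implementation would need. It only
shows that the task pointer is the sole state passed from the NMI side
to the deferred worker:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; the names are illustrative. */
enum task_state { TASK_RUNNING, TASK_UNINTERRUPTIBLE };

struct task {
	int pid;
	enum task_state state;
	bool signalled;		/* set once the deferred work "sent the signal" */
};

/* One-slot mailbox: filled from "NMI context", drained by the worker. */
struct mce_deferred {
	struct task *tsk;	/* saved task pointer, the only state handed over */
};

/* "NMI context": do the minimum, park the task, remember which one it was. */
static bool mce_nmi_record(struct mce_deferred *d, struct task *tsk)
{
	if (d->tsk != NULL)
		return false;	/* slot busy; a real design must handle this */

	tsk->state = TASK_UNINTERRUPTIBLE;
	d->tsk = tsk;
	return true;
}

/* "Process context" worker: finish recovery, then make the task runnable. */
static bool mce_deferred_work(struct mce_deferred *d)
{
	struct task *tsk = d->tsk;

	if (tsk == NULL)
		return false;	/* nothing pending */

	/* The task must still be parked when the worker picks it up. */
	assert(tsk->state == TASK_UNINTERRUPTIBLE);

	tsk->signalled = true;		/* placeholder for sending the signal */
	tsk->state = TASK_RUNNING;	/* the final "set_task_state" step */
	d->tsk = NULL;
	return true;
}
```

The worker never needs a pid lookup here, which is the point of keeping
the whole thing in the kernel instead of bouncing through the daemon.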

> Does this work in the threaded case? In the case where the task was in
> kernel context (but in a CONFIG_PREEMPT=y kernel at some point
> where preemption is allowed)?

Well, IIUC, it depends on the error: if it is severe enough, you would
want to run the remaining work right after the NMI handler finishes,
without going to userspace.

Hmm..

--
Regards/Gruss,
Boris.