Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

From: Thomas Gleixner
Date: Wed Apr 12 2017 - 17:12:48 EST

On Wed, 12 Apr 2017, Dan Williams wrote:

> On Wed, Apr 12, 2017 at 1:52 PM, Luck, Tony <tony.luck@xxxxxxxxx> wrote:
> > On Wed, Apr 12, 2017 at 01:27:05PM -0700, Verma, Vishal L wrote:
> >> > /* We only care about memory errors */
> >> > if (!(mce->status & MCACOD))
> >> >         return NOTIFY_DONE;
> >
> > N.B. that isn't a valid test that this is a memory error. You need
> >
> >         if ((m->status & 0xef80) != BIT(7))
> >                 return NOTIFY_DONE;
> >
> > See: Intel SDM Volume 3B - 15.9.2 Compound Error Codes
>
> But Vishal's point is that even if we get this check correct the
> notifier still requires no sleeping operations. So we would need to
> move recoverable notifications to a separate blocking notifier chain.
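
For reference, here is a minimal userspace sketch of the compound-error-code test Tony points to. Per Intel SDM Vol. 3B, 15.9.2, a memory controller error has an MCACOD (bits 15:0 of IA32_MCi_STATUS) of the form `000F 0000 1MMM CCCC`, where F (bit 12) is the correction-report filtering bit. Masking with 0xef80 drops F and the MMM/CCCC detail bits, so the result equals BIT(7) only when bits 15:13 and 11:8 are clear and bit 7 is set. The helper name `mcacod_is_memory_error()` and the local `BIT()` definition are assumptions made for illustration, not kernel API. When negating the check, parenthesize the whole comparison, as in `(status & 0xef80) != BIT(7)`, because `!` binds tighter than `==`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel's BIT() macro (assumption for userspace). */
#ifndef BIT
#define BIT(n) (1ULL << (n))
#endif

/*
 * Hypothetical helper. MCACOD is bits 15:0 of IA32_MCi_STATUS, and a memory
 * controller error is encoded as 000F 0000 1MMM CCCC (Intel SDM Vol. 3B,
 * 15.9.2). The mask 0xef80 keeps bits 15:13, 11:8 and 7, skipping the F
 * (filtering) bit 12 and the MMM/CCCC detail bits, so the masked value is
 * exactly BIT(7) only for a memory controller error.
 */
static bool mcacod_is_memory_error(uint64_t status)
{
	return (status & 0xef80) == BIT(7);
}

/* Usage: hand-decoded MCACOD values. */
void mcacod_examples(void)
{
	assert(mcacod_is_memory_error(0x009f));  /* 1 001 1111: memory read, channel unspecified */
	assert(mcacod_is_memory_error(0x109f));  /* same, with F (bit 12) set */
	assert(!mcacod_is_memory_error(0x0180)); /* cache hierarchy (bit 8), snoop: bit 7 alone is not enough */
	assert(!mcacod_is_memory_error(0x0800)); /* bus/interconnect error (bit 11) */
}
```

Only the low 16 bits take part in the test, so the status flags in the upper bits (VAL, UC, and so on) do not affect the result.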

There is another solution:

Convert the notifier to a blocking notifier and, in the panic case, ignore
the locking and invoke the notifier chain directly. That needs some minimal
surgery in the notifier code, but it's certainly less ugly than splitting
stuff up into two chains.

Thanks,

tglx