Re: [GIT PULL] 2 RAS fixes for 3.17, refreshed

From: Boris Ostrovsky
Date: Fri Jun 27 2014 - 12:58:26 EST


On 06/27/2014 12:08 PM, Borislav Petkov wrote:
On Fri, Jun 27, 2014 at 11:12:59AM -0400, Boris Ostrovsky wrote:
Yes, it fails because xen_late_init_mcelog() registers /dev/mcelog and (I
think) it happens before mcheck_init_device().
Yes, mcheck_init_device is device_initcall_sync() while
xen_late_init_mcelog() is device_initcall().
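
To illustrate the ordering described above: all device_initcall() entries run before any device_initcall_sync() entry, so whichever init routine sits in the earlier level claims /dev/mcelog first. Below is a minimal user-space sketch of that behaviour, not kernel code. The names claim_mcelog(), xen_init_model(), native_init_model() and run_initcalls() are hypothetical stand-ins, and the two-level enum is a simplification of the real initcall levels.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model of two initcall levels; not kernel code. */
enum level { LEVEL_DEVICE = 0, LEVEL_DEVICE_SYNC = 1, LEVEL_COUNT };

static int mcelog_taken;      /* models the /dev/mcelog name being claimed */
static char boot_log[64];     /* records the order in which the calls ran */

/* Models misc_register(): a second claim of the same name fails. */
static int claim_mcelog(void)
{
    if (mcelog_taken)
        return -1;
    mcelog_taken = 1;
    return 0;
}

/* Stands in for xen_late_init_mcelog(), a device_initcall(). */
static int xen_init_model(void)
{
    strcat(boot_log, "xen;");
    return claim_mcelog();
}

/* Stands in for mcheck_init_device(), a device_initcall_sync(). */
static int native_init_model(void)
{
    strcat(boot_log, "native;");
    return claim_mcelog();
}

struct initcall {
    enum level level;
    int (*fn)(void);
    int ret;
};

/* Run every call of one level before moving on to the next level. */
static void run_initcalls(struct initcall *calls, size_t n)
{
    assert(calls != NULL);
    for (int lvl = 0; lvl < LEVEL_COUNT; lvl++)
        for (size_t i = 0; i < n; i++)
            if ((int)calls[i].level == lvl)
                calls[i].ret = calls[i].fn();
}
```

Filling an array with the native entry listed first and the Xen entry second, then passing it to run_initcalls(), still runs the Xen model first because of its level. The native model's claim then fails, which corresponds to the misc_register() failure discussed here.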

In other words, misc_register() is expected to fail in mcheck/mce.c on
(privileged?) PV guests (provided the right CONFIG_XEN_* is set).
So

cef12ee52b05 ("xen/mce: Add mcelog support for Xen platform")

made it this way so that xen's init routine runs first.

So it is not the case that misc_register() fails often on xen but it is
*supposed* to fail by design, when running in dom0. And *then* you need
the notifier *not* unregistered on the error path so that the timers do
get deleted properly.
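
The error-path point above can be modelled roughly as follows. This is a user-space sketch under stated assumptions, not the actual patch: struct mce_model, init_model() and cpu_offline_model() are hypothetical names, and the model assumes the polling timer is already armed by the time the character device is registered and that only the hotplug notifier deletes it when a CPU goes offline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; names do not match the kernel sources. */
struct mce_model {
    bool notifier_registered;  /* CPU hotplug notifier is installed */
    bool timer_armed;          /* per-CPU polling timer is pending */
};

/*
 * Models the init path. chrdev_ok == false models the by-design
 * misc_register() failure in dom0. unregister_on_error selects between
 * the problematic error path (true) and the one argued for here (false).
 */
static int init_model(struct mce_model *m, bool chrdev_ok,
                      bool unregister_on_error)
{
    assert(m != NULL);
    m->timer_armed = true;
    m->notifier_registered = true;
    if (!chrdev_ok) {
        if (unregister_on_error)
            m->notifier_registered = false;
        return -1;
    }
    return 0;
}

/* Models a CPU going offline: only the notifier deletes the timer. */
static void cpu_offline_model(struct mce_model *m)
{
    assert(m != NULL);
    if (m->notifier_registered)
        m->timer_armed = false;
}
```

In this model, calling init_model() with chrdev_ok set to false and unregister_on_error set to true leaves timer_armed set after cpu_offline_model(). With unregister_on_error set to false the timer is cleared, even though the init call itself still returns an error.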

Ok, I see it now. Frankly, I'm not really sure I want to rush this in
now because it might break something else, who TF knows what.

Right now my gut feeling tells me we should still queue it for 3.17 and
have it run for a while in linux-next. We can backport it to stable
later after some testing...

I don't have a problem with having it soak in linux-next for a while but I am not too crazy about releasing 3.16 with this bug (even knowing that there will be a backport later). When we hit this problem the results are rather unpleasant in that it's not immediately clear what's happened.

We are still at rc2 so we have 3-4 weeks before 3.16 goes out.


-boris

