Re: x86/mce: machine check warning during poweroff

From: Srivatsa S. Bhat
Date: Thu Jan 19 2012 - 07:03:54 EST


On 01/19/2012 03:38 AM, Suresh Siddha wrote:

> On Wed, 2012-01-18 at 16:32 +0300, Sergey Senozhatsky wrote:
>> Just a small note, since you're talking about removing CPU from nohz.idle_cpus_mask,
>> that I'm able to reproduce this problem not only when offlining CPU, but during
>> onlining as well (kernel 3.3):
>
> yes, if the nohz state is not cleared properly during offline, then the
> issue can happen any time including cpu online etc.
>
> Srivatsa, for some reason I had assumed CPU_PRI_SCHED_INACTIVE was
> INT_MAX, and was expecting sched_ilb_notifier() to be called after that
> cpu had been set inactive. I am now using CPU_DYING, which is called
> from the cpu going down.
>
> Here is the v2 version of the fix. Can you folks please give it another
> try?
>

Suresh, your patch works perfectly! Thanks a lot!
Tested-by: Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx>

And the reasoning behind the patch matches the test results:
we don't allow select_nohz_load_balancer() to undo the cleanup that we
did in sched_ilb_notifier(), by ensuring that sched_ilb_notifier() runs
*after* sched_cpu_inactive().
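
To make the ordering argument concrete, here is a small userspace toy model
(not kernel code). Everything prefixed with toy_ is hypothetical, the
priority numbers are placeholders chosen only to reproduce the ordering
discussed above, and the assumption that an idle CPU re-adds itself to the
nohz mask only while it is still active is a simplification of what
select_nohz_load_balancer() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these definitions are the kernel's. */
enum toy_event { TOY_DOWN_PREPARE, TOY_DYING };

struct toy_cpu {
    bool active;        /* stands in for the cpu_active bit */
    bool in_nohz_mask;  /* stands in for this CPU's bit in nohz.idle_cpus_mask */
};

typedef void (*toy_handler)(struct toy_cpu *cpu, enum toy_event ev);

struct toy_notifier {
    toy_handler fn;
    int priority;       /* higher priority runs first */
};

#define TOY_MAX_NOTIFIERS 8

struct toy_chain {
    struct toy_notifier nb[TOY_MAX_NOTIFIERS];
    size_t count;
};

/* Insert a callback, keeping the chain sorted by descending priority. */
static void toy_register(struct toy_chain *c, toy_handler fn, int priority)
{
    assert(c->count < TOY_MAX_NOTIFIERS);
    size_t i = c->count;
    while (i > 0 && c->nb[i - 1].priority < priority) {
        c->nb[i] = c->nb[i - 1];
        i--;
    }
    c->nb[i].fn = fn;
    c->nb[i].priority = priority;
    c->count++;
}

/* Assumed behaviour: an idle CPU that is still active re-adds itself. */
static void toy_idle_entry(struct toy_cpu *cpu)
{
    if (cpu->active)
        cpu->in_nohz_mask = true;
}

/* Run every callback for one event; the CPU may go idle between callbacks. */
static void toy_call_chain(struct toy_chain *c, struct toy_cpu *cpu,
                           enum toy_event ev)
{
    for (size_t i = 0; i < c->count; i++) {
        c->nb[i].fn(cpu, ev);
        toy_idle_entry(cpu);
    }
}

/* Models sched_cpu_inactive(): mark the CPU inactive during DOWN_PREPARE. */
static void toy_sched_cpu_inactive(struct toy_cpu *cpu, enum toy_event ev)
{
    if (ev == TOY_DOWN_PREPARE)
        cpu->active = false;
}

/* Cleanup done during DOWN_PREPARE, i.e. before the CPU is inactive. */
static void toy_ilb_cleanup_v1(struct toy_cpu *cpu, enum toy_event ev)
{
    if (ev == TOY_DOWN_PREPARE)
        cpu->in_nohz_mask = false;
}

/* Cleanup done during DYING, i.e. after the CPU is already inactive. */
static void toy_ilb_cleanup_v2(struct toy_cpu *cpu, enum toy_event ev)
{
    if (ev == TOY_DYING)
        cpu->in_nohz_mask = false;
}

/* Offline one CPU; returns true if a stale nohz bit is left behind. */
bool toy_offline(toy_handler ilb_cleanup)
{
    struct toy_cpu cpu = { .active = true, .in_nohz_mask = true };
    struct toy_chain chain = { .count = 0 };

    toy_register(&chain, ilb_cleanup, 0);               /* placeholder priority */
    toy_register(&chain, toy_sched_cpu_inactive, -100); /* placeholder: runs later */

    toy_call_chain(&chain, &cpu, TOY_DOWN_PREPARE);
    toy_call_chain(&chain, &cpu, TOY_DYING);
    return cpu.in_nohz_mask;
}
```

In this model, toy_offline(toy_ilb_cleanup_v1) leaves the bit set because the
still-active CPU re-adds itself after the cleanup, while
toy_offline(toy_ilb_cleanup_v2) leaves it clear because the cleanup runs once
the CPU can no longer re-add itself.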

So, you can have my "Reviewed-by" too, if you like!

By the way, it would be great if you could also describe the
above-mentioned subtle aspect in the patch description.

Regards,
Srivatsa S. Bhat
