RE: [PATCH] x86: auto poll/interrupt mode switch for CMC to stop CMC storm

From: Luck, Tony
Date: Wed May 23 2012 - 13:01:52 EST


> What's the point of doing this work? Why can't we just do that on the
> CPU which got hit by the MCE storm and leave the others alone? They
> either detect it themselves or are just not affected.

CMCI gets broadcast to all threads on a socket. So
if one cpu has a problem, many cpus have a problem :-(
Some machine check banks are local to a thread/core,
so we need to make sure that the CMCI gets taken by
someone who can actually see the bank with the problem.
The others are collateral damage - but this means there
is even more reason to do something about a CMCI storm
as the effects are not localized.
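
To make the collateral damage concrete, here is a toy userspace
model in C. It is only a sketch and not kernel code: the struct,
the function names and the "one private bank per thread" layout
are all hypothetical simplifications. It assumes every CMCI is
delivered to every thread on the socket and that each thread can
only see its own bank.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model, not kernel code. All names here are hypothetical. */
struct hw_thread {
	int bank_has_error;  /* 1 if this thread's private bank logged an error */
	int cmci_taken;      /* CMCIs delivered to this thread */
	int errors_found;    /* CMCIs where this thread actually found an error */
};

/* Deliver one CMCI to every thread on the socket (the broadcast behaviour). */
static void broadcast_cmci(struct hw_thread *t, size_t n)
{
	assert(t != NULL && n > 0);
	for (size_t i = 0; i < n; i++) {
		t[i].cmci_taken++;
		/* Assumption: a thread can only see its own private bank. */
		if (t[i].bank_has_error) {
			t[i].errors_found++;
			t[i].bank_has_error = 0;  /* handler clears the logged error */
		}
	}
}

/* Count interrupts taken by threads that had nothing to report. */
static int wasted_interrupts(const struct hw_thread *t, size_t n)
{
	int wasted = 0;

	for (size_t i = 0; i < n; i++)
		wasted += t[i].cmci_taken - t[i].errors_found;
	return wasted;
}
```

In this model, with 8 threads on a socket and 100 errors all
landing in thread 3's bank, thread 3 finds all 100 of them and
the other seven threads between them take 700 interrupts for
nothing.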

> What's wrong with doing that strictly per cpu and avoid the whole
> global state horror?

Is that less of a horror? We'd have some cpus polling and some
taking CMCI (in somewhat arbitrary and ever changing combinations).
I'm not sure which is less bad.

-Tony