RE: [PATCH] x86: auto poll/interrupt mode switch for CMC to stop CMC storm

From: Thomas Gleixner
Date: Thu May 24 2012 - 14:18:20 EST


On Thu, 24 May 2012, Luck, Tony wrote:

> > So can you please explain how this is better than having this strict
> > per cpu and avoid all the mess which comes with that patch? The
> > approach of letting global state be modified in a random manner is
> > just doomed.
>
> Well doomed sounds bad :-) ... and I think I now agree that we should
> get rid of global state and have polling vs. CMCI mode be per-cpu. It
> means that it will take fractionally longer to react to a storm, but
> on the plus side we'll naturally set storm mode on just the cpus
> that are seeing it on a multi-socket system without having to check
> topology data ... which should be better for the case where a noisy
> source of CMCI is plaguing one socket, while other sockets have some
> much lower rate of CMCI that we'd still like to log.

I thought more about it - see my patch. So I have a global state now
as well, but it's only making sure that stuff stays in poll mode as
long as others are in poll mode. That's good I think as you avoid the
following:

CMCIs which affect siblings or a socket are delivered to all affected
cores, but only one core might see the bank. So all the others would
re-enable CMCI quickly and then switch back to polling because the
storm still persists. This would ping-pong, so we probably want to
avoid it.

Ideally the storm_on_cpus variable should be per socket and not system
wide, but we can do that when it really becomes an issue.
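
To make the intended behaviour concrete, here is a rough user-space
sketch of the idea, not the patch itself. Only storm_on_cpus is a name
from the discussion above; the state names, STORM_THRESHOLD, NR_CPUS
and the helper functions are hypothetical, and the sketch ignores
locking, atomics and time windows entirely. Each cpu tracks its own
mode, and the global counter only prevents a cpu from leaving poll
mode while any other cpu still sees the storm.

```c
#include <stdbool.h>

#define NR_CPUS         4  /* hypothetical cpu count for the sketch */
#define STORM_THRESHOLD 5  /* hypothetical: CMCIs that count as a storm */

/* Hypothetical per-cpu states; STORM_NONE means CMCI (interrupt) mode. */
enum storm_state { STORM_NONE, STORM_ACTIVE, STORM_SUBSIDED };

struct cpu_cmc {
	enum storm_state state;
	int cmci_count;  /* simplified: a real version would count per interval */
};

static struct cpu_cmc cpus[NR_CPUS];
static int storm_on_cpus;  /* cpus currently in STORM_ACTIVE */

static bool in_poll_mode(int cpu)
{
	return cpus[cpu].state != STORM_NONE;
}

/* A CMCI arrived on this cpu; switch it to polling once it sees too many. */
static void cmci_interrupt(int cpu)
{
	struct cpu_cmc *c = &cpus[cpu];

	if (c->state != STORM_NONE)
		return;  /* CMCI is off while polling */
	if (++c->cmci_count < STORM_THRESHOLD)
		return;
	c->cmci_count = 0;
	c->state = STORM_ACTIVE;
	storm_on_cpus++;
}

/* Periodic poll on this cpu; found_error says whether a bank had an event. */
static void poll_tick(int cpu, bool found_error)
{
	struct cpu_cmc *c = &cpus[cpu];

	if (c->state == STORM_NONE)
		return;
	if (found_error) {
		if (c->state == STORM_SUBSIDED) {
			c->state = STORM_ACTIVE;
			storm_on_cpus++;
		}
		return;
	}
	if (c->state == STORM_ACTIVE) {
		c->state = STORM_SUBSIDED;
		storm_on_cpus--;
	}
	/* Go back to interrupt mode only when no cpu sees the storm. */
	if (storm_on_cpus == 0)
		c->state = STORM_NONE;
}
```

With two cpus hit by the same storm where only cpu 0 can see the bank,
cpu 1 finds nothing when it polls but stays in poll mode until cpu 0
has calmed down as well, which is what avoids the ping pong. Making
the counter per socket would only change which counter each cpu looks
at.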

Thanks,

tglx