Re: [PATCH] tmp patch to fix hotplug issue in CMCI storm

From: Chen Gong
Date: Fri Jun 15 2012 - 02:52:02 EST


On 2012/6/14 22:07, Thomas Gleixner wrote:
> On Thu, 14 Jun 2012, Chen Gong wrote:
> > this patch is based on tip tree and previous 5 patches.
>
> You really don't need all this complexity to handle that. The main
> thing is that you clear the storm state and adjust the storm counter
> when the cpu goes offline (in case the state is ACTIVE).
>
> When it comes online again then you can simply let it restart cmci. If
> the storm on this cpu (or node) still exists then it will notice and
> everything falls in place.

I have tested several different scenarios. If the storm on this cpu
still exists when it comes back online, it triggers CMCI, and the CMCI
is broadcast to the sibling CPUs, which means the counter
*cmci_storm_on_cpus* increases beyond its upper limit. E.g. on a
2-socket SandyBridge-EP system (one socket has 8 cores and 16 threads),
inject one error on one socket and you can watch
*cmci_storm_on_cpus* = 16 because of the CMCI broadcast. During this
time, offline and then online one CPU on this socket: first
*cmci_storm_on_cpus* = 15, because the CPU goes offline in ACTIVE
status, and then *cmci_storm_on_cpus* = 31, because CMCI is activated
again when the CPU comes online. That's why I have to keep CMCI disabled
during the whole online/offline sequence until the CMCI storm has
subsided. Frankly, the logic is a little complex, so I wrote many
comments to avoid forgetting it after some time :-)

