Re: [patch 3/4] stop_machine: implementstop_machine_from_offline_cpu()

From: Suresh Siddha
Date: Fri Jun 24 2011 - 13:56:05 EST


On Fri, 2011-06-24 at 00:45 -0700, Peter Zijlstra wrote:
> On Thu, 2011-06-23 at 11:19 -0700, Suresh Siddha wrote:
> > > In commit d0af9eed5aa91b6b7b5049cae69e5ea956fd85c3 you mention that its
> > > specific to HT, wouldn't it make sense to limit the stop-machine use in
> > > the next patch to the sibling mask instead of the whole machine?
> >
> > That specific issue was seen in the context of HT. But the SDM
> > guidelines (pre date HT and) are applicable for SMP too.
>
> Sure, but we managed to ignore those long enough, could we not continue
> to violate them and keep to the minimum that is working in practice?

No we didn't violate all of them.

There are two paths where we do the rendezvous. One is during cpu online
(either boot, hotplug, suspend/resume etc) where we init the MTRR
registers and another is when MTRR's change (through /proc/mtrr
interface) during runtime.

Before the commit 'd0af9eed5aa91b6b7b5049cae69e5ea956fd85c3' we were
indeed doing rendezvous of all the cpu's when we were dynamically
changing MTRR's.

And even in older kernels prior to 2.6.11 or so, we were doing
rendezvous on all occasions. And during cpu hotplug code revamp, we
broke the MTRR rendezvous for online paths which led to the WSM issue we
fixed in the commit 'd0af9eed5aa91b6b7b5049cae69e5ea956fd85c3'.

> From what I understand the explosion is WSM+/VMX on HT only because the
> siblings share state, or do we have proof that it yields problems
> between cores as well?

During dynamic MTRR register changes, I believe rendezvous of all the
cpu's is more critical, otherwise, there can be multiple cpu's accessing
the same memory location with different memory attributes and that can
potentially lead to memory corruption, machine-check etc.

During cpu online, I think we can be less aggressive (like perhaps
rendezvous only some of the HT/core siblings etc).

This is what we are planning to bring up with the cpu folks and see what
we can cut down and what is essential.

thanks,
suresh

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/