On Fri, 2012-09-07 at 23:36 +0530, Raghavendra K T wrote:
[...]
CCing PeterZ also.

On 09/07/2012 06:41 PM, Andrew Theurer wrote:
I have noticed recently that PLE/yield_to() is still not that scalable
for really large guests, sometimes even with no CPU over-commit. I have
a small change that makes a very big difference.

We are indeed avoiding CPUs in guest mode when we check
task->flags & PF_VCPU in vcpu_on_spin path. Doesn't that suffice?

My understanding is that it checks if the candidate vcpu task is in
guest mode (let's call this vcpu g1vcpuN), and that vcpu will not be a
target to yield to if it is already in guest mode. I am concerned about
a different vcpu, possibly from a different VM (let's call it g2vcpuN),
but it is also located on the same runqueue as g1vcpuN -and- running. That
vcpu, g2vcpuN, may also be doing a directed yield, and it may already be
holding the rq lock. Or it could be in guest mode. If it is in guest
mode, then let's still target this rq, and try to yield to g1vcpuN.
However, if g2vcpuN is not in guest mode, then don't bother trying.
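
The rule described above can be sketched as a small standalone predicate. This is only an illustration, not the kernel code: `struct task`, `struct runqueue`, the `on_cpu` field and `yield_target_worthwhile()` are hypothetical stand-ins for `task_struct`, `rq`, `task_running()` and the early-exit test in `yield_to()`, and the `PF_VCPU` value is illustrative.

```c
#include <stdbool.h>

/* Illustrative flag value; the real PF_VCPU lives in the kernel headers. */
#define PF_VCPU 0x00000010u

/* Hypothetical stand-in for task_struct: only the fields the check reads. */
struct task {
    unsigned int flags;  /* PF_VCPU set while the vcpu task is in guest mode */
    long state;          /* 0 means runnable */
    bool on_cpu;         /* assumed model of task_running() */
};

/* Hypothetical stand-in for a runqueue: only the currently running task. */
struct runqueue {
    struct task *curr;
};

/*
 * Returns true when a directed yield to candidate p (g1vcpuN) on runqueue
 * p_rq looks worth taking the rq lock for.
 */
static bool yield_target_worthwhile(const struct runqueue *p_rq,
                                    const struct task *p)
{
    if (p->on_cpu)
        return false;   /* candidate is already running */
    if (p->state != 0)
        return false;   /* candidate is not runnable */
    if (!(p_rq->curr->flags & PF_VCPU))
        return false;   /* g2vcpuN is not in guest mode: don't bother */
    return true;        /* g2vcpuN is in guest mode: still target this rq */
}
```

The point of doing this before taking any lock is that a rejected candidate costs only a few reads instead of rq lock contention.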
Patch included below.
Here's the new, v2 result with the previous two:
10 VMs, 16-way each, all running dbench (2x cpu over-commit)
                     throughput +/- stddev
ple on:              2552 +/- .70%
ple on: w/fixv1:     4621 +/- 2.12% (81% improvement)
ple on: w/fixv2:     6115* (139% improvement)
[*] I do not have stddev yet because not all 10 runs are complete
[...]
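
For reference, the improvement figures in the table follow from the throughput numbers relative to the "ple on" baseline. A minimal sketch of that arithmetic (the helper name `improvement_pct()` is hypothetical):

```c
/* Percentage improvement of result over baseline. */
static double improvement_pct(double baseline, double result)
{
    return (result - baseline) / baseline * 100.0;
}
```

With the numbers above, `improvement_pct(2552, 4621)` is about 81.1 and `improvement_pct(2552, 6115)` is about 139.6, which the table reports as 81% and 139%.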
for v1 to v2, host CPU dropped from 60% to 50%. Time in spin_lock() is
So this seems to be working. However I wonder just how far we can take
this. Ideally we need to be in <3-4% in host for PLE work, like I
observe for the 8-way VMs. We are still way off.
Signed-off-by: Andrew Theurer <habanero@xxxxxxxxxxxxxxxxxx>
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fbf1fd0..c767915 100644
@@ -4844,6 +4844,9 @@ bool __sched yield_to(struct task_struct *p, bool
p_rq = task_rq(p);
+ if (task_running(p_rq, p) || p->state || !(p_rq->curr->flags&
+ goto out_no_unlock;