Re: [PATCH RFC 0/2] kvm: Improving undercommit,overcommit scenarios in PLE handler

From: Raghavendra K T
Date: Thu Sep 27 2012 - 07:27:34 EST


On 09/27/2012 02:06 PM, Avi Kivity wrote:
On 09/25/2012 03:40 PM, Raghavendra K T wrote:
On 09/24/2012 07:46 PM, Raghavendra K T wrote:
On 09/24/2012 07:24 PM, Peter Zijlstra wrote:
On Mon, 2012-09-24 at 18:59 +0530, Raghavendra K T wrote:
However, Rik had a genuine concern about the cases where load is not
equally distributed across runqueues and the lock holder might actually
be on a different runqueue but not running.

Load should eventually get distributed equally -- that's what the
load-balancer is for -- so this is a temporary situation.

We already try and favour the non-running vcpu in this case, that's what
yield_to_task_fair() is about. If it's still not eligible to run, tough
luck.

Yes, I agree.


Do you think that, instead of using rq->nr_running, we could get a global
sense of load using avenrun (something like avenrun/num_online_cpus())?

To what purpose? Also, global stuff is expensive, so you should try and
stay away from it as hard as you possibly can.

Yes, that very concern is what made me fall back to rq->nr_running.
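
(For reference, the discarded idea was roughly the sketch below. This is
only an illustration, not a patch; get_avenrun()/num_online_cpus() are the
existing kernel interfaces, the helper and the threshold are made up:)

#include <linux/sched.h>
#include <linux/cpumask.h>

/*
 * Sketch of the dropped "global load" idea: derive an average
 * per-cpu load from avenrun[] instead of looking at the local
 * rq->nr_running.  Illustration only.
 */
static bool system_looks_overcommitted(void)
{
	unsigned long loads[3];

	/* 1-minute load average, fixed point with FSHIFT fraction bits */
	get_avenrun(loads, FIXED_1 / 200, 0);

	/* overcommitted if load per online cpu exceeds ~1 */
	return loads[0] > num_online_cpus() * FIXED_1;
}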

Will come back with the result soon.

Got the results with the patches. So here they are:

Tried this on a 32-core PLE box with HT disabled; 32 guest vcpus, with
1x and 2x overcommit.

Base = 3.6.0-rc5 + ple handler optimization patches
A = Base + checking rq_running in vcpu_on_spin() patch
B = Base + checking rq->nr_running in sched/core
C = Base - PLE

---+-----------+-----------+-----------+-----------+
   |   Ebizzy result (rec/sec higher is better)    |
---+-----------+-----------+-----------+-----------+
   |   Base    |     A     |     B     |     C     |
---+-----------+-----------+-----------+-----------+
1x | 2374.1250 | 7273.7500 | 5690.8750 | 7364.3750 |
2x | 2536.2500 | 2458.5000 | 2426.3750 |   48.5000 |
---+-----------+-----------+-----------+-----------+

% improvements w.r.t. Base
---+------------+------------+------------+
   |     A      |     B      |     C      |
---+------------+------------+------------+
1x |  206.37603 |  139.70410 |  210.19323 |
2x |   -3.06555 |   -4.33218 |  -98.08773 |
---+------------+------------+------------+

We are getting almost the benefit of the PLE-disabled case with this
approach. With patch B, we have dropped a bit in gain (because we still
iterate over vcpus until we decide to do a directed yield).
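
(To spell out what patch A amounts to: the PLE handler bails out before
scanning vcpus when there is nothing else runnable on this cpu. A rough
sketch only, with rq_has_single_task() standing in for whatever accessor
the scheduler would actually export:)

#include <linux/kvm_host.h>

/*
 * Sketch only: early exit from the PLE handler in the undercommit
 * case.  rq_has_single_task() is a placeholder name, not a real
 * scheduler interface.
 */
void kvm_vcpu_on_spin(struct kvm_vcpu *me)
{
	/* nothing else runnable here: a directed yield cannot help */
	if (rq_has_single_task())
		return;

	/* ... existing candidate search and kvm_vcpu_yield_to() ... */
}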

This gives us a good case for tracking preemption on a per-vm basis. As
long as we aren't preempted, we can keep the PLE window high, and also
return immediately from the handler without looking for candidates.
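
(Something along these lines could record that in the preempt notifier
hooks KVM already registers via kvm_preempt_ops; the vcpu->preempted
field is an assumed addition here, shown only to illustrate the idea:)

#include <linux/kvm_host.h>
#include <linux/preempt.h>

/*
 * Sketch: mark a vcpu as preempted/not-preempted from the existing
 * sched_out/sched_in callbacks.  The PLE handler (and the PLE-window
 * logic) could then consult the flag and return early when nobody
 * in the VM has been preempted.
 */
static void kvm_sched_out(struct preempt_notifier *pn,
			  struct task_struct *next)
{
	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

	vcpu->preempted = true;		/* host scheduled us out */
	kvm_arch_vcpu_put(vcpu);
}

static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
{
	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

	vcpu->preempted = false;	/* running again */
	kvm_arch_vcpu_load(vcpu, cpu);
}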

1) So do you think the defer-preemption patch (which Vatsa was mentioning
a while back) is also worth trying, so that we reduce the chance
of lock-holder preemption (LHP)?

IIRC, with defer-preemption:
we would have hooks in the spinlock/unlock path to track the depth of
locks held, shared with the host scheduler (maybe via MSRs now).
The host scheduler then 'prefers' not to preempt a lock-holding vcpu
(or rather gives it, say, one more chance).
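
(Very roughly, the guest side could look like the sketch below; the MSR
number, the helper names and the exact host interface are pure assumptions
on my part, just following the 'via MSRs' idea above:)

#include <linux/percpu.h>
#include <asm/msr.h>

/*
 * Sketch of the defer-preemption idea, guest side: keep a per-cpu
 * count of spinlocks held and expose it to the host so the host
 * scheduler can prefer not to preempt a lock-holding vcpu.
 * MSR_KVM_LOCK_DEPTH is an invented name, not a real interface.
 */
#define MSR_KVM_LOCK_DEPTH	0x4b564dff	/* hypothetical */

static DEFINE_PER_CPU(u32, lock_depth);

static inline void lock_depth_inc(void)	/* hook in spin_lock path */
{
	wrmsrl(MSR_KVM_LOCK_DEPTH, this_cpu_inc_return(lock_depth));
}

static inline void lock_depth_dec(void)	/* hook in spin_unlock path */
{
	wrmsrl(MSR_KVM_LOCK_DEPTH, this_cpu_dec_return(lock_depth));
}

(A shared memory area would probably be cheaper than an MSR write per
lock/unlock, but the sketch follows the MSR idea mentioned above.)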

2) Looking at the results (comparing A & C), I do feel we have
significant overhead in iterating over vcpus (even when compared to the
vmexit itself), so we would still need the undercommit fix suggested by
PeterZ (improving by 140%), right?

So, looking back at the threads/discussions so far, I am trying to
summarize them. I feel these, at least, are the potential candidates
to go in:

1) Avoiding double runqueue lock overhead (Andrew Theurer/ PeterZ)
2) Dynamically changing PLE window (Avi/Andrew/Chegu)
3) preempt-notifier handler to identify preempted VCPUs (Avi)
4) Avoiding iterating over VCPUs in undercommit scenario. (Raghu/PeterZ)
5) Avoiding unnecessary spinning in overcommit scenario (Raghu/Rik)
6) PV spinlocks
7) Jiannan's proposed improvements
8) Defer preemption patches

Did we miss anything (or add anything extra)?

So here are my action items:
- I plan to repost this series with what PeterZ and Rik suggested, along
with performance analysis.
- I'll go back and explore (3) and (6).

Please let me know.





