Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case

From: Raghavendra K T
Date: Mon Jul 02 2012 - 23:32:42 EST


On 07/02/2012 08:19 PM, Rik van Riel wrote:
> On 06/28/2012 06:55 PM, Vinod, Chegu wrote:
>> Hello,
>>
>> I am just catching up on this email thread...
>>
>> Perhaps one of you may be able to help answer this query, preferably
>> along with some data. [BTW, I do understand the basic intent behind
>> PLE in the typical (sweet spot) use case where there is
>> oversubscription and the need to optimize the PLE handler in the
>> host.]
>>
>> In a use case where the host has fewer but much larger guests (say
>> 40 VCPUs and higher) and there is no oversubscription (i.e. the
>> number of vcpus across guests <= physical cpus in the host, and
>> perhaps each guest has its vcpus pinned to specific physical cpus
>> for other reasons), I would like to understand if/how PLE really
>> helps. For these use cases, would it be ok to turn PLE off
>> (ple_gap=0), since there is no real need to take an exit and find
>> some other VCPU to yield to?

> Yes, that should be ok.

I think this should be true when we have ple_window tuned to the
correct value for the guest (the same point you raised below).
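
For completeness, both knobs are module parameters of kvm-intel; from
memory they are declared roughly like this in arch/x86/kvm/vmx.c
(defaults and permissions quoted from memory, so please double-check
against your tree):

#include <linux/module.h>

/*
 * PLE (pause loop exiting) knobs.  ple_gap == 0 disables PLE
 * entirely, so the guest never takes a PAUSE-triggered vmexit.
 * ple_window is (roughly) how long the guest may spin in a PAUSE
 * loop, in cycles, before the exit is taken.
 */
#define KVM_VMX_DEFAULT_PLE_GAP    128
#define KVM_VMX_DEFAULT_PLE_WINDOW 4096

static int ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
module_param(ple_gap, int, S_IRUGO);

static int ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
module_param(ple_window, int, S_IRUGO);

So for the pinned, non-overcommitted case, something like
"modprobe kvm-intel ple_gap=0" should be all that is needed to turn
PLE off.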

But otherwise, IMO, it is a very tricky question to answer. PLE
currently benefits not just spinlocks but also flush_tlb_ipi and
similar busy-wait paths, and having a properly tuned value for all
types of workload (and load) is really complicated.

Coming back to the PLE handler, IMHO, if there is even a slight
increase in run-queue length, directed yield may make the situation
worse.

(In the case Vinod explained, even though we succeed in setting the
other vcpu's task as next_buddy, the caller itself gets scheduled out,
so the ganging effect is reduced. On top of this there is always the
question of whether we have chosen the right vcpu, or a really bad
one, to yield to.)
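
For reference, the selection we are talking about is the round-robin
walk in kvm_vcpu_on_spin() in virt/kvm/kvm_main.c. A simplified
sketch, written from memory with your fix applied, so details may not
match the tree exactly:

void kvm_vcpu_on_spin(struct kvm_vcpu *me)
{
        struct kvm *kvm = me->kvm;
        struct kvm_vcpu *vcpu;
        int last_boosted_vcpu = kvm->last_boosted_vcpu;
        int yielded = 0;
        int pass, i;

        /*
         * Try to yield to a vcpu that is runnable but not currently
         * running, starting just after the one boosted last time and
         * wrapping around to the earlier vcpus on the second pass.
         */
        for (pass = 0; pass < 2 && !yielded; pass++) {
                kvm_for_each_vcpu(i, vcpu, kvm) {
                        if (!pass && i <= last_boosted_vcpu) {
                                i = last_boosted_vcpu;
                                continue;
                        } else if (pass && i > last_boosted_vcpu)
                                break;
                        if (vcpu == me)
                                continue;
                        if (waitqueue_active(&vcpu->wq))
                                continue; /* halted, not spinning */
                        if (kvm_vcpu_yield_to(vcpu)) {
                                kvm->last_boosted_vcpu = i;
                                yielded = 1;
                                break;
                        }
                }
        }
}

Whether the vcpu picked here actually holds the lock (or is the IPI
target) we are waiting on is exactly the "right guy or bad guy"
question above.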


> On a related note, I wonder if we should increase the ple_gap
> significantly.

Did you mean ple_window?


> After all, 4096 cycles of spinning is not that much, when you
> consider how much time is spent doing the subsequent vmexit,
> scanning the other VCPUs' status (200 cycles per cache miss),
> deciding what to do, maybe poking another CPU, and eventually
> a vmenter.
>
> A factor 4 increase in ple_gap might be what it takes to
> get the amount of time spent spinning equal to the amount of
> time spent on the host side doing KVM stuff...


I agree. I am experimenting with all of these things, along with
several optimization ideas I have, and hope to come back with results
from those experiments soon.
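
To put very rough numbers on your point (illustrative only -- the
vmexit/vmenter cost below is a guess, the 200 cycles/miss and 4096
cycles are your figures, and the ~40 vcpus is Vinod's configuration):

    spin before the exit:            4096 cycles
    vmexit + vmenter round trip:    ~2000 cycles (hardware dependent)
    scan ~40 vcpus, ~1 miss each:   ~8000 cycles (at ~200 cycles/miss)
    ------------------------------------------------
    host-side handling:            ~10000 cycles

So the handler can easily cost 2-3x what the guest was allowed to
spin, which is where a factor-of-4 bump to ple_window starts to look
reasonable.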
