On 06/28/2012 06:55 PM, Vinod, Chegu wrote:
Hello,
I am just catching up on this email thread...
Perhaps one of you may be able to help answer this query.. preferably
along with some data. [BTW, I do understand the basic intent behind
PLE in a typical [sweet spot] use case where there is over
subscription etc. and the need to optimize the PLE handler in the host
etc. ]
In a use case where the host has fewer but much larger guests (say
40 VCPUs and higher) and there is no over subscription (i.e. # of vcpus
across guests <= physical cpus in the host and perhaps each guest has
its vcpus pinned to specific physical cpus for other reasons), I
would like to understand if/how the PLE really helps? For these use
cases would it be ok to turn PLE off (ple_gap=0) since there is no real need
to take an exit and find some other VCPU to yield to?

Yes, that should be ok.
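
As a rough illustration of the condition described above (total vcpus
across guests <= physical cpus), here is a minimal Python sketch. The
function names and the example guest sizes are made up for illustration.
The sysfs path is where kvm_intel normally exposes its ple_gap parameter
when the module is loaded; the reader returns None if it is not there.

```python
from pathlib import Path

# Where kvm_intel normally exposes the parameter when the module is loaded.
PLE_GAP_PATH = Path("/sys/module/kvm_intel/parameters/ple_gap")


def read_ple_gap(path=PLE_GAP_PATH):
    """Return the current ple_gap as an int, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def is_oversubscribed(guest_vcpus, host_cpus):
    """True if the guests together have more vcpus than the host has cpus."""
    return sum(guest_vcpus) > host_cpus


def ple_can_be_disabled(guest_vcpus, host_cpus):
    """Hypothetical helper: the no-over-subscription case from the question."""
    return not is_oversubscribed(guest_vcpus, host_cpus)


if __name__ == "__main__":
    # Hypothetical host: 80 physical cpus, two 40-vcpu guests.
    print("ple_gap:", read_ple_gap())
    print("ok to disable PLE:", ple_can_be_disabled([40, 40], 80))
```

This only checks the vcpu count; it does not verify pinning, which would
have to be checked separately.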
On a related note, I wonder if we should increase the ple_gap
significantly.
After all, 4096 cycles of spinning is not that much, when you
consider how much time is spent doing the subsequent vmexit,
scanning the other VCPUs' status (200 cycles per cache miss),
deciding what to do, maybe poking another CPU, and eventually
a vmenter.
A factor of 4 increase in ple_gap might be what it takes to
get the amount of time spent spinning equal to the amount of
time spent on the host side doing KVM stuff...
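
To put numbers on that, a small Python sketch of the arithmetic, using
only the figures quoted above (4096 cycles of spinning, 200 cycles per
cache miss, a factor of 4). The host-side cost itself is not measured
here, so the factor is an estimate, not a result. As far as I know, the
4096-cycle figure matches the default of kvm_intel's ple_window
parameter rather than ple_gap, so that may be the knob to look at.

```python
SPIN_CYCLES = 4096       # spin budget quoted in the message
CACHE_MISS_CYCLES = 200  # per-miss cost quoted in the message
FACTOR = 4               # proposed increase (an estimate, not measured)


def misses_covered(spin_cycles, miss_cycles=CACHE_MISS_CYCLES):
    """How many cache misses fit in the given spin budget."""
    return spin_cycles / miss_cycles


def scaled_budget(spin_cycles, factor=FACTOR):
    """Spin budget after applying the proposed factor."""
    return spin_cycles * factor


if __name__ == "__main__":
    new_budget = scaled_budget(SPIN_CYCLES)
    print("current budget covers", misses_covered(SPIN_CYCLES), "misses")
    print("scaled budget:", new_budget, "cycles,",
          misses_covered(new_budget), "misses")
```

In other words, the current budget is worth only about 20 cache misses,
which the vmexit, the VCPU scan and the vmenter can easily exceed.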