Re: [patch 0/3] KVM CPU frequency change hypercalls

From: Paolo Bonzini
Date: Wed Mar 01 2017 - 09:30:15 EST




On 28/02/2017 03:45, Marcelo Tosatti wrote:
> On Fri, Feb 24, 2017 at 04:34:52PM +0100, Paolo Bonzini wrote:
>>
>>
>> On 24/02/2017 14:04, Marcelo Tosatti wrote:
>>>>>>> What's the current use case, or foreseeable future use case, for
>>>>>>> save/restore across preemption again? (which would validate the
>>>>>>> broken by design claim).
>>>>>> Stop a guest that is using cpufreq, start a guest that is not using it.
>>>>>> The second guest's performance now depends on the state that the first
>>>>>> guest left in cpufreq.
>>>>> Nothing forbids the host from implementing switching with the
>>>>> current hypercall interface: all you need is a scheduler
>>>>> hook.
>>>> Can it be done in vcpu_load/vcpu_put? But you still would have two
>>>> components (KVM and sysfs) potentially fighting over the frequency, and
>>>> that's still a bit ugly.
>>>
>>> Change the frequency at vcpu_load/vcpu_put? Yes: call into
>>> cpufreq-userspace. But there is no notion of "per-task frequency" in the
>>> Linux kernel (which was the starting point of this subthread).
>>
>> There isn't, but this patchset is providing a direct path from a task to
>> cpufreq-userspace. This is as close as you can get to a per-task frequency.
>
> Cpufreq-userspace is supposed to be used by tasks in userspace.
> That's why it's called "userspace".

I think the intended use case is to have a daemon handling a system-wide
policy. Examples are the historical (and now obsolete) users such as
cpufreqd, cpudyn, powernowd, or cpuspeed. Alternatively, the user can
play the role of the daemon by writing to sysfs.
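
For illustration, a minimal sketch of what "playing the role of the daemon"
could look like from userspace. It assumes the standard cpufreq sysfs files
scaling_governor and scaling_setspeed (frequency in kHz); the function name
and the sysfs_root parameter are hypothetical, and writing to the real
files normally requires root.

```python
import os


def set_cpu_frequency(cpu, khz, sysfs_root="/sys/devices/system/cpu"):
    """Select the userspace governor for `cpu`, then request `khz`.

    sysfs_root is a hypothetical parameter so the sketch can be pointed
    at a scratch directory instead of the real sysfs tree.
    """
    base = os.path.join(sysfs_root, "cpu%d" % cpu, "cpufreq")
    # The userspace governor must be active before scaling_setspeed
    # accepts a value.
    with open(os.path.join(base, "scaling_governor"), "w") as f:
        f.write("userspace")
    # scaling_setspeed takes the target frequency in kHz.
    with open(os.path.join(base, "scaling_setspeed"), "w") as f:
        f.write(str(khz))
```

This is a system-wide knob per CPU: whoever writes last wins, which is why
a second writer (such as KVM) can end up fighting over the frequency.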

I've never seen userspace tasks talking to cpufreq-userspace to set
their own running frequency. If DPDK does it, that's nasty in my
opinion and we should find an interface that works best for both DPDK
and KVM. That discussion should happen on linux-pm, as Rafael suggested.

>>> But if you configure all CPUs in the system as cpufreq-userspace,
>>> then some other agent (a userspace program) has to decide the frequency
>>> for the other CPUs.
>>>
>>> Which agent would do that and why? That's why I initially said "what's
>>> the use case".
>>
>> You could just pin them at the highest non-TurboBoost frequency until a
>> guest runs. That's assuming that they are idle and, because of
>> isolcpus/nohz_full, they would be almost always in a deep C-state anyway.
>
> The original claim of the thread was: "this feature (frequency
> hypercalls) works for pinned vcpu<->pcpu, pcpu dedicated exclusively
> to vcpu case, let's try to extend this to other cases".
>
> Which is a valid and useful direction to go.
>
> However, there is no user for multiple vcpus on the same pcpu now.

You are still ignoring the case of one guest started after another, or
of another program started on a CPU that formerly was used by KVM. They
don't have to be multiple users at the same time.

> If there were multiple vcpus, all of them requesting a given
> frequency, it would be necessary to:
>
> 1) Maintain frequency of the pcpu to the highest
> frequencies.
>
> OR
>
> 2) Since switching frequencies can take up to 70us (*)
> (depends on processor), it's generally not worthwhile
> to switch frequencies between task switches.

Is latency that important, or is overhead rather what we should pay
attention to? The slides you linked
(http://www.ena-hpc.org/2013/pdf/04.pdf) on page 17 suggest it's around
10us.

One possibility is to do (1) if you have multiple tasks on the run queue
(or fall back to what is specified in sysfs) and (2) if you only have one
task.
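
As a rough model of that policy, and only one possible reading of it: with
a single runnable task, keep that task's requested frequency rather than
switching on every task switch; with several, either hold the highest
requested frequency or fall back to the sysfs setting. The function name,
its arguments, and the prefer_sysfs flag are all hypothetical; nothing
like this exists in the patchset.

```python
def pick_frequency(requests_khz, sysfs_khz=None, prefer_sysfs=False):
    """Choose a pCPU frequency (kHz) from per-task requests.

    requests_khz: frequencies requested by the runnable tasks on this
                  pCPU (hypothetical input; one entry per task).
    sysfs_khz:    frequency configured via sysfs, or None if unset.
    prefer_sysfs: with several tasks, use the sysfs value instead of
                  the highest request.
    """
    if not requests_khz:
        # Nobody asked for anything: leave the sysfs setting in place.
        return sysfs_khz
    if len(requests_khz) == 1:
        # One task: honor its request, no switching between tasks needed.
        return requests_khz[0]
    if prefer_sysfs and sysfs_khz is not None:
        # Several tasks: fall back to what is specified in sysfs.
        return sysfs_khz
    # Several tasks: keep the pCPU at the highest requested frequency.
    return max(requests_khz)
```

Either multi-task branch avoids paying the frequency switch cost on every
context switch, which is the concern behind (2).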

Anyway, please repost with Cc to linux-pm so that we can restart the
discussion there.

Paolo