Re: [patch 0/3] KVM CPU frequency change hypercalls

From: Marcelo Tosatti
Date: Fri Feb 03 2017 - 13:25:01 EST


On Fri, Feb 03, 2017 at 05:43:50PM +0100, Radim Krcmar wrote:
> 2017-02-02 15:47-0200, Marcelo Tosatti:
> > Implement KVM hypercalls for the guest
> > to issue frequency changes.
> >
> > Current situation with DPDK and frequency changes is as follows:
> > An algorithm in the guest decides when to increase/decrease
> > frequency based on the queue length of the device.
>
> Does the algorithm compute with the magnitude of frequency steps?
>
> (e.g. if CPU can step with 200 MHz granularity, does the algorithm ever
> do 400 MHz at once, because it assumes that frequency would be enough
> to handle the load?)

No, it does not know the frequency directly. It only "knows" the
frequency indirectly by the size of the network queue (that is, if the
network queue is above a threshold, then frequency is "too low" and
should be increased).
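
To make that concrete, here is a rough sketch of such a queue-driven
decision. It is not the DPDK code: the function name, the two thresholds,
and the +1/-1/0 return convention are all hypothetical, and the low
threshold for stepping down is my assumption (only the "above a threshold
means too low" part is described above).

```python
def next_freq_step(queue_len, high_thresh, low_thresh):
    """Decide a frequency step from queue occupancy alone.

    Hypothetical helper: returns +1 to request a higher frequency,
    -1 to request a lower one, and 0 to leave it unchanged.
    The algorithm never looks at the actual frequency in MHz.
    """
    if queue_len > high_thresh:
        # Queue is backing up: the current frequency is "too low".
        return 1
    if queue_len < low_thresh:
        # Queue is nearly empty: assume we can afford to step down.
        return -1
    return 0


if __name__ == "__main__":
    for qlen in (0, 16, 200):
        print(qlen, next_freq_step(qlen, high_thresh=128, low_thresh=8))
```

Note that the step is always one unit up or down; the magnitude of the
underlying frequency step never enters the decision.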

> > On the host, a power manager daemon is used to listen for
> > frequency change requests (on another core) and issue these
> > requests.
> >
> > However frequency changes are performance sensitive events because:
> > On a change from low load condition to max load condition,
> > the frequency should be raised as soon as possible.
> > Sending a virtio-serial notification to another pCPU,
> > waiting for that pCPU to initiate an IPI to the requestor pCPU
> > to change frequency, is slower and more cache costly than
> > a direct hypercall to host to switch the frequency.
> >
> > If the pCPU where the power manager daemon is running
> > is not busy spinning on requests from the isolated DPDK vcpus,
> > there is also the cost of HLT wakeup for that pCPU.
> >
> > Moreover, the daemon serves multiple VMs, meaning that
> > the scheme is subject to additional delays from
> > queueing of power change requests from VMs.
>
> (Wow, this must be bringing humanity to its doom faster than the heat it
> helps to eliminate.)
>
> > A direct hypercall from userspace is the fastest most direct
> > method for the guest to change frequency and does not suffer
> > from the issues above.
>
> Right, userspace on bare-metal cannot change frequency directly.

Yes it can: write to sysfs (not sure what you meant).
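
For example, with the userspace governor selected, a sufficiently
privileged process can write a target frequency in kHz to the cpufreq
scaling_setspeed file. A minimal sketch follows; the helper names and the
sysfs_root parameter are hypothetical (the parameter is only there so the
code can be exercised against a scratch directory), and it assumes the
standard cpufreq sysfs layout.

```python
import os

DEFAULT_SYSFS_ROOT = "/sys/devices/system/cpu"


def setspeed_path(cpu, sysfs_root=DEFAULT_SYSFS_ROOT):
    """Build the path of the scaling_setspeed file for one CPU."""
    return os.path.join(sysfs_root, "cpu%d" % cpu, "cpufreq", "scaling_setspeed")


def set_cpu_khz(cpu, khz, sysfs_root=DEFAULT_SYSFS_ROOT):
    """Request a frequency (in kHz) for one CPU.

    Assumes the userspace governor is active for that CPU and that the
    caller has permission to write the file (normally root).
    """
    with open(setspeed_path(cpu, sysfs_root), "w") as f:
        f.write(str(khz))


if __name__ == "__main__":
    # Only prints the path; a real write needs root and the
    # userspace governor.
    print(setspeed_path(2))
```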

> > The usage scenario for these hypercalls is for pinned vCPUs <-> pCPUs.
>
> And pinned tasks <-> vCPUs, because the guest kernel has no idea what
> frequency is being used or desired on its virtualware,

And it does not have to know...

> so the kernel
> cannot even change frequency without introducing a bug ...

Not sure what you are thinking; please be more verbose.

> I'm not happy about this hole through layers of isolations.
>
> The domain of valid users is very small and a problem is that any
> program with access to /dev/kvm gains the ability to change host CPU
> frequency if the host happens to use the userspace governor.

Yes.

> We should at least enable this feature only if /dev/kvm is root-only.

Fine, I can change that; will fix in -v2. Maybe there is a capability
that covers changing frequency... if so, the hypercall should require
that capability (or root if there is none).