Re: [patch 3/3] KVM: x86: frequency change hypercalls

From: Radim Krcmar
Date: Fri Feb 03 2017 - 12:40:48 EST


2017-02-02 15:47-0200, Marcelo Tosatti:
> Implement min/max/up/down frequency change
> KVM hypercalls. To be used by DPDK implementation.
>
> Also allow such hypercalls from guest userspace.
>
> Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
>
> ---
> Index: kvm-pvfreq/arch/x86/kvm/x86.c
> ===================================================================
> --- kvm-pvfreq.orig/arch/x86/kvm/x86.c 2017-02-02 11:17:17.063756725 -0200
> +++ kvm-pvfreq/arch/x86/kvm/x86.c 2017-02-02 11:17:17.822752510 -0200
> @@ -6219,10 +6219,58 @@

[Here lived copy-paste.]

> int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
> {
> unsigned long nr, a0, a1, a2, a3, ret;
> int op_64_bit, r;
> + bool cpl_check;
>
> r = kvm_skip_emulated_instruction(vcpu);
>
> @@ -6246,7 +6294,13 @@
> a3 &= 0xFFFFFFFF;
> }
>
> - if (kvm_x86_ops->get_cpl(vcpu) != 0) {
> + cpl_check = true;
> + if (nr == KVM_HC_FREQ_UP || nr == KVM_HC_FREQ_DOWN ||
> + nr == KVM_HC_FREQ_MIN || nr == KVM_HC_FREQ_MAX)
> + if (vcpu->arch.allow_freq_hypercall == true)
> + cpl_check = false;
> +
> + if (cpl_check == true && kvm_x86_ops->get_cpl(vcpu) != 0) {
> ret = -KVM_EPERM;
> goto out;
> }
> @@ -6262,6 +6316,21 @@
> case KVM_HC_CLOCK_PAIRING:
> ret = kvm_pv_clock_pairing(vcpu, a0, a1);
> break;
> +#ifdef CONFIG_CPU_FREQ_GOV_USERSPACE

CONFIG_CPU_FREQ_GOV_USERSPACE should be checked when enabling the
capability.

> + case KVM_HC_FREQ_UP:
> + ret = kvm_pvfreq_up(vcpu);
> + break;
> + case KVM_HC_FREQ_DOWN:
> + ret = kvm_pvfreq_down(vcpu);
> + break;
> + case KVM_HC_FREQ_MAX:
> + ret = kvm_pvfreq_max(vcpu);
> + break;
> + case KVM_HC_FREQ_MIN:
> + ret = kvm_pvfreq_min(vcpu);
> + break;

Having four hypercalls for this is overkill.
You can make it one hypercall with an argument.

And the argument doesn't have to be an enum {UP, DOWN, MAX, MIN}; it can
be an int, which would also allow you to take, e.g., -2 steps at once.
A number beyond the available stepping range would just saturate at MAX/MIN.

Avoiding an absolute scale in the interface simplifies migration, since
the guest cannot really depend on much there, except that calling it
with MIN (INT_MIN) gets the minimum and MAX (INT_MAX) the maximum
frequency.
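A rough sketch of the single-hypercall variant. It assumes the host tracks an opaque per-vCPU index into its list of available frequencies and never exposes that index to the guest. `struct pvfreq_state` and `pvfreq_step()` are hypothetical names and are not part of the patch or of KVM.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical host-side state for one vCPU: an index into the host's
 * list of available frequencies, 0 = lowest, nr_steps - 1 = highest.
 * The guest never sees this index.
 */
struct pvfreq_state {
	int cur;
	int nr_steps;
};

/*
 * Sketch of a single hypercall handler taking a signed step count.
 * Requests past either end saturate, so INT_MIN always lands on the
 * minimum frequency and INT_MAX on the maximum.
 */
static int pvfreq_step(struct pvfreq_state *st, int steps)
{
	/* Widen before adding: cur + INT_MAX would overflow an int. */
	long long target = (long long)st->cur + steps;

	assert(st->nr_steps > 0);
	if (target < 0)
		target = 0;
	else if (target > st->nr_steps - 1)
		target = st->nr_steps - 1;
	st->cur = (int)target;
	return 0; /* no absolute frequency is reported back */
}
```

Widening to `long long` before the addition is what lets INT_MIN and INT_MAX double as MIN and MAX without signed overflow.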

Please explicitly say in the documentation that things like the number of
steps, which the guest can learn by doing MAX and then -1 until the
hypercall fails, are undefined and should not be depended upon.

Userspace might still want to know the number of steps to avoid useless
hypercalls; I think we should return a different value when the limit
is reached, not only after the guest tries to go past it.
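One possible way to report the limit early is sketched below. It uses hypothetical return values (`PVFREQ_OK`, `PVFREQ_AT_MIN`, `PVFREQ_AT_MAX`) and a hypothetical opaque-index state. The call that lands on the limit already says so, which means a guest stepping up one at a time can stop without issuing an extra, failing hypercall.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical return values; neither KVM nor the patch defines these. */
enum pvfreq_ret {
	PVFREQ_OK = 0,     /* handled, not at a limit in that direction */
	PVFREQ_AT_MIN = 1, /* lowest frequency is now selected */
	PVFREQ_AT_MAX = 2, /* highest frequency is now selected */
};

/* Hypothetical opaque per-vCPU state: index 0 = lowest frequency. */
struct pvfreq_state {
	int cur;
	int nr_steps;
};

/* Saturating step that also reports when the move lands on a limit. */
static enum pvfreq_ret pvfreq_step(struct pvfreq_state *st, int steps)
{
	long long target = (long long)st->cur + steps;
	int max;

	assert(st->nr_steps > 0);
	max = st->nr_steps - 1;
	if (target < 0)
		target = 0;
	else if (target > max)
		target = max;
	st->cur = (int)target;

	/* Signal the limit on the call that reaches it, not only on the next one. */
	if (steps > 0 && st->cur == max)
		return PVFREQ_AT_MAX;
	if (steps < 0 && st->cur == 0)
		return PVFREQ_AT_MIN;
	return PVFREQ_OK;
}

/* Guest-side usage: step up one at a time and count the hypercalls made. */
static int guest_step_to_max(struct pvfreq_state *st)
{
	enum pvfreq_ret r;
	int calls = 0;

	do {
		r = pvfreq_step(st, 1);
		calls++;
	} while (r != PVFREQ_AT_MAX);
	return calls;
}
```

Consider a ladder of four frequencies, starting at the lowest. The guest reaches the top in three calls. If the limit were reported only on overshoot, a fourth call would be needed to discover it.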

> +#endif
> +
> default:
> ret = -KVM_ENOSYS;
> break;

And thinking more about migration: userspace cannot learn the current
frequency (not even whether it is at MIN/MAX), so the new host will just
pick one at random. That breaks userspace's expectation that it cannot
increase (at MAX) or decrease (at MIN) the frequency any further. Is
migration left for the future because DPDK doesn't migrate anyway?

Thanks.