Re: [PATCH] KVM: x86/svm: Add module param to control PMU virtualization

From: Jim Mattson
Date: Mon Jan 10 2022 - 22:26:00 EST


On Mon, Jan 10, 2022 at 6:11 PM Like Xu <like.xu.linux@xxxxxxxxx> wrote:
>
> On 11/1/2022 2:13 am, Jim Mattson wrote:
> > On Sun, Jan 9, 2022 at 10:23 PM Like Xu <like.xu.linux@xxxxxxxxx> wrote:
> >>
> >> On 9/1/2022 9:23 am, Jim Mattson wrote:
> >>> On Fri, Dec 10, 2021 at 7:48 PM Jim Mattson <jmattson@xxxxxxxxxx> wrote:
> >>>>
> >>>> On Fri, Dec 10, 2021 at 6:15 PM Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
> >>>>>
> >>>>> On 12/10/21 20:25, Jim Mattson wrote:
> >>>>>> In the long run, I'd like to be able to override this system-wide
> >>>>>> setting on a per-VM basis, for VMs that I trust. (Of course, this
> >>>>>> implies that I trust the userspace process as well.)
> >>>>>>
> >>>>>> How would you feel if we were to add a kvm ioctl to override this
> >>>>>> setting, for a particular VM, guarded by an appropriate permissions
> >>>>>> check, like capable(CAP_SYS_ADMIN) or capable(CAP_SYS_MODULE)?
> >>>>>
> >>>>> What's the rationale for guarding this with a capability check? IIRC
> >>>>> you don't have such checks for perf_event_open (apart for getting kernel
> >>>>> addresses, which is not a problem for virtualization).
> >>>>
> >>>> My reasoning was simply that for userspace to override a mode 0444
> >>>> kernel module parameter, it should have the rights to reload the
> >>>> module with the parameter override. I wasn't thinking specifically
> >>>> about PMU capabilities.
> >>
> >> Do we have a precedent on any module parameter rewriting for privileger ?
> >>
> >> A further requirement is whether we can dynamically change this part of
> >> the behaviour when the guest is already booted up.
> >>
> >>>
> >>> Assuming that we trust userspace to decide whether or not to expose a
> >>> virtual PMU to a guest (as we do on the Intel side), perhaps we could
> >>> make use of the existing PMU_EVENT_FILTER to give us per-VM control,
> >>> rather than adding a new module parameter for per-host control. If
> >>
> >> Various granularities of control are required to support vPMU production
> >> scenarios, including per-host, per-VM, and dynamic-guest-alive control.
> >>
> >>> userspace calls KVM_SET_PMU_EVENT_FILTER with an action of
> >>> KVM_PMU_EVENT_ALLOW and an empty list of allowed events, KVM could
> >>> just disable the virtual PMU for that VM.
> >>
> >> AMD will also have "CPUID Fn8000_0022_EBX[NumCorePmc, 3:0]".
> >
> > Where do you see this? Revision 3.33 (November 2021) of the AMD APM,
> > volume 3, only goes as high as CPUID Fn8000_0021.
>
> Try APM Revision: 4.04 (November 2021), page 1849/3273,
> "CPUID Fn8000_0022_EBX Extended Performance Monitoring and Debug".

Is this a public document?

> Given the current ambiguity in this revision, the AMD folks will reveal more
> details bout this field in the next revision.
>
> >
> >>>
> >>> Today, the semantics of an empty allow list are quite different from
> >>> the proposed pmuv module parameter being false. However, it should be
> >>> an easy conversion. Would anyone be concerned about changing the
> >>> current semantics of an empty allow list? Is there a need for
> >>> disabling PMU virtualization for legacy userspace implementations that
> >>> can't be modified to ask for an empty allow list?
> >>>
> >>
> >> AFAI, at least one user-space agent has integrated with it plus additional
> >> "action"s.
> >>
> >> Once the API that the kernel presents to user space has been defined,
> >> it's best not to change it and instead fall into remorse.
> >
> > Okay.
> >
> > I propose the following:
> > 1) The new module parameter should apply to Intel as well as AMD, for
> > situations where userspace is not trusted.
> > 2) If the module parameter allows PMU virtualization, there should be
> > a new KVM_CAP whereby userspace can enable/disable PMU virtualization.
> > (Since you require a dynamic toggle, and there is a move afoot to
> > refuse guest CPUID changes once a guest is running, this new KVM_CAP
> > is needed on Intel as well as AMD).
>
> Both hands in favour. Do you need me as a labourer, or you have a ready-made one ?

We could split the work. Since (1) is a modification of the change you
proposed in this thread, perhaps you could apply it to both AMD and
Intel in v2? We can find someone for (2).

> > 3) If the module parameter does not allow PMU virtualization, there
> > should be no userspace override, since we have no precedent for
> > authorizing that kind of override.
>
> Uh, I thought you (Google) had a lot of these (interesting) use cases internally.

We have modified some module parameters so that they can be changed at
runtime, but we don't have any concept of a privileged userspace
overriding a module parameter restriction.

> >
> >> "But I am not a decision maker. " :D
> >>
> >> Thanks,
> >> Like Xu
> >>