Re: [PATCH v4 08/16] KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to manage guest DS buffer

From: Peter Zijlstra
Date: Thu Apr 08 2021 - 03:53:21 EST


On Thu, Apr 08, 2021 at 01:39:49PM +0800, Xu, Like wrote:
> Hi Peter,
>
> Thanks for your detailed comments.
>
> If you have more comments for other patches, please let me know.
>
> On 2021/4/7 23:39, Peter Zijlstra wrote:
> > On Mon, Mar 29, 2021 at 01:41:29PM +0800, Like Xu wrote:
> > > @@ -3869,10 +3876,12 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
> > > if (arr[1].guest)
> > > arr[0].guest |= arr[1].guest;
> > > - else
> > > + else {
> > > arr[1].guest = arr[1].host;
> > > + arr[2].guest = arr[2].host;
> > > + }
> > What's all this gibberish?
> >
> > The way I read that it says:
> >
> > if guest has PEBS_ENABLED
> > guest GLOBAL_CTRL |= PEBS_ENABLED
> > otherwise
> > guest PEBS_ENABLED = host PEBS_ENABLED
> > guest DS_AREA = host DS_AREA
> >
> > which is just completely random garbage afaict. Why would you leak host
> > msrs into the guest?
>
> In fact, this is not a leak at all.
>
> When we do "arr[i].guest = arr[i].host;" assignment in the
> intel_guest_get_msrs(), the KVM will check "if (msrs[i].host ==
> msrs[i].guest)" and if so, it disables the atomic switch for this msr
> during vmx transaction in the caller atomic_switch_perf_msrs().

Another marvel of bad coding style that function is :-( Lots of missing
{} and indentation fail.

This is terrible though, why would we clear the guest MSRs when it
changes PEBS_ENABLED. The guest had better clear them itself. Removing
guest DS_AREA just because we don't have any bits set in PEBS_ENABLED is
wrong and could very break all sorts of drivers.

> In that case, the msr value doesn't change and any guest write will be
> trapped. If the next check is "msrs[i].host != msrs[i].guest", the
> atomic switch will be triggered again.
>
> Compared to before, this part of the logic has not changed, which helps to
> reduce overhead.

It's unreadable garbage at best. If you don't want it changed, then
don't add it to the arr[] thing in the first place.

> > Why would you change guest GLOBAL_CTRL implicitly;
>
> This is because in the early part of this function, we have operations:
>
>     if (x86_pmu.flags & PMU_FL_PEBS_ALL)
>         arr[0].guest &= ~cpuc->pebs_enabled;
>     else
>         arr[0].guest &= ~(cpuc->pebs_enabled & PEBS_COUNTER_MASK);
>
> and if guest has PEBS_ENABLED, we need these bits back for PEBS counters:
>
>     arr[0].guest |= arr[1].guest;

I don't think that's right, who's to say they were set in the first
place? The guest's GLOBAL_CTRL could have had the bits cleared at VMEXIT
time. You can't unconditionally add PEBS_ENABLED into GLOBAL_CTRL,
that's wrong.

> > guest had better wrmsr that himself to control when stuff is enabled.
>
> When vm_entry, the msr value of GLOBAL_CTRL on the hardware may be
> different from trapped value "pmu->global_ctrl" written by the guest.
>
> If the perf scheduler cross maps guest counter X to the host counter Y,
> we have to enable the bit Y in GLOBAL_CTRL before vm_entry rather than X.

Sure, but I don't see that happening here.