Re: [PATCH v4 08/16] KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to manage guest DS buffer

From: Xu, Like
Date: Thu Apr 08 2021 - 04:44:22 EST


On 2021/4/8 15:52, Peter Zijlstra wrote:
On Thu, Apr 08, 2021 at 01:39:49PM +0800, Xu, Like wrote:
Hi Peter,

Thanks for your detailed comments.

If you have more comments for other patches, please let me know.

On 2021/4/7 23:39, Peter Zijlstra wrote:
On Mon, Mar 29, 2021 at 01:41:29PM +0800, Like Xu wrote:
@@ -3869,10 +3876,12 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
if (arr[1].guest)
arr[0].guest |= arr[1].guest;
- else
+ else {
arr[1].guest = arr[1].host;
+ arr[2].guest = arr[2].host;
+ }
What's all this gibberish?

The way I read that it says:

if guest has PEBS_ENABLED
guest GLOBAL_CTRL |= PEBS_ENABLED
otherwise
guest PEBS_ENABLED = host PEBS_ENABLED
guest DS_AREA = host DS_AREA

which is just completely random garbage afaict. Why would you leak host
msrs into the guest?
In fact, this is not a leak at all.

When we do "arr[i].guest = arr[i].host;" assignment in the
intel_guest_get_msrs(), the KVM will check "if (msrs[i].host ==
msrs[i].guest)" and if so, it disables the atomic switch for this msr
during vmx transaction in the caller atomic_switch_perf_msrs().
Another marvel of bad coding style that function is :-( Lots of missing
{} and indentation fail.

Sorry for that and I'll fix them.


This is terrible though, why would we clear the guest MSRs when it
changes PEBS_ENABLED.

The values of arr[1].host and arr[1].guest depend on the arrangement of host perf:

        arr[1].host = cpuc->pebs_enabled & ~cpuc->intel_ctrl_guest_mask;
        arr[1].guest = cpuc->pebs_enabled & ~cpuc->intel_ctrl_host_mask;

rather than the guest value of PEBS_ENABLE.

When the value of this msr is different across vmx-transaction,
we will load arr[1].host after vm-exit and load arr[1].guest before vm-entry.

If the value of this msr is the same before and after vmx-transaction,
we do nothing and keep the original value on the register.

The guest had better clear them itself.

I don't understand what you are referring to here.

Can you explain what you think is the correct behavior here ?

Removing
guest DS_AREA just because we don't have any bits set in PEBS_ENABLED is
wrong and could very break all sorts of drivers.

Except for PEBS, other features that rely on DS_AREA are not available in the guest .

Can you explain more of your concerns for DS_AREA switch ?


In that case, the msr value doesn't change and any guest write will be
trapped. If the next check is "msrs[i].host != msrs[i].guest", the
atomic switch will be triggered again.

Compared to before, this part of the logic has not changed, which helps to
reduce overhead.
It's unreadable garbage at best. If you don't want it changed, then
don't add it to the arr[] thing in the first place.

Thanks, adding GLOBAL_CTRL to arr[] in the last step is a better choice.


Why would you change guest GLOBAL_CTRL implicitly;
This is because in the early part of this function, we have operations:

    if (x86_pmu.flags & PMU_FL_PEBS_ALL)
        arr[0].guest &= ~cpuc->pebs_enabled;
    else
        arr[0].guest &= ~(cpuc->pebs_enabled & PEBS_COUNTER_MASK);

and if guest has PEBS_ENABLED, we need these bits back for PEBS counters:

    arr[0].guest |= arr[1].guest;
I don't think that's right, who's to say they were set in the first
place? The guest's GLOBAL_CTRL could have had the bits cleared at VMEXIT
time.

Please note the guest GLOBAL_CTRL value is stored in the pmu->global_ctrl,
while the actual loaded value for GLOBAL_CTRL msr after vm-entry is
"x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask".

You can't unconditionally add PEBS_ENABLED into GLOBAL_CTRL,
that's wrong.

The determination of the msr values ​​before and after vmx-transaction
are always in the context of host perf which means the PEBS perf_events
created by the KVM are all scheduled on and used legally , and it does not
depend on the guest values at all.


guest had better wrmsr that himself to control when stuff is enabled.
When vm_entry, the msr value of GLOBAL_CTRL on the hardware may be
different from trapped value "pmu->global_ctrl" written by the guest.

If the perf scheduler cross maps guest counter X to the host counter Y,
we have to enable the bit Y in GLOBAL_CTRL before vm_entry rather than X.
Sure, but I don't see that happening here.

Just fire questions if we're not on the same page or you're out of KVM context.