Re: [PATCH v4 08/16] KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to manage guest DS buffer

From: Xu, Like
Date: Thu Apr 08 2021 - 01:40:16 EST


Hi Peter,

Thanks for your detailed comments.

If you have more comments for other patches, please let me know.

On 2021/4/7 23:39, Peter Zijlstra wrote:
> On Mon, Mar 29, 2021 at 01:41:29PM +0800, Like Xu wrote:
>> @@ -3869,10 +3876,12 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
>>          if (arr[1].guest)
>>                  arr[0].guest |= arr[1].guest;
>> -        else
>> +        else {
>>                  arr[1].guest = arr[1].host;
>> +                arr[2].guest = arr[2].host;
>> +        }
> What's all this gibberish?
>
> The way I read that it says:
>
>     if guest has PEBS_ENABLED
>         guest GLOBAL_CTRL |= PEBS_ENABLED
>     otherwise
>         guest PEBS_ENABLED = host PEBS_ENABLED
>         guest DS_AREA = host DS_AREA
>
> which is just completely random garbage afaict. Why would you leak host
> msrs into the guest?

In fact, this is not a leak at all.

When intel_guest_get_msrs() assigns "arr[i].guest = arr[i].host;", the caller
atomic_switch_perf_msrs() in KVM checks "if (msrs[i].host == msrs[i].guest)" and,
if so, disables the atomic switch for this MSR across the VMX transition.

In that case, the MSR value doesn't change across the transition and any guest write is still trapped.
If a later check finds "msrs[i].host != msrs[i].guest", the atomic switch is enabled again.

This part of the logic is unchanged from before; skipping the switch when the two values match helps to reduce overhead.
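To make the host == guest check concrete, here is a minimal userspace sketch of the decision described above. It is not the KVM code: struct switch_msr, needs_atomic_switch() and count_switched() are hypothetical names used only for illustration, standing in for struct perf_guest_switch_msr and the loop in atomic_switch_perf_msrs().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct perf_guest_switch_msr. */
struct switch_msr {
    uint32_t msr;   /* MSR index */
    uint64_t host;  /* value wanted while the host runs */
    uint64_t guest; /* value wanted while the guest runs */
};

/*
 * Returns true when the MSR has to be switched atomically on the
 * VMX transition, false when host == guest and it can be skipped.
 */
bool needs_atomic_switch(const struct switch_msr *m)
{
    return m->host != m->guest;
}

/* Count how many of the n entries would actually be switched. */
int count_switched(const struct switch_msr *msrs, int n)
{
    int i, switched = 0;

    for (i = 0; i < n; i++)
        if (needs_atomic_switch(&msrs[i]))
            switched++;
    return switched;
}
```

With "guest = host" for PEBS_ENABLED and DS_AREA, only GLOBAL_CTRL would be counted here, which is the overhead saving mentioned above.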

> Why would you change guest GLOBAL_CTRL implicitly;

This is because, earlier in this function, we do:

    if (x86_pmu.flags & PMU_FL_PEBS_ALL)
        arr[0].guest &= ~cpuc->pebs_enabled;
    else
        arr[0].guest &= ~(cpuc->pebs_enabled & PEBS_COUNTER_MASK);

and if the guest has PEBS_ENABLED set, we need these bits back for the PEBS counters:

    arr[0].guest |= arr[1].guest;
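
The two steps can be modelled in isolation as below. This is only a sketch under assumptions: sketch_guest_global_ctrl() and SKETCH_PEBS_COUNTER_MASK are hypothetical names, the mask value assumes eight general-purpose PEBS counters in the low bits, and the parameters stand in for arr[0].guest, cpuc->pebs_enabled, arr[1].guest and the PMU_FL_PEBS_ALL flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: eight general-purpose PEBS counters in bits 0..7. */
#define SKETCH_PEBS_COUNTER_MASK 0xffULL

uint64_t sketch_guest_global_ctrl(uint64_t guest_ctrl,
                                  uint64_t cpuc_pebs_enabled,
                                  uint64_t guest_pebs_enabled,
                                  bool pebs_all)
{
    /* Step 1: clear the counters that cpuc->pebs_enabled marks as PEBS. */
    if (pebs_all)
        guest_ctrl &= ~cpuc_pebs_enabled;
    else
        guest_ctrl &= ~(cpuc_pebs_enabled & SKETCH_PEBS_COUNTER_MASK);

    /* Step 2: if the guest uses PEBS, put its counter bits back. */
    if (guest_pebs_enabled)
        guest_ctrl |= guest_pebs_enabled;

    return guest_ctrl;
}
```

For example, with guest_ctrl = 0xf, cpuc_pebs_enabled = 0x3 and guest_pebs_enabled = 0x1, step 1 leaves 0xc and step 2 restores bit 0, giving 0xd.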

> guest had better wrmsr that himself to control when stuff is enabled.

At VM-entry, the value of GLOBAL_CTRL on the hardware may differ from
the trapped value "pmu->global_ctrl" written by the guest.

If the perf scheduler cross-maps guest counter X to host counter Y,
we have to enable bit Y in GLOBAL_CTRL before VM-entry rather than bit X.
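
As a rough illustration of the X to Y mapping, the sketch below translates a guest-view GLOBAL_CTRL bitmask into the bits that would have to be set on the hardware. It is not how KVM or perf stores the assignment: sketch_remap_global_ctrl(), SKETCH_NR_COUNTERS and the host_idx[] table are hypothetical and only cover general-purpose counters.

```c
#include <stdint.h>

/* Assumption: eight general-purpose counters, for illustration only. */
#define SKETCH_NR_COUNTERS 8

/*
 * host_idx[x] is the host counter backing guest counter x; the table
 * is a hypothetical stand-in for the perf scheduler's assignment.
 */
uint64_t sketch_remap_global_ctrl(uint64_t guest_bits,
                                  const int host_idx[SKETCH_NR_COUNTERS])
{
    uint64_t host_bits = 0;
    int x;

    for (x = 0; x < SKETCH_NR_COUNTERS; x++)
        if (guest_bits & (1ULL << x))
            host_bits |= 1ULL << host_idx[x];

    return host_bits;
}
```

If guest counter 0 is backed by host counter 2, a guest write that enables bit 0 has to show up as bit 2 in the hardware GLOBAL_CTRL, which is why the value cannot simply be what the guest wrote.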


> This just cannot be right.