Re: [PATCH] KVM: x86/pmu: Isolate TSX specific perf_event_attr.attr logic for AMD

From: Jim Mattson
Date: Mon Mar 07 2022 - 16:43:05 EST


On Sun, Mar 6, 2022 at 10:38 PM Like Xu <like.xu.linux@xxxxxxxxx> wrote:
>
> From: Like Xu <likexu@xxxxxxxxxxx>
>
> HSW_IN_TX* bits are used in generic code which are not supported on
> AMD. Worse, these bits overlap with AMD EventSelect[11:8] and hence
> using HSW_IN_TX* bits unconditionally in generic code is resulting in
> unintentional pmu behavior on AMD. For example, if EventSelect[11:8]
> is 0x2, pmc_reprogram_counter() wrongly assumes that
> HSW_IN_TX_CHECKPOINTED is set and thus forces sampling period to be 0.
>
> Opportunistically remove two TSX specific incoming parameters for
> the generic interface reprogram_counter().
>
> Fixes: 103af0a98788 ("perf, kvm: Support the in_tx/in_tx_cp modifiers in KVM arch perfmon emulation v5")
> Co-developed-by: Ravi Bangoria <ravi.bangoria@xxxxxxx>
> Signed-off-by: Ravi Bangoria <ravi.bangoria@xxxxxxx>
> Signed-off-by: Like Xu <likexu@xxxxxxxxxxx>
> ---
> Note: this patch is based on [1] which is considered to be a necessary cornerstone.
> [1] https://lore.kernel.org/kvm/20220302111334.12689-1-likexu@xxxxxxxxxxx/
>
> arch/x86/kvm/pmu.c | 29 ++++++++++++++---------------
> 1 file changed, 14 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 17c61c990282..d0f9515c37dd 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -99,8 +99,7 @@ static void kvm_perf_overflow(struct perf_event *perf_event,
>
> static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
> u64 config, bool exclude_user,
> - bool exclude_kernel, bool intr,
> - bool in_tx, bool in_tx_cp)
> + bool exclude_kernel, bool intr)
> {
> struct perf_event *event;
> struct perf_event_attr attr = {
> @@ -116,16 +115,18 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
>
> attr.sample_period = get_sample_period(pmc, pmc->counter);
>
> - if (in_tx)
> - attr.config |= HSW_IN_TX;
> - if (in_tx_cp) {
> - /*
> - * HSW_IN_TX_CHECKPOINTED is not supported with nonzero
> - * period. Just clear the sample period so at least
> - * allocating the counter doesn't fail.
> - */
> - attr.sample_period = 0;
> - attr.config |= HSW_IN_TX_CHECKPOINTED;
> + if (guest_cpuid_is_intel(pmc->vcpu)) {

This is not the right condition to check. Per the SDM, both bits 32
and 33 "may only be set if the processor supports HLE or RTM." On
other Intel processors, these bits are reserved, and any attempt to set
them results in a #GP.
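
A minimal sketch of the kind of check the SDM wording implies, written
as standalone C rather than KVM code. The sketch_* helpers are
hypothetical names used only for illustration; the only facts assumed
are that HLE and RTM are enumerated in CPUID.(EAX=07H,ECX=0):EBX bits 4
and 11, and that HSW_IN_TX and HSW_IN_TX_CHECKPOINTED are bits 32 and
33 of the event selector.

```c
#include <stdbool.h>
#include <stdint.h>

#define HSW_IN_TX              (1ULL << 32)
#define HSW_IN_TX_CHECKPOINTED (1ULL << 33)

/* CPUID.(EAX=07H,ECX=0):EBX feature bits. */
#define SKETCH_CPUID_7_0_EBX_HLE (1u << 4)
#define SKETCH_CPUID_7_0_EBX_RTM (1u << 11)

/*
 * Hypothetical helper: which of the two TSX bits are reserved for a
 * guest whose CPUID leaf 7 EBX value is cpuid_7_0_ebx.
 */
static inline uint64_t sketch_tsx_reserved_bits(uint32_t cpuid_7_0_ebx)
{
	if (cpuid_7_0_ebx & (SKETCH_CPUID_7_0_EBX_HLE | SKETCH_CPUID_7_0_EBX_RTM))
		return 0;

	return HSW_IN_TX | HSW_IN_TX_CHECKPOINTED;
}

/*
 * Hypothetical helper: true if writing eventsel would touch a reserved
 * TSX bit, i.e. the case where hardware would raise #GP.
 */
static inline bool sketch_eventsel_write_faults(uint64_t eventsel,
						 uint32_t cpuid_7_0_ebx)
{
	return (eventsel & sketch_tsx_reserved_bits(cpuid_7_0_ebx)) != 0;
}
```

The point of the sketch is only that the gate is the guest's HLE/RTM
enumeration, not the vendor string.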

> + if (pmc->eventsel & HSW_IN_TX)
> + attr.config |= HSW_IN_TX;

This statement does nothing. If HSW_IN_TX is set in pmc->eventsel, it
is set in attr.config already.
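
The redundancy can be illustrated with a standalone sketch. It assumes
that the config passed in is (eventsel & AMD64_RAW_EVENT_MASK), as in
the reprogram_counter() hunk of this patch, and that the mask covers
bits 35:32. SKETCH_RAW_EVENT_MASK below is a stand-in whose value is an
assumption, not copied from the kernel headers, and the sketch_*
function names are hypothetical.

```c
#include <stdint.h>

#define HSW_IN_TX              (1ULL << 32)
#define HSW_IN_TX_CHECKPOINTED (1ULL << 33)

/* Assumed stand-in for AMD64_RAW_EVENT_MASK; includes bits 35:32. */
#define SKETCH_RAW_EVENT_MASK  0x0000000FFF84FFFFULL

/* What the caller already passes as config. */
static inline uint64_t sketch_config(uint64_t eventsel)
{
	return eventsel & SKETCH_RAW_EVENT_MASK;
}

/* The same value after the two ORs from the patch are applied. */
static inline uint64_t sketch_config_with_ors(uint64_t eventsel)
{
	uint64_t config = sketch_config(eventsel);

	if (eventsel & HSW_IN_TX)
		config |= HSW_IN_TX;              /* already set by the mask */
	if (eventsel & HSW_IN_TX_CHECKPOINTED)
		config |= HSW_IN_TX_CHECKPOINTED; /* already set by the mask */

	return config;
}
```

Under that assumption the two functions return the same value for every
eventsel, so the ORs can be dropped.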

> + if (pmc->eventsel & HSW_IN_TX_CHECKPOINTED) {
> + /*
> + * HSW_IN_TX_CHECKPOINTED is not supported with nonzero
> + * period. Just clear the sample period so at least
> + * allocating the counter doesn't fail.
> + */
> + attr.sample_period = 0;
> + attr.config |= HSW_IN_TX_CHECKPOINTED;

As above, this statement does nothing. We should just set
attr.sample_period to 0. Note, however, that the SDM documents an
additional constraint which is ignored here: "This bit may only be set
for IA32_PERFEVTSEL2." I have confirmed that a #GP is raised for an
attempt to set bit 33 in any PerfEvtSeln other than PerfEvtSel2 on a
Broadwell Xeon E5.
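
A rough sketch of the two points above, as standalone C with
hypothetical sketch_* helper names. It assumes only that
HSW_IN_TX_CHECKPOINTED is bit 33 and that the counter index maps
directly to IA32_PERFEVTSELn; it is not a proposal for how KVM should
structure the check.

```c
#include <stdbool.h>
#include <stdint.h>

#define HSW_IN_TX_CHECKPOINTED (1ULL << 33)

/*
 * Hypothetical helper: does eventsel respect "This bit may only be set
 * for IA32_PERFEVTSEL2" when written to IA32_PERFEVTSEL<idx>?
 */
static inline bool sketch_in_tx_cp_index_ok(unsigned int idx, uint64_t eventsel)
{
	if (!(eventsel & HSW_IN_TX_CHECKPOINTED))
		return true;

	return idx == 2;
}

/*
 * Hypothetical helper: the only TSX-specific adjustment that remains
 * once the redundant OR is dropped. The checkpointed bit is not
 * supported with a nonzero period, so the period is cleared.
 */
static inline uint64_t sketch_sample_period(uint64_t eventsel, uint64_t period)
{
	return (eventsel & HSW_IN_TX_CHECKPOINTED) ? 0 : period;
}
```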

> + }
> }
>
> event = perf_event_create_kernel_counter(&attr, -1, current,
> @@ -268,9 +269,7 @@ void reprogram_counter(struct kvm_pmc *pmc)
> (eventsel & AMD64_RAW_EVENT_MASK),
> !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
> !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
> - eventsel & ARCH_PERFMON_EVENTSEL_INT,
> - (eventsel & HSW_IN_TX),
> - (eventsel & HSW_IN_TX_CHECKPOINTED));
> + eventsel & ARCH_PERFMON_EVENTSEL_INT);
> }
> EXPORT_SYMBOL_GPL(reprogram_counter);
>
> --
> 2.35.1
>