[PATCH v2 12/12] KVM: x86/pmu: Clear reserved bit PERF_CTL2[43] for AMD erratum 1292

From: Like Xu
Date: Wed Mar 02 2022 - 06:14:59 EST


From: Like Xu <likexu@xxxxxxxxxxx>

AMD Family 19h Models 00h-0Fh processors may experience sampling
inaccuracies that cause performance counters to overcount certain
retire-based events (erratum 1292). To count the affected non-FP PMC
events correctly, a patched guest running on an affected vCPU model would:

- Use Core::X86::Msr::PERF_CTL2 to count the events, and
- Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
- Program Core::X86::Msr::PERF_CTL2[20] to 0b.
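
For illustration only, the guest-side steps listed above could be sketched
as below. This is not taken from any guest patch: the macro and helper names
are hypothetical, and the snippet only shows the bit manipulation on the
value a guest would write to PERF_CTL2. Bit 20 is the counter's
interrupt-enable bit in the AMD event-select layout.

```c
#include <stdint.h>

/* Hypothetical names, used only for this sketch. */
#define SKETCH_PERF_CTL2_BIT43	(1ULL << 43)	/* erratum 1292 workaround bit */
#define SKETCH_PERF_CTL_BIT20	(1ULL << 20)	/* interrupt-enable bit */

/*
 * Return the event-select value a patched guest would write to PERF_CTL2:
 * bit 43 forced to 1b, bit 20 forced to 0b, all other bits left untouched.
 */
static inline uint64_t sketch_erratum_1292_eventsel(uint64_t eventsel)
{
	eventsel |= SKETCH_PERF_CTL2_BIT43;
	eventsel &= ~SKETCH_PERF_CTL_BIT20;
	return eventsel;
}
```

Because bit 43 is normally reserved, such a write is rejected by KVM unless
the reserved-bit check is relaxed for counter #2.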

To support this usage by AMD guests, KVM should stop treating bit 43 as
reserved, but only for counter #2. The handling of all other cases
remains unchanged.
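
As a rough user-space model of that rule (a sketch with hypothetical helper
names, assuming the family, model and reserved mask are passed in as plain
values rather than read from a struct kvm_vcpu):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: guest CPU models covered by erratum 1292. */
static inline bool sketch_affected_model(unsigned int family, unsigned int model)
{
	return family == 0x19 && model < 0x10;
}

/*
 * Hypothetical helper: decide whether an event-select write is accepted.
 * Bit 43 is dropped from the reserved mask only for counter index 2 on an
 * affected model; every other case uses the mask unchanged.
 */
static inline bool sketch_eventsel_write_ok(uint64_t data, uint64_t reserved_bits,
					    unsigned int counter_idx,
					    unsigned int family, unsigned int model)
{
	if (counter_idx == 2 && sketch_affected_model(family, model))
		reserved_bits &= ~(1ULL << 43);

	return !(data & reserved_bits);
}
```

With a reserved mask that contains bit 43, a write setting that bit is only
accepted for counter 2 on Family 19h Models 00h-0Fh; any other counter,
family or model still fails the check.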

The AMD hardware team clarified that the conditions under which the
overcounting can happen are quite rare. This change may make PMU driver
developers who have read erratum #1292 less disappointed.

Reported-by: Jim Mattson <jmattson@xxxxxxxxxx>
Signed-off-by: Like Xu <likexu@xxxxxxxxxxx>
---
arch/x86/kvm/svm/pmu.c | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 41c9b9e2aec2..05b4e4f2bb66 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -18,6 +18,20 @@
#include "pmu.h"
#include "svm.h"

+/*
+ * As a workaround for erratum 1292 ("Retire Based Events May Overcount"),
+ * some patched guests may set PERF_CTL2[43] to 1b and PERF_CTL2[20] to 0b
+ * to count the affected non-FP PMC events correctly.
+ *
+ * Note, tests show that the counter difference before and after using the
+ * workaround is not significant. The host schedules CTR2 indiscriminately.
+ */
+static inline bool vcpu_overcount_retire_events(struct kvm_vcpu *vcpu)
+{
+	return guest_cpuid_family(vcpu) == 0x19 &&
+		guest_cpuid_model(vcpu) < 0x10;
+}
+
enum pmu_type {
PMU_TYPE_COUNTER = 0,
PMU_TYPE_EVNTSEL,
@@ -224,6 +238,7 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
struct kvm_pmc *pmc;
u32 msr = msr_info->index;
u64 data = msr_info->data;
+ u64 reserved_bits;

/* MSR_PERFCTRn */
pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_COUNTER);
@@ -236,7 +251,10 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	if (pmc) {
 		if (data == pmc->eventsel)
 			return 0;
-		if (!(data & pmu->reserved_bits)) {
+		reserved_bits = pmu->reserved_bits;
+		if (pmc->idx == 2 && vcpu_overcount_retire_events(vcpu))
+			reserved_bits &= ~BIT_ULL(43);
+		if (!(data & reserved_bits)) {
 			pmc->eventsel = data;
 			reprogram_counter(pmc);
 			return 0;
--
2.35.1