[PATCH] perf/x86/intel: Fix guest vPMU warning on hybrid CPUs

From: kan . liang
Date: Wed Jan 25 2023 - 15:28:51 EST


From: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>

The following error can be observed in a Linux guest when the hypervisor
runs on a hybrid machine.

[ 0.118214] unchecked MSR access error: WRMSR to 0x38f (tried to
write 0x00011000f0000003f) at rIP: 0xffffffff83082124
(native_write_msr+0x4/0x30)
[ 0.118949] Call Trace:
[ 0.119092] <TASK>
[ 0.119215] ? __intel_pmu_enable_all.constprop.0+0x88/0xe0
[ 0.119533] intel_pmu_enable_all+0x15/0x20
[ 0.119778] x86_pmu_enable+0x17c/0x320

The current perf code wrongly assumes that the perf metrics feature is
always enabled on p-core. It unconditionally enables the feature to work
around the unreliable enumeration of the PERF_CAPABILITIES MSR. The
assumption is safe on bare metal. However, KVM doesn't support the perf
metrics feature yet. Setting the corresponding bit triggers an MSR
access error in a guest.

Only unconditionally enable the core specific PMU features for bare
metal on ADL and RPL, i.e., the perf metrics on p-core and PEBS-via-PT
on e-core.
On future platforms, perf doesn't need to hardcode the PMU features; the
per-core PMU features can be enumerated via the enhanced
PERF_CAPABILITIES MSR and CPUID leaf 0x23, so the issue does not exist
there.

Fixes: f83d2f91d259 ("perf/x86/intel: Add Alder Lake Hybrid support")
Link: https://lore.kernel.org/lkml/e161b7c0-f0be-23c8-9a25-002260c2a085@xxxxxxxxxxxxxxx/
Reported-by: Pengfei Xu <pengfei.xu@xxxxxxxxx>
Signed-off-by: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
---


Based on my limited knowledge regarding KVM and guests, I use the
HYPERVISOR bit to tell whether we are running in a guest. But I'm not
sure whether it's reliable. Please let me know if there is a better way.
Thanks.


arch/x86/events/intel/core.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index bbb7846d3c1e..8d08929a7250 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -6459,8 +6459,17 @@ __init int intel_pmu_init(void)
__EVENT_CONSTRAINT(0, (1ULL << pmu->num_counters) - 1,
0, pmu->num_counters, 0, 0);
pmu->intel_cap.capabilities = x86_pmu.intel_cap.capabilities;
- pmu->intel_cap.perf_metrics = 1;
- pmu->intel_cap.pebs_output_pt_available = 0;
+ /*
+ * The capability bits are not reliable on ADL and RPL.
+ * For bare metal, it's safe to assume that some features
+ * are always enabled, e.g., the perf metrics on p-core,
+ * but we cannot make the same assumption for a hypervisor.
+ * Only update the core specific PMU feature for bare metal.
+ */
+ if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
+ pmu->intel_cap.perf_metrics = 1;
+ pmu->intel_cap.pebs_output_pt_available = 0;
+ }

memcpy(pmu->hw_cache_event_ids, spr_hw_cache_event_ids, sizeof(pmu->hw_cache_event_ids));
memcpy(pmu->hw_cache_extra_regs, spr_hw_cache_extra_regs, sizeof(pmu->hw_cache_extra_regs));
@@ -6480,8 +6489,10 @@ __init int intel_pmu_init(void)
__EVENT_CONSTRAINT(0, (1ULL << pmu->num_counters) - 1,
0, pmu->num_counters, 0, 0);
pmu->intel_cap.capabilities = x86_pmu.intel_cap.capabilities;
- pmu->intel_cap.perf_metrics = 0;
- pmu->intel_cap.pebs_output_pt_available = 1;
+ if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
+ pmu->intel_cap.perf_metrics = 0;
+ pmu->intel_cap.pebs_output_pt_available = 1;
+ }

memcpy(pmu->hw_cache_event_ids, glp_hw_cache_event_ids, sizeof(pmu->hw_cache_event_ids));
memcpy(pmu->hw_cache_extra_regs, tnt_hw_cache_extra_regs, sizeof(pmu->hw_cache_extra_regs));
--
2.35.1