[PATCH v1 02/11] perf/x86/ds: Handle guest PEBS events overflow and inject fake PMI

From: Luwei Kang
Date: Thu Mar 05 2020 - 04:59:06 EST


From: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>

With PEBS virtualization, the PEBS record gets delivered to the guest,
but host still sees the PMI. This would normally result in a spurious
PEBS PMI that is ignored. But we need to inject the PMI into the guest,
so that the guest PMI handler can handle the PEBS record.

Check for this case in the perf PEBS handler. If a guest PEBS counter
overflowed, a fake event will be triggered. The fake event results in
calling the KVM PMI callback, which injects the PMI into the guest.

No matter how many PEBS counters are overflowed, only triggering one
fake event is enough. Then the guest handler would retrieve the correct
information from its own PEBS records including the guest state.

Originally-by: Andi Kleen <ak@xxxxxxxxxxxxxxx>
Signed-off-by: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
---
arch/x86/events/intel/ds.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 59 insertions(+)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index dc43cc1..6722f39 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1721,6 +1721,62 @@ void intel_pmu_auto_reload_read(struct perf_event *event)
return 0;
}

+/*
+ * We may be running with virtualized PEBS, so the PEBS record
+ * was logged into the guest's DS and is invisible to host.
+ *
+ * For guest-dedicated counters we always have to check if the
+ * counters are overflowed, because PEBS thresholds
+ * are not reported in the PERF_GLOBAL_STATUS.
+ *
+ * In this case we just trigger a fake event for KVM to forward
+ * to the guest as PMI. The guest will then see the real PEBS
+ * record and read the counter values.
+ *
+ * The contents of the event do not matter.
+ */
+static int intel_pmu_handle_guest_pebs(struct cpu_hw_events *cpuc,
+ struct pt_regs *iregs,
+ struct debug_store *ds)
+{
+ struct perf_sample_data data;
+ struct perf_event *event;
+ int bit;
+
+ /*
+ * Ideally, we should check guest DS to understand if the
+ * guest-dedicated PEBS counters are overflowed.
+ *
+ * However, it brings high overhead to retrieve guest DS in host.
+ * The host and guest cannot have pending PEBS events simultaneously.
+ * So we check host DS instead.
+ *
+ * If PEBS interrupt threshold on host is not exceeded in a NMI,
+ * the guest-dedicated counter must be overflowed.
+ */
+ if (!cpuc->intel_ctrl_guest_dedicated_mask || !in_nmi() ||
+ (ds->pebs_interrupt_threshold <= ds->pebs_index))
+ return 0;
+
+ for_each_set_bit(bit,
+ (unsigned long *)&cpuc->intel_ctrl_guest_dedicated_mask,
+ INTEL_PMC_IDX_FIXED + x86_pmu.num_counters_fixed) {
+
+ event = cpuc->events[bit];
+ if (!event->attr.precise_ip)
+ continue;
+
+ perf_sample_data_init(&data, 0, event->hw.last_period);
+ if (perf_event_overflow(event, &data, iregs))
+ x86_pmu_stop(event, 0);
+
+ /* Inject one fake event is enough. */
+ return 1;
+ }
+
+ return 0;
+}
+
static void __intel_pmu_pebs_event(struct perf_event *event,
struct pt_regs *iregs,
void *base, void *top,
@@ -1954,6 +2010,9 @@ static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs)
if (!x86_pmu.pebs_active)
return;

+ if (intel_pmu_handle_guest_pebs(cpuc, iregs, ds))
+ return;
+
base = (struct pebs_basic *)(unsigned long)ds->pebs_buffer_base;
top = (struct pebs_basic *)(unsigned long)ds->pebs_index;

--
1.8.3.1