Re: [RFC PATCH 0/2] KVM: VMX: Fix VM entry failure on PT_MODE_HOST_GUEST while host is using PT

From: Xiaoyao Li
Date: Thu Sep 08 2022 - 03:25:53 EST


On 8/29/2022 3:49 PM, Wang, Wei W wrote:
On Thursday, August 25, 2022 4:56 PM, Xiaoyao Li wrote:
There is one bug in KVM that can hit vm-entry failure 100% on platform
supporting PT_MODE_HOST_GUEST mode following below steps:

1. #modprobe -r kvm_intel
2. #modprobe kvm_intel pt_mode=1
3. start a VM with QEMU
4. on host: #perf record -e intel_pt//

The vm-entry failure happens because it violates the requirement stated in
Intel SDM 26.2.1.1 VM-Execution Control Fields

If the logical processor is operating with Intel PT enabled (if
IA32_RTIT_CTL.TraceEn = 1) at the time of VM entry, the "load
IA32_RTIT_CTL" VM-entry control must be 0.

On PT_MODE_HOST_GUEST node, PT_MODE_HOST_GUEST is always set. Thus
KVM needs to ensure IA32_RTIT_CTL.TraceEn is 0 before VM-entry. Currently
KVM manually WRMSR(IA32_RTIT_CTL) to clear TraceEn bit. However, it
doesn't work everytime since there is a posibility that IA32_RTIT_CTL.TraceEn
is re-enabled in PT PMI handler before vm-entry. This series tries to fix the
issue by exposing two interfaces from Intel PT driver for the purose to stop and
resume Intel PT on host. It prevents PT PMI handler from re-enabling PT. By the
way, it also fixes another issue that PT PMI touches PT MSRs whihc leads to
what KVM stores for host bemomes stale.

I'm thinking about another approach to fixing it. I think we need to have the
running host pt event disabled when we switch to guest and don't expect to
receive the host pt interrupt at this point. Also, the host pt context can be
save/restored by host perf core (instead of KVM) when we disable/enable
the event.

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 82ef87e9a897..1d3e03ecaf6a 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -1575,6 +1575,7 @@ static void pt_event_start(struct perf_event *event, int mode)

pt_config_buffer(buf);
pt_config(event);
+ pt->event = event;

return;

@@ -1600,6 +1601,7 @@ static void pt_event_stop(struct perf_event *event, int mode)
return;

event->hw.state = PERF_HES_STOPPED;
+ pt->event = NULL;

if (mode & PERF_EF_UPDATE) {
struct pt_buffer *buf = perf_get_aux(&pt->handle);
@@ -1624,6 +1626,15 @@ static void pt_event_stop(struct perf_event *event, int mode)
}
}

+
+struct perf_event *pt_get_curr_event(void)
+{
+ struct pt *pt = this_cpu_ptr(&pt_ctx);

Wei,

I'm not sure if we can use pt->handle.event instead or not.

+ return pt->event;
+}
+EXPORT_SYMBOL_GPL(pt_get_curr_event);
+
static long pt_event_snapshot_aux(struct perf_event *event,
struct perf_output_handle *handle,
unsigned long size)
diff --git a/arch/x86/events/intel/pt.h b/arch/x86/events/intel/pt.h
index 96906a62aacd..d46a85bb06bb 100644
--- a/arch/x86/events/intel/pt.h
+++ b/arch/x86/events/intel/pt.h
@@ -121,6 +121,7 @@ struct pt_filters {
* @output_mask: cached RTIT_OUTPUT_MASK MSR value
*/
struct pt {
+ struct perf_event *event;
struct perf_output_handle handle;
struct pt_filters filters;
int handle_nmi;
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index f6fc8dd51ef4..be8dd24922a7 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -553,11 +553,14 @@ static inline int x86_perf_get_lbr(struct x86_pmu_lbr *lbr)

#ifdef CONFIG_CPU_SUP_INTEL
extern void intel_pt_handle_vmx(int on);
+ extern struct perf_event *pt_get_curr_event(void);
#else
static inline void intel_pt_handle_vmx(int on)
{

+
}
+struct perf_event *pt_get_curr_event(void) { }
#endif