Re: [PATCH v3 00/17] KVM: x86/pmu: Add support to enable Guest PEBS via DS

From: Xu, Like
Date: Tue Jan 26 2021 - 12:26:36 EST


On 2021/1/25 22:47, Liuxiangdong (Aven, Cloud Infrastructure Service Product Dept.) wrote:
Thanks for replying,

On 2021/1/25 10:41, Like Xu wrote:
+ kvm@xxxxxxxxxxxxxxx

Hi Liuxiangdong,

On 2021/1/22 18:02, Liuxiangdong (Aven, Cloud Infrastructure Service Product Dept.) wrote:
Hi Like,

Some questions about https://lore.kernel.org/kvm/20210104131542.495413-1-like.xu@xxxxxxxxxxxxxxx/

Thanks for trying the PEBS feature in the guest,
and I assume you have correctly applied the QEMU patches for guest PEBS.

Is there any other patch that needs to be applied? I use qemu 5.2.0 (downloaded from github on January 14th).

Two qemu patches are attached against qemu tree
(commit 31ee895047bdcf7387e3570cbd2a473c6f744b08)
and then run the guest with "-cpu,pebs=true".

Note, these two patches are just for testing and are not finalized for qemu upstream.


1)Test in IceLake

In the [PATCH v3 10/17] KVM: x86/pmu: Expose CPUIDs feature bits PDCM, DS, DTES64, we only support Ice Lake with the following x86_model(s):

#define INTEL_FAM6_ICELAKE_X        0x6A
#define INTEL_FAM6_ICELAKE_D        0x6C

you can check the eax output of "cpuid -l 1 -1 -r",
for example "0x000606a4" meets this requirement.
It's INTEL_FAM6_ICELAKE_X

Yes, it's the target hardware.

cpuid -l 1 -1 -r

CPU:
   0x00000001 0x00: eax=0x000606a6 ebx=0xb4800800 ecx=0x7ffefbf7 edx=0xbfebfbff
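
For reference, the model check above can be reproduced by hand. The following is a minimal sketch, not part of the patch set, that decodes the eax value reported above using the standard CPUID leaf 1 layout (stepping in bits 3:0, model in bits 7:4, family in bits 11:8, extended model in bits 19:16, extended family in bits 27:20). The variable names are only illustrative.

```shell
#!/bin/sh
# Decode family/model/stepping from CPUID leaf 1 EAX.
# The value is the one reported by "cpuid -l 1 -1 -r" above.
eax=0x000606a6

stepping=$(( eax & 0xf ))
base_model=$(( (eax >> 4) & 0xf ))
base_family=$(( (eax >> 8) & 0xf ))
ext_model=$(( (eax >> 16) & 0xf ))
ext_family=$(( (eax >> 20) & 0xff ))

# For family 6 and 15 the extended model supplies the high nibble of the model.
model=$base_model
family=$base_family
if [ "$base_family" -eq 6 ] || [ "$base_family" -eq 15 ]; then
    model=$(( (ext_model << 4) | base_model ))
fi
# Only family 15 adds the extended family field.
if [ "$base_family" -eq 15 ]; then
    family=$(( base_family + ext_family ))
fi

printf 'family=%d model=0x%X (%d) stepping=%d\n' \
    "$family" "$model" "$model" "$stepping"
```

The resulting model 0x6A corresponds to INTEL_FAM6_ICELAKE_X and agrees with the "Model: 106" line in the host cpuinfo.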


HOST:

CPU family:                      6

Model:                           106

Model name:                      Intel(R) Xeon(R) Platinum 8378A CPU $@ $@

microcode: sig=0x606a6, pf=0x1, revision=0xd000122

As long as you get the latest BIOS from the provider,
you may check 'cat /proc/cpuinfo | grep code | uniq' with the latest one.
OK. I'll do it later.


Guest:  linux kernel 5.11.0-rc2

I assume it's the "upstream tag v5.11-rc2" which is fine.
Yes.


We can find the pebs/intel_pt flags in the guest cpuinfo, but perf still reports an error when we use it.

Just a note, intel_pt and pebs are two features and we can write
pebs records to intel_pt buffer with extra hardware support.
(by default, pebs records are written to the pebs buffer)

You may check the output of "dmesg | grep PEBS" in the guest
to see if the guest PEBS cpuinfo is exposed and use "perf record
-e cycles:pp" to see if the PEBS feature actually works in the guest.

I applied only the pebs patch set to linux kernel 5.11.0-rc2, tested perf in the guest, and dumped the stack where -EOPNOTSUPP is returned.

Yes, you may apply the qemu patches and try it again.


(1)
# perf record -e instructions:pp
Error:
instructions:pp: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'

[  117.793266] Call Trace:
[  117.793270]  dump_stack+0x57/0x6a
[  117.793275]  intel_pmu_setup_lbr_filter+0x137/0x190
[  117.793280]  intel_pmu_hw_config+0x18b/0x320
[  117.793288]  hsw_hw_config+0xe/0xa0
[  117.793290]  x86_pmu_event_init+0x8e/0x210
[  117.793293]  perf_try_init_event+0x40/0x130
[  117.793297]  perf_event_alloc.part.22+0x611/0xde0
[  117.793299]  ? alloc_fd+0xba/0x180
[  117.793302]  __do_sys_perf_event_open+0x1bd/0xd90
[  117.793305]  do_syscall_64+0x33/0x40
[  117.793308]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Do we need lbr when we use pebs?

No, lbr and pebs are two features and we enable them separately.


I tried to apply the lbr patch set (https://lore.kernel.org/kvm/911adb63-ba05-ea93-c038-1c09cff15eda@xxxxxxxxx/) to the kernel and qemu, but there is still another problem.
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event
...

We don't need that patch for PEBS feature.


(2)
# perf record -e instructions:ppp
Error:
instructions:ppp: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'

[  115.188498] Call Trace:
[  115.188503]  dump_stack+0x57/0x6a
[  115.188509]  x86_pmu_hw_config+0x1eb/0x220
[  115.188515]  intel_pmu_hw_config+0x13/0x320
[  115.188519]  hsw_hw_config+0xe/0xa0
[  115.188521]  x86_pmu_event_init+0x8e/0x210
[  115.188524]  perf_try_init_event+0x40/0x130
[  115.188528]  perf_event_alloc.part.22+0x611/0xde0
[  115.188530]  ? alloc_fd+0xba/0x180
[  115.188534]  __do_sys_perf_event_open+0x1bd/0xd90
[  115.188538]  do_syscall_64+0x33/0x40
[  115.188541]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

This is because x86_pmu.intel_cap.pebs_format is always 0 in x86_pmu_max_precise().

We rdmsr MSR_IA32_PERF_CAPABILITIES(0x00000345)  from HOST, it's f4c5.
From guest, it's 2000
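
To make the difference between the two values easier to see, here is a small sketch that extracts the PEBS-related fields from an IA32_PERF_CAPABILITIES value. The bit positions follow the commit message of QEMU patch 1/2 attached to this mail (PEBSTrap bit 6, PEBSArchRegs bit 7, PEBS_FMT bits 11:8, PEBS_BASELINE bit 14). The function name decode_perf_cap is hypothetical and only used for illustration.

```shell
#!/bin/sh
# Decode the PEBS-related fields of IA32_PERF_CAPABILITIES (MSR 0x345).
# decode_perf_cap is an illustrative helper, not an existing tool.
decode_perf_cap() {
    val=$1
    pebs_trap=$(( (val >> 6) & 1 ))        # PEBSTrap, bit 6
    pebs_arch_regs=$(( (val >> 7) & 1 ))   # PEBSArchRegs, bit 7
    pebs_fmt=$(( (val >> 8) & 0xf ))       # PEBS_FMT, bits 11:8
    pebs_baseline=$(( (val >> 14) & 1 ))   # PEBS_BASELINE, bit 14
    printf '%s: trap=%d arch_regs=%d fmt=%d baseline=%d\n' \
        "$2" "$pebs_trap" "$pebs_arch_regs" "$pebs_fmt" "$pebs_baseline"
}

# Values quoted above: 0xf4c5 read on the host, 0x2000 read in the guest.
decode_perf_cap 0xf4c5 host
decode_perf_cap 0x2000 guest
```

With these inputs the host value carries PEBS format 4 with the trap, arch-regs and baseline bits set, while the guest value has none of the PEBS bits set, which is consistent with pebs_format reading as 0 in the guest.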


# perf record -e cycles:pp

Error:

cycles:pp: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'

Could you give some advice?

If you have more specific comments or any concerns, just let me know.


2)Test in Skylake

HOST:

CPU family:                      6

Model:                           85

Model name:                      Intel(R) Xeon(R) Gold 6146 CPU @ 3.20GHz

microcode        : 0x2000064

Guest: linux 4.18

We cannot find the intel_pt flag in the guest cpuinfo because cpu_has_vmx_intel_pt() returns false.

You may check vmx_pebs_supported().
It's true.


SECONDARY_EXEC_PT_USE_GPA/VM_EXIT_CLEAR_IA32_RTIT_CTL/VM_ENTRY_LOAD_IA32_RTIT_CTL are all disabled.

Is it because microcode is not supported?

And is there a new microcode that supports these bits? How can we get it?

Currently, this patch set doesn't support guest PEBS on the Skylake
platforms, and if we choose to support it, we will let you know.

We now want to use pebs on Skylake. If we develop on top of the pebs patch set, do you have any suggestions?

- At the very least you need to pin guest memory, e.g. with "-overcommit mem-lock=true" for qemu.
- You may need to rewrite patches 13 - 17 specifically for Skylake, because the PEBS record format is different from Ice Lake.

I think microcode requirements need to be satisfied.  Can we use https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files ?

You may try it at your own risk, but again,
this patch set currently doesn't support guest PEBS on the Skylake platforms.


---
thx,likexu


Thanks,

Liuxiangdong




From 24a04b800d24e3b493e5094f88649402923147a2 Mon Sep 17 00:00:00 2001
From: Like Xu <like.xu@xxxxxxxxxxxxxxx>
Date: Fri, 4 Sep 2020 10:19:27 +0800
Subject: [PATCH 1/2] target/i386: Expose PEBS capabilities in the
FEAT_PERF_CAPABILITIES

The IA32_PERF_CAPABILITIES MSR provides enumeration of a variety of
PEBS feature interfaces:

- PEBSTrap[6]: Trap/Fault-like indicator of PEBS recording assist;
- PEBSArchRegs[7]: Indicator of PEBS assist save architectural registers;
- PEBS_FMT[bits 11:8]: Specifies the encoding of the layout of PEBS records;
- PEBS_BASELINE [bit 14]: If set, the following is true:
(1) Extended PEBS is supported. All counters support the PEBS facility,
and all events can generate PEBS records when PEBS is enabled.
(2) Adaptive PEBS is supported. The PEBS_DATA_CFG MSR and adaptive record
enable bits are supported.

Signed-off-by: Like Xu <like.xu@xxxxxxxxxxxxxxx>
---
target/i386/cpu.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 72a79e6019..14262c7bf7 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1136,9 +1136,9 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
.type = MSR_FEATURE_WORD,
.feat_names = {
NULL, NULL, NULL, NULL,
- NULL, NULL, NULL, NULL,
- NULL, NULL, NULL, NULL,
- NULL, "full-width-write", NULL, NULL,
+ NULL, NULL, "pebs-trap", "pebs-arch-reg",
+ "pebs-fmt-0", "pebs-fmt-1", "pebs-fmt-2", "pebs-fmt-3",
+ NULL, "full-width-write", "pebs-baseline", NULL,
NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL,
--
2.29.2

From be5246694aaf2132396ee0b907e679f5c9ccd089 Mon Sep 17 00:00:00 2001
From: Like Xu <like.xu@xxxxxxxxxxxxxxx>
Date: Fri, 4 Sep 2020 10:42:28 +0800
Subject: [PATCH 2/2] target/i386: add -cpu,pebs=true support to enable guest
PEBS

The PEBS feature would be enabled on the guest if:
- the KVM is enabled and the PMU is enabled and,
- the msr-based-feature IA32_PERF_CAPABILITIES is supported and,
- the supported returned value for PEBS from this msr is not zero.

The PEBS feature would be disabled on the guest if:
- the msr-based-feature IA32_PERF_CAPABILITIES is unsupported OR,
- qemu set the IA32_PERF_CAPABILITIES msr feature without pebs_fmt values OR,
- the requested guest vcpu model doesn't support PDCM.

Signed-off-by: Like Xu <like.xu@xxxxxxxxxxxxxxx>
---
hw/i386/pc.c | 1 +
target/i386/cpu.c | 20 ++++++++++++++++++++
target/i386/cpu.h | 7 +++++++
target/i386/kvm/kvm.c | 10 ++++++++++
4 files changed, 38 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 5458f61d10..8e9c1b7545 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -330,6 +330,7 @@ GlobalProperty pc_compat_1_5[] = {
{ "Nehalem-" TYPE_X86_CPU, "min-level", "2" },
{ "virtio-net-pci", "any_layout", "off" },
{ TYPE_X86_CPU, "pmu", "on" },
+ { TYPE_X86_CPU, "pebs", "on" },
{ "i440FX-pcihost", "short_root_bus", "0" },
{ "q35-pcihost", "short_root_bus", "0" },
};
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 14262c7bf7..9dffc85542 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -4228,6 +4228,12 @@ static bool lmce_supported(void)
return !!(mce_cap & MCG_LMCE_P);
}

+static inline bool lbr_supported(void)
+{
+ return kvm_enabled() && (kvm_arch_get_supported_msr_feature(kvm_state,
+ MSR_IA32_PERF_CAPABILITIES) & PERF_CAP_PEBS_FORMAT);
+}
+
#define CPUID_MODEL_ID_SZ 48

/**
@@ -4332,6 +4338,9 @@ static void max_x86_cpu_initfn(Object *obj)
}

object_property_set_bool(OBJECT(cpu), "pmu", true, &error_abort);
+ if (lbr_supported()) {
+ object_property_set_bool(OBJECT(cpu), "pebs", true, &error_abort);
+ }
}

static const TypeInfo max_x86_cpu_type_info = {
@@ -5545,6 +5554,10 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
}
if (!cpu->enable_pmu) {
*ecx &= ~CPUID_EXT_PDCM;
+ if (cpu->enable_pebs) {
+ warn_report("PEBS is unsupported since guest PMU is disabled.");
+ exit(1);
+ }
}
break;
case 2:
@@ -6610,6 +6623,12 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
}
}

+ if (!cpu->max_features && cpu->enable_pebs &&
+ !(env->features[FEAT_1_ECX] & CPUID_EXT_PDCM)) {
+ warn_report("requested vcpu model doesn't support PDCM for PEBS.");
+ exit(1);
+ }
+
if (cpu->ucode_rev == 0) {
/* The default is the same as KVM's. */
if (IS_AMD_CPU(env)) {
@@ -7192,6 +7211,7 @@ static Property x86_cpu_properties[] = {
#endif
DEFINE_PROP_INT32("node-id", X86CPU, node_id, CPU_UNSET_NUMA_NODE_ID),
DEFINE_PROP_BOOL("pmu", X86CPU, enable_pmu, false),
+ DEFINE_PROP_BOOL("pebs", X86CPU, enable_pebs, false),

DEFINE_PROP_UINT32("hv-spinlocks", X86CPU, hyperv_spinlock_attempts,
HYPERV_SPINLOCK_NEVER_NOTIFY),
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index d23a5b340a..eac8d8c68e 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -354,6 +354,12 @@ typedef enum X86Seg {
#define ARCH_CAP_TSX_CTRL_MSR (1<<7)

#define MSR_IA32_PERF_CAPABILITIES 0x345
+#define PERF_CAP_PEBS_TRAP BIT_ULL(6)
+#define PERF_CAP_ARCH_REG BIT_ULL(7)
+#define PERF_CAP_PEBS_FORMAT 0xf00
+#define PERF_CAP_PEBS_BASELINE BIT_ULL(14)
+#define PERF_CAP_PEBS_MASK (PERF_CAP_PEBS_TRAP | PERF_CAP_ARCH_REG | \
+ PERF_CAP_PEBS_FORMAT | PERF_CAP_PEBS_BASELINE)

#define MSR_IA32_TSX_CTRL 0x122
#define MSR_IA32_TSCDEADLINE 0x6e0
@@ -1708,6 +1714,7 @@ struct X86CPU {
* capabilities) directly to the guest.
*/
bool enable_pmu;
+ bool enable_pebs;

/* LMCE support can be enabled/disabled via cpu option 'lmce=on/off'. It is
* disabled by default to avoid breaking migration between QEMU with
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 6dc1ee052d..8fe1d2feea 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2705,6 +2705,13 @@ static void kvm_msr_entry_add_perf(X86CPU *cpu, FeatureWordArray f)
MSR_IA32_PERF_CAPABILITIES);

if (kvm_perf_cap) {
+ if (!cpu->enable_pebs) {
+ kvm_perf_cap &= ~PERF_CAP_PEBS_MASK;
+ }
+ if (!(kvm_perf_cap & PERF_CAP_PEBS_MASK) && cpu->enable_pebs) {
+ warn_report("MSR_IA32_PERF_CAPABILITIES reported by KVM does not support PEBS.");
+ exit(1);
+ }
kvm_msr_entry_add(cpu, MSR_IA32_PERF_CAPABILITIES,
kvm_perf_cap & f[FEAT_PERF_CAPABILITIES]);
}
@@ -2744,6 +2751,9 @@ static void kvm_init_msrs(X86CPU *cpu)

if (has_msr_perf_capabs && cpu->enable_pmu) {
kvm_msr_entry_add_perf(cpu, env->features);
+ } else if (!has_msr_perf_capabs && cpu->enable_pebs) {
+ warn_report("KVM doesn't support MSR_IA32_PERF_CAPABILITIES for PEBS.");
+ exit(1);
}

if (has_msr_ucode_rev) {
--
2.29.2