Re: [RESEND PATCH] perf/x86/intel: Fix unchecked MSR access error for Alder Lake N

From: Andi Kleen
Date: Mon Aug 22 2022 - 10:31:51 EST



On 8/22/2022 3:48 PM, Peter Zijlstra wrote:
> On Mon, Aug 22, 2022 at 09:28:31AM -0400, Liang, Kan wrote:
>>
>> On 2022-08-19 10:38 a.m., Peter Zijlstra wrote:
>>> On Thu, Aug 18, 2022 at 11:15:30AM -0700, kan.liang@xxxxxxxxxxxxxxx wrote:
>>>> From: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
>>>>
>>>> For some Alder Lake N machine, the below unchecked MSR access error may be
>>>> triggered.
>>>>
>>>> [ 0.088017] rcu: Hierarchical SRCU implementation.
>>>> [ 0.088017] unchecked MSR access error: WRMSR to 0x38f (tried to write
>>>> 0x0001000f0000003f) at rIP: 0xffffffffb5684de8 (native_write_msr+0x8/0x30)
>>>> [ 0.088017] Call Trace:
>>>> [ 0.088017] <TASK>
>>>> [ 0.088017] __intel_pmu_enable_all.constprop.46+0x4a/0xa0
>>> FWIW, I seem to get the same error when booting KVM on my ADL. I'm
>>> fairly sure the whole CPUID vs vCPU thing is a trainwreck.
>> We will enhance the CPUID to address the issues. Hopefully, we can have
>> them supported in the next generation.
> How!? A vCPU can readily migrate between a big and small CPU. There is
> no way the guest can sanely program the (v)MSRs and expect it to work.

In principle this can be fixed by affinitizing the vCPUs to their respective core type and reporting the right type, and I thought QEMU was supposed to support this. But it would certainly require a much more complex command line.
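
To make the affinitizing part concrete, here is a rough host-side sketch. It assumes a hybrid host that exposes the per-type CPU lists as /sys/devices/cpu_core/cpus and /sys/devices/cpu_atom/cpus, and that the thread ID of each vCPU thread is already known; the helper names are made up for illustration, and the sketch does not cover reporting the right type to the guest.

```python
import os


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as '0-3,8' into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. the PMU has no online CPUs
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_type_cpus(pmu):
    """Return the host CPUs of one core type.

    Assumption: pmu is 'cpu_core' or 'cpu_atom', and the host kernel
    exposes /sys/devices/<pmu>/cpus (true only on hybrid-aware hosts).
    """
    with open(f"/sys/devices/{pmu}/cpus") as f:
        return parse_cpu_list(f.read())


def pin_vcpu_thread(tid, pmu):
    """Hypothetical helper: restrict one vCPU thread to a single core type."""
    os.sched_setaffinity(tid, read_type_cpus(pmu))
```

With something like this, each vCPU thread stays on one core type, so the guest never migrates between a big and a small CPU underneath a programmed counter.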

If you don't do this, architectural events should work, but yes, any non-architectural events will not count correctly.

I guess one way to detect this situation would be to check whether the CPUID model is Alder Lake but no hybrid support is reported in CPUID. If so, it's likely a situation like this, and it could be special-cased in the perf tools to show only a limited event list.
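
A minimal sketch of that check, assuming the caller has already obtained the family, model, and the EDX value of CPUID leaf 7 subleaf 0 (where bit 15 is the hybrid flag). The function name and the choice of models are illustrative only, not what the perf tools do today. Alder Lake N (model 0xBE) is left out on purpose, on the assumption that it legitimately reports no hybrid support.

```python
# Alder Lake models that are hybrid on bare metal (family 6).
# Assumption: 0x97 (Alder Lake) and 0x9A (Alder Lake L) only.
ALDER_LAKE_HYBRID_MODELS = {0x97, 0x9A}

# CPUID.(EAX=7,ECX=0):EDX bit 15 reports a hybrid part.
HYBRID_BIT = 1 << 15


def hybrid_info_missing(family, model, cpuid7_edx):
    """Return True if this looks like Alder Lake without the hybrid flag.

    That combination suggests a guest whose vCPUs are not tied to a
    core type, so only a limited (architectural) event list is safe.
    """
    return (
        family == 6
        and model in ALDER_LAKE_HYBRID_MODELS
        and not (cpuid7_edx & HYBRID_BIT)
    )
```

A tool could call this once at startup and fall back to the architectural events when it returns True.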

-Andi