Re: [perf] unchecked MSR access error: WRMSR to 0x689 in intel_pmu_lbr_restore

From: Liang, Kan
Date: Tue Jul 12 2022 - 19:08:45 EST




On 2022-07-12 5:26 p.m., Vince Weaver wrote:
> On Tue, 12 Jul 2022, Pawan Gupta wrote:
>
>> On Tue, Jul 12, 2022 at 03:39:56PM -0400, Vince Weaver wrote:
>> It appears this CPU does not support TSX feature (or disabling TSX). If
>> the bug is easy to reproduce, bisecting can help.
>
> I thought TSX was disabled via firmware update for all Haswell machines?
>
> In any case, the fuzzer is triggering the
> unchecked MSR access error: WRMSR to 0x689
> in intel_pmu_lbr_restore. So either this is a false error and should be
> disabled, or else it's a real issue and should be fixed.
>

Could you please double check if the quirk can fix the issue on your
machine?

# Try writing the exact same value from the error log to 0x689. The write
# should fail.
wrmsr -p 0 0x689 0x1fffffff8101349e

# The quirk copies bits 59:60 to bits 61:62. The write below should succeed.
wrmsr -p 0 0x689 0x7fffffff8101349e

> Unfortunately the fuzzer can take up to a few days to trigger the message
> (it's not easily repeatable) so doing a kernel bisect would take a very
> long time.
>

lbr_from_signext_quirk_needed() is only invoked at boot time. Maybe
we can dump some logs to see which variable has an unexpected value.

Could you please apply the patch below, reboot into the patched kernel, and
share the dmesg log?

diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 13179f31fe10..50435ab627ad 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -300,6 +300,9 @@ static inline bool lbr_from_signext_quirk_needed(void)
 	bool tsx_support = boot_cpu_has(X86_FEATURE_HLE) ||
 			   boot_cpu_has(X86_FEATURE_RTM);

+	pr_info("%s %s. LBR has tsx %d\n", boot_cpu_has(X86_FEATURE_HLE) ? "HLE" : "NO HLE",
+		boot_cpu_has(X86_FEATURE_RTM) ? "RTM" : "NO RTM",
+		x86_pmu.lbr_has_tsx);
 	return !tsx_support && x86_pmu.lbr_has_tsx;
 }


Thanks,
Kan