"Force HWP min perf before offline" triggers unchecked MSR access errors

From: Qian Cai
Date: Tue Oct 29 2019 - 16:55:11 EST


Commit af3b7379e2d7 ("cpufreq: intel_pstate: Force HWP min perf before
offline") triggers the error below during CPU hotplug. Reverting it (on top
of linux-next) fixes it.

https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config

# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              144
On-line CPU(s) list: 0-143
Thread(s) per core:  2
Core(s) per socket:  18
Socket(s):           4
NUMA node(s):        4
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
Stepping:            4
CPU MHz:             1200.001
CPU max MHz:         3700.0000
CPU min MHz:         1200.0000
BogoMIPS:            6000.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            25344K
NUMA node0 CPU(s):   0-17,72-89
NUMA node1 CPU(s):   18-35,90-107
NUMA node2 CPU(s):   36-53,108-125
NUMA node3 CPU(s):   54-71,126-143
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2
ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt
tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch
cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb
stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle
avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap
clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1
xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
hwp hwp_act_window hwp_pkg_req pku ospke md_clear flush_l1d

[17670.190223][T69701] LTP: starting cpuhotplug02 (cpuhotplug02.sh -c 1 -l 1)
[17676.195430][   T15] unchecked MSR access error: WRMSR to 0x1b0 (tried to
write 0x00000000000000c0) at rIP: 0xffffffff82b0a97c (__wrmsr_on_cpu+0xbc/0x130)
[17676.209251][   T15] Call Trace:
[17676.212410][   T15]  ? rdmsrl_on_cpu+0xf0/0xf0
[17676.216882][   T15]  generic_exec_single+0x13e/0x1d0
[17676.221876][   T15]  ? rdmsrl_on_cpu+0xf0/0xf0
[17676.226344][   T15]  smp_call_function_single+0x1aa/0x200
[17676.231774][   T15]  ? generic_exec_single+0x1d0/0x1d0
[17676.236942][   T15]  ? rdmsrl_on_cpu+0xb1/0xf0
[17676.241410][   T15]  wrmsrl_on_cpu+0xa6/0xe0
[17676.245705][   T15]  ? wrmsr_on_cpu+0xf0/0xf0
[17676.250091][   T15]  ? kasan_slab_free+0xe/0x10
[17676.254650][   T15]  ? intel_pstate_get_epp+0x168/0x190
[17676.259905][   T15]  ? store_energy_performance_preference+0x370/0x370
[17676.266469][   T15]  intel_pstate_set_epb+0xc8/0x110
[17676.271463][   T15]  ? show_status+0x80/0x80
[17676.275760][   T15]  ? down_write_killable+0x160/0x160
[17676.280927][   T15]  intel_pstate_stop_cpu+0x126/0x150
[17676.286094][   T15]  cpufreq_offline+0x17c/0x3a0
[17676.290737][   T15]  ? cpufreq_offline+0x3a0/0x3a0
[17676.295556][   T15]  cpuhp_cpufreq_offline+0xe/0x20
[17676.300464][   T15]  cpuhp_invoke_callback+0x197/0x1120
[17676.305724][   T15]  ? lock_acquire+0x126/0x280
[17676.310280][   T15]  ? cpuhp_thread_fun+0x69/0x2f0
[17676.315098][   T15]  cpuhp_thread_fun+0x252/0x2f0
[17676.319830][   T15]  ? __cpuhp_state_remove_instance+0x350/0x350
[17676.325876][   T15]  smpboot_thread_fn+0x255/0x440
[17676.330695][   T15]  ? smpboot_register_percpu_thread+0x110/0x110
[17676.336824][   T15]  ? __kasan_check_read+0x11/0x20
[17676.341731][   T15]  ? __kthread_parkme+0xc6/0xe0
[17676.346463][   T15]  ? smpboot_register_percpu_thread+0x110/0x110
[17676.352590][   T15]  kthread+0x1e6/0x210
[17676.356534][   T15]  ? kthread_create_worker_on_cpu+0xc0/0xc0
[17676.362314][   T15]  ret_from_fork+0x3a/0x50
[17676.895221][   T16] IRQ 273: no longer affine to CPU1
[17676.901373][   T16] process 69725 (cpuhotplug_do_s) no longer affine to cpu1