[tip: ras/urgent] x86/mce/therm_throt: Do not access uninitialized therm_work

From: tip-bot2 for Chuansheng Liu
Date: Wed Jan 15 2020 - 06:37:19 EST


The following commit has been merged into the ras/urgent branch of tip:

Commit-ID: 978370956d2046b19313659ce65ed12d5b996626
Gitweb: https://git.kernel.org/tip/978370956d2046b19313659ce65ed12d5b996626
Author: Chuansheng Liu <chuansheng.liu@xxxxxxxxx>
AuthorDate: Tue, 07 Jan 2020 00:41:16
Committer: Borislav Petkov <bp@xxxxxxx>
CommitterDate: Wed, 15 Jan 2020 11:31:33 +01:00

x86/mce/therm_throt: Do not access uninitialized therm_work

It is relatively easy to trigger the following boot splat on an Ice Lake
client platform. The call stack is like:

kernel BUG at kernel/timer/timer.c:1152!

Call Trace:
__queue_delayed_work
queue_delayed_work_on
therm_throt_process
intel_thermal_interrupt
...

The reason is that a CPU's thermal interrupt is enabled prior to
executing its hotplug onlining callback which will initialize the
throttling workqueues.

Such a race can lead to therm_throt_process() accessing an uninitialized
therm_work, leading to the above BUG at a very early bootup stage.

Therefore, unmask the thermal interrupt vector only after having setup
the workqueues completely.

[ bp: Heavily massage commit message and correct comment formatting. ]

Fixes: f6656208f04e ("x86/mce/therm_throt: Optimize notifications of thermal throttle")
Signed-off-by: Chuansheng Liu <chuansheng.liu@xxxxxxxxx>
Signed-off-by: Borislav Petkov <bp@xxxxxxx>
Acked-by: Tony Luck <tony.luck@xxxxxxxxx>
Link: https://lkml.kernel.org/r/20200107004116.59353-1-chuansheng.liu@xxxxxxxxx
---
arch/x86/kernel/cpu/mce/therm_throt.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c b/arch/x86/kernel/cpu/mce/therm_throt.c
index b38010b..6c3e1c9 100644
--- a/arch/x86/kernel/cpu/mce/therm_throt.c
+++ b/arch/x86/kernel/cpu/mce/therm_throt.c
@@ -467,6 +467,7 @@ static int thermal_throttle_online(unsigned int cpu)
{
struct thermal_state *state = &per_cpu(thermal_state, cpu);
struct device *dev = get_cpu_device(cpu);
+ u32 l;

state->package_throttle.level = PACKAGE_LEVEL;
state->core_throttle.level = CORE_LEVEL;
@@ -474,6 +475,10 @@ static int thermal_throttle_online(unsigned int cpu)
INIT_DELAYED_WORK(&state->package_throttle.therm_work, throttle_active_work);
INIT_DELAYED_WORK(&state->core_throttle.therm_work, throttle_active_work);

+ /* Unmask the thermal vector after the above workqueues are initialized. */
+ l = apic_read(APIC_LVTTHMR);
+ apic_write(APIC_LVTTHMR, l & ~APIC_LVT_MASKED);
+
return thermal_throttle_add_dev(dev, cpu);
}

@@ -722,10 +727,6 @@ void intel_init_thermal(struct cpuinfo_x86 *c)
rdmsr(MSR_IA32_MISC_ENABLE, l, h);
wrmsr(MSR_IA32_MISC_ENABLE, l | MSR_IA32_MISC_ENABLE_TM1, h);

- /* Unmask the thermal vector: */
- l = apic_read(APIC_LVTTHMR);
- apic_write(APIC_LVTTHMR, l & ~APIC_LVT_MASKED);
-
pr_info_once("CPU0: Thermal monitoring enabled (%s)\n",
tm2 ? "TM2" : "TM1");