Re: [PATCH] x86/mce/therm_throt: Handle case where throttle_active_work() is called on behalf of an offline CPU

From: Chris Wilson
Date: Sat Feb 22 2020 - 11:53:52 EST


Quoting Srinivas Pandruvada (2020-02-22 16:24:32)
> During a CPU hotplug test with CONFIG_PREEMPTION and CONFIG_DEBUG_PREEMPT
> enabled, Chris reported the following error:
>
> BUG: using smp_processor_id() in preemptible [00000000] code: kworker/1:0/17
> caller is throttle_active_work+0x12/0x280
>
> Here throttle_active_work() is a workqueue callback scheduled with
> schedule_delayed_work_on(). Under normal conditions its use of
> smp_processor_id() does not trigger this error, as there is a check for
> "current->nr_cpus_allowed == 1".
> But when the target CPU is offline, the workqueue becomes unbound.
> The callback can then be scheduled on another CPU, and the error is
> printed for the use of smp_processor_id() in preemptible context.
>
> When the workqueue callback is not running on the target CPU, simply
> return. This is done by adding a cpu field to the _thermal_state struct
> and matching it against the current CPU id.
>
> Once the workqueue callback is scheduled, prevent the CPU from going
> offline. In this way, the log bits are checked and cleared on the correct
> CPU. Also use get_cpu() to get the current CPU id and prevent preemption
> until we finish processing.
>
> Fixes: f6656208f04e ("x86/mce/therm_throt: Optimize notifications of thermal throttle")
> Reported-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@xxxxxxxxxxxxxxx>
> Reviewed-by: Tony Luck <tony.luck@xxxxxxxxx>
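
For anyone following along, here is a minimal userspace sketch of the
CPU-match check described above. It is not the patch itself: every name
below is a hypothetical stand-in, model_get_cpu()/model_put_cpu() only
imitate the kernel's get_cpu()/put_cpu(), and the hotplug locking the
commit message mentions is left out.

```c
#include <stdbool.h>

/* Hypothetical model of the per-CPU state; only the "cpu" field idea
 * comes from the commit message, the rest is illustrative. */
struct thermal_state_model {
	int cpu;		/* CPU the delayed work was queued for */
	bool log_pending;	/* stands in for the thermal log bits */
	int handled;		/* times the callback did real work */
};

/* Stand-in for the CPU the callback happens to be running on. */
static int model_current_cpu;

/* Userspace imitations: the real get_cpu() also disables preemption
 * and put_cpu() re-enables it; here they only read a variable. */
static int model_get_cpu(void) { return model_current_cpu; }
static void model_put_cpu(void) { }

/* Returns true if the callback ran on its target CPU and processed the
 * state, false if it bailed out because it ran somewhere else. */
static bool throttle_work_model(struct thermal_state_model *state)
{
	int this_cpu = model_get_cpu();
	bool ran = false;

	if (this_cpu == state->cpu) {
		/* On the right CPU: check and clear the log bits. */
		state->log_pending = false;
		state->handled++;
		ran = true;
	}
	/* Otherwise the target CPU went offline and the work was moved:
	 * touch nothing and simply return. */

	model_put_cpu();
	return ran;
}
```

With model_current_cpu set to a CPU other than state->cpu, the callback
returns false and leaves log_pending untouched, which is the "simply
return" case from the commit message.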

I've pushed the patch to our CI, but the bug is not a frequent occurrence,
so it may be some time before I can give a Tested-by with any confidence.
-Chris