[PATCH v2 1/1] kernel/cpu: to track which CPUHP callback is failed

From: Dongli Zhang
Date: Fri Apr 09 2021 - 01:54:11 EST


During bootup or cpu hotplug, the cpuhp_up_callbacks() or
cpuhp_down_callbacks() call many CPUHP callbacks (e.g., perf, mm,
workqueue, RCU, kvmclock and more) for each cpu to online/offline. It may
roll back to its previous state if any of callbacks is failed. As a result,
the user will not be able to know which callback is failed and usually the
only symptom is cpu online/offline failure.

This patch is to print more debug log to help user narrow down where is the
root cause.

Below is the example that how the patch helps narrow down the root cause
for the issue fixed by commit d7eb79c6290c ("KVM: kvmclock: Fix vCPUs > 64
can't be online/hotpluged").

We will have below dynamic debug log once we add
dyndbg="file kernel/cpu.c +p" to kernel command line and when issue is
reproduced.

"CPUHP up callback failure (-12) for cpu 64 at kvmclock:setup_percpu (66)"

Cc: Joe Jin <joe.jin@xxxxxxxxxx>
Signed-off-by: Dongli Zhang <dongli.zhang@xxxxxxxxxx>
---
Changed since v1 RFC:
- use pr_debug() but not pr_err_once() (suggested by Qais Yousef)
- print log for cpuhp_down_callbacks() as well (suggested by Qais Yousef)

kernel/cpu.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 1b6302ecbabe..bcd4dd7de9c3 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -621,6 +621,10 @@ static int cpuhp_up_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
st->state++;
ret = cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL);
if (ret) {
+ pr_debug("CPUHP up callback failure (%d) for cpu %u at %s (%d)\n",
+ ret, cpu, cpuhp_get_step(st->state)->name,
+ st->state);
+
if (can_rollback_cpu(st)) {
st->target = prev_state;
undo_cpu_up(cpu, st);
@@ -990,6 +994,10 @@ static int cpuhp_down_callbacks(unsigned int cpu, struct cpuhp_cpu_state *st,
for (; st->state > target; st->state--) {
ret = cpuhp_invoke_callback(cpu, st->state, false, NULL, NULL);
if (ret) {
+ pr_debug("CPUHP down callback failure (%d) for cpu %u at %s (%d)\n",
+ ret, cpu, cpuhp_get_step(st->state)->name,
+ st->state);
+
st->target = prev_state;
if (st->state < prev_state)
undo_cpu_down(cpu, st);
--
2.17.1