synchronize_rcu_expedited gets stuck in hotplug path

From: Mukesh Ojha
Date: Tue Jan 18 2022 - 06:46:51 EST


Hi,

We are facing an issue in a hotplug test where cpuhp/2 gets stuck in
synchronize_rcu_expedited() at state CPUHP_AP_ONLINE_DYN (call trace in [1]) and
cannot proceed. wait_rcu_exp_gp() has been queued to CPU 2, but it looks like it
never got a chance to run: it is still shown as pending on CPU 2 [2].

So, at what point in the hotplug path does CPU 2 become available for scheduling?
Is it only after CPUHP_AP_ACTIVE?

This looks like a deadlock. Can it be fixed by queuing wait_rcu_exp_gp() on another
workqueue, or is calling synchronize_rcu() from the hotplug path simply a wrong usage?
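To make the suspected circular wait concrete, here is a minimal user-space toy model in C. It is not kernel code: toy_cpu, toy_queue_work_on(), toy_worker_tick(), toy_sync_expedited() and toy_bringup() are hypothetical names. The rule that CPU 2's workers only run once the CPU is "schedulable" (i.e. after CPUHP_AP_ACTIVE) is exactly the assumption asked about above, not established kernel behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; all names here are hypothetical, not kernel APIs. */
#define MAX_PENDING 8

typedef void (*work_fn)(void);

struct toy_cpu {
    bool schedulable;          /* ASSUMPTION: kworkers run only when true */
    int npending;
    work_fn pending[MAX_PENDING];
};

static bool exp_gp_done;

/* Stand-in for the expedited-GP work item (wait_rcu_exp_gp in the report). */
static void toy_wait_rcu_exp_gp(void)
{
    exp_gp_done = true;
}

/* Queue a work item on the per-CPU pool of the given toy CPU. */
static void toy_queue_work_on(struct toy_cpu *cpu, work_fn fn)
{
    if (cpu->npending < MAX_PENDING)
        cpu->pending[cpu->npending++] = fn;
}

/* One scheduling opportunity for the CPU's kworkers. */
static void toy_worker_tick(struct toy_cpu *cpu)
{
    if (!cpu->schedulable)
        return;                /* items stay "Pending", workers stay IDLE */
    for (int i = 0; i < cpu->npending; i++)
        cpu->pending[i]();
    cpu->npending = 0;
}

/* Models the hotplug callback calling synchronize_rcu_expedited(): queue the
 * GP work on the target CPU and wait for it. The wait is bounded by max_ticks
 * so the toy reports a hang instead of looping forever.
 * Returns true if the grace period completed. */
static bool toy_sync_expedited(struct toy_cpu *cpu, int max_ticks)
{
    exp_gp_done = false;
    toy_queue_work_on(cpu, toy_wait_rcu_exp_gp);
    for (int t = 0; t < max_ticks && !exp_gp_done; t++)
        toy_worker_tick(cpu);
    return exp_gp_done;
}

/* Toy CPU bring-up. With schedulable_early == false, the CPU only becomes
 * schedulable after the CPUHP_AP_ONLINE_DYN-style callback returns, so the
 * callback's wait can never be satisfied. */
static bool toy_bringup(bool schedulable_early)
{
    struct toy_cpu cpu2 = { .schedulable = schedulable_early };
    bool ok = toy_sync_expedited(&cpu2, 100);  /* "CPUHP_AP_ONLINE_DYN" step */
    cpu2.schedulable = true;                   /* "CPUHP_AP_ACTIVE" reached  */
    return ok;
}
```

With toy_bringup(true) the queued item runs on the first tick and the wait returns true. With toy_bringup(false) it stays pending for all 100 ticks and the wait returns false, which matches the pattern in [2] (idle kworkers, wait_rcu_exp_gp pending). If the assumption holds, queuing the work on a workqueue or CPU whose workers can already run would break the cycle in this model.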

[1]

=======================================================
Process: cpuhp/2, [affinity: 0x4] cpu: 2 pid: 24 start: 0xffffff87803e4a00
=====================================================
    Task name: cpuhp/2 [affinity: 0x4] pid: 24 cpu: 2 prio: 120 start: ffffff87803e4a00
    state: 0x2[D] exit_state: 0x0 stack base: 0xffffffc010160000
    Last_enqueued_ts:      59.022215498 Last_sleep_ts: 59.022922946
    Stack:
    [<ffffffe9f4074354>] __switch_to+0x248
    [<ffffffe9f5c02474>] __schedule+0x5b0
    [<ffffffe9f5c02b28>] schedule+0x80
    [<ffffffe9f42321a4>] synchronize_rcu_expedited+0x1c4
    [<ffffffe9f423b294>] synchronize_rcu+0x4c
    [<ffffffe9f6d04ab0>] waltgov_stop[sched_walt]+0x78
    [<ffffffe9f512fa28>] cpufreq_add_policy_cpu+0xc0
    [<ffffffe9f512e48c>] cpufreq_online[jt]+0x10f4
    [<ffffffe9f51323b8>] cpuhp_cpufreq_online+0x14
    [<ffffffe9f4128d3c>] cpuhp_invoke_callback+0x2f8
    [<ffffffe9f412c30c>] cpuhp_thread_fun+0x130
    [<ffffffe9f4187a58>] smpboot_thread_fn+0x180
    [<ffffffe9f417d98c>] kthread+0x150
    [<ffffffe9f4013918>] ret_to_user[jt]+0x0


[2]

CPU 2
pool 0
IDLE Workqueue worker: kworker/2:3 current_work: (None)
IDLE Workqueue worker: kworker/2:2 current_work: (None)
IDLE Workqueue worker: kworker/2:1 current_work: (None)
IDLE Workqueue worker: kworker/2:0 current_work: (None)
Pending entry: wait_rcu_exp_gp[jt]
Pending entry: lru_add_drain_per_cpu[jt]
Pending entry: wq_barrier_func[jt]

Thanks,
Mukesh