From: Wen Gong <quic_wgong@xxxxxxxxxxx>
Reviewed-by: Aditya Kumar Singh <aditya.kumar.singh@xxxxxxxxxxxxxxxx>
Running the following test in a loop easily reproduces an rtnl deadlock:
iw reg set FI
ifconfig wlan0 down
What happens is that thread A (workqueue) tries to update the regulatory
domain and, while running ar->regd_update_work, tries to acquire rtnl_lock:
rtnl_lock
ath12k_regd_update [ath12k]
ath12k_regd_update_work [ath12k]
process_one_work
worker_thread
kthread
ret_from_fork
Meanwhile, thread B (ifconfig) tries to stop the interface and calls
cancel_work_sync(&ar->regd_update_work) in ath12k_mac_op_stop():
ifconfig 3109 [003] 2414.232506: probe:
ath12k_mac_op_stop [ath12k]
drv_stop [mac80211]
ieee80211_do_stop [mac80211]
ieee80211_stop [mac80211]
The deadlock sequence is:
1. Thread B calls rtnl_lock().
2. Thread A starts to run and calls rtnl_lock() from within
ath12k_regd_update_work(), then enters a wait state because the lock is
owned by thread B.
3. Thread B calls cancel_work_sync(&ar->regd_update_work), but thread A
is in ath12k_regd_update_work() waiting for rtnl_lock(). So
cancel_work_sync() waits forever for ath12k_regd_update_work() to
finish and we have a deadlock.
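The three steps above can be replayed in a small user-space model. This is
only a sketch for illustration: the struct, the enum and every function
name below are hypothetical and do not exist in the kernel or in ath12k.
The model just records who holds the lock and who is waiting for whom.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical actors and state; none of these names exist in the kernel. */
enum actor { NOBODY, THREAD_A, THREAD_B };

struct model {
	enum actor rtnl_owner;  /* who holds the modelled rtnl lock */
	bool work_running;      /* thread A is inside the work function */
	bool a_waits_for_rtnl;  /* thread A is blocked on the rtnl lock */
	bool b_waits_for_work;  /* thread B is blocked waiting for the work */
};

/* Step 1: thread B (ifconfig) takes the rtnl lock. */
static void b_takes_rtnl(struct model *m)
{
	m->rtnl_owner = THREAD_B;
}

/* Step 2: thread A starts the work item, which may need the rtnl lock. */
static void a_runs_work(struct model *m, bool work_needs_rtnl)
{
	m->work_running = true;
	if (work_needs_rtnl && m->rtnl_owner != NOBODY)
		m->a_waits_for_rtnl = true;  /* blocked, work never finishes */
	else
		m->work_running = false;     /* work runs to completion */
}

/* Step 3: thread B waits for the work item while still holding rtnl. */
static void b_cancels_work_sync(struct model *m)
{
	assert(m->rtnl_owner == THREAD_B);
	m->b_waits_for_work = m->work_running;
}

/* Replay steps 1-3 and report whether both threads wait on each other. */
static bool replay_deadlocks(bool work_needs_rtnl)
{
	struct model m = { NOBODY, false, false, false };

	b_takes_rtnl(&m);
	a_runs_work(&m, work_needs_rtnl);
	b_cancels_work_sync(&m);
	return m.a_waits_for_rtnl && m.b_waits_for_work;
}
```

In this model replay_deadlocks(true) reports the circular wait, while
replay_deadlocks(false), where the work item does not need the lock,
does not.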
Change to use regulatory_set_wiphy_regd(), which is the asynchronous
version of regulatory_set_wiphy_regd_sync(). This way the rtnl and wiphy
locks are no longer required and can be removed, which avoids the
deadlock.
However, the asynchronous regd update has a side effect: some essential
information used in ath12k_reg_update_chan_list(), which would be called
later in ath12k_regd_update(), might not have been updated by cfg80211
yet. As a result, wrong channel parameters are sent to firmware.
To handle this side effect, move ath12k_reg_update_chan_list() to
ath12k_reg_notifier(), and advertise WIPHY_FLAG_NOTIFY_REGDOM_BY_DRIVER
to cfg80211. This works because, during the asynchronous regd update,
cfg80211 notifies ath12k by calling ath12k_reg_notifier() after the new
regd is processed. Since all essential information is updated at that
time, it is safe to update the channel list.
Note that ath12k_reg_notifier() can also be called for other reasons,
such as core, beacon or user hints. In those cases
ath12k_reg_update_chan_list() must not be called because regd has not
been updated, so the notifier verifies the initiator before updating the
channel list.
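The initiator check can be sketched in user space as follows. This is a
simplified model, not the driver code: the enum is a local stand-in for
the kernel's enum nl80211_reg_initiator, the function names are
hypothetical, and it assumes that the driver-initiated notification is
the one that follows the asynchronous regd update.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for the kernel's enum nl80211_reg_initiator. */
enum hint_initiator {
	HINT_BY_CORE,
	HINT_BY_USER,
	HINT_BY_DRIVER,
	HINT_BY_COUNTRY_IE,
};

/* Counts calls to the stand-in for ath12k_reg_update_chan_list(). */
static int chan_list_updates;

static void update_chan_list(void)
{
	chan_list_updates++;
}

/*
 * Stand-in for the regulatory notifier. Assumption: only a
 * driver-initiated notification means the new regd has been processed,
 * so every other initiator returns without touching the channel list.
 */
static void reg_notifier(enum hint_initiator who)
{
	if (who != HINT_BY_DRIVER)
		return;

	update_chan_list();
}

/* Usage: hints from other initiators are ignored, the driver one is not. */
static void demo(void)
{
	chan_list_updates = 0;

	reg_notifier(HINT_BY_CORE);
	reg_notifier(HINT_BY_USER);
	reg_notifier(HINT_BY_COUNTRY_IE);
	assert(chan_list_updates == 0);

	reg_notifier(HINT_BY_DRIVER);
	assert(chan_list_updates == 1);
}
```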
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3
Signed-off-by: Wen Gong <quic_wgong@xxxxxxxxxxx>
Co-developed-by: Baochen Qiang <quic_bqiang@xxxxxxxxxxx>
Signed-off-by: Baochen Qiang <quic_bqiang@xxxxxxxxxxx>
Acked-by: Jeff Johnson <quic_jjohnson@xxxxxxxxxxx>
---