Re: [PATCH] ath10k: Don't touch the CE interrupt registers after power up

From: Amit Pundir
Date: Wed Jan 03 2024 - 11:32:29 EST


On Sat, 1 Jul 2023 at 03:49, Douglas Anderson <dianders@xxxxxxxxxxxx> wrote:
>
> As talked about in commit d66d24ac300c ("ath10k: Keep track of which
> interrupts fired, don't poll them"),

Hi Douglas, does this fix has a dependency on the above upstream
commit d66d24ac300c, that you refer to?

Asking because this patch landed on stable v5.4.y branch recently and
now I see RCU stalls and lockups around "ath10k_snoc 18800000.wifi:
failed to receive control response completion, polling.." message
during ath10k_snoc initialization/bringup on DB845c. Here is the
relevant log https://www.irccloud.com/pastebin/raw/NjKm3mLc, with
DB845c rebooting into USB crash dump mode eventually.

I wonder if commit d66d24ac300c need to be backported to v5.4.y as
well? I tried cherry-picking it but ran into non-trivial conflicts, so
didn't spend much time on it.

Regards,
Amit Pundir

> if we access the copy engine
> register at a bad time then ath10k can go boom. However, it's not
> necessarily easy to know when it's safe to access them.
>
> The ChromeOS test labs saw a crash that looked like this at
> shutdown/reboot time (on a chromeos-5.15 kernel, but likely the
> problem could also reproduce upstream):
>
> Internal error: synchronous external abort: 96000010 [#1] PREEMPT SMP
> ...
> CPU: 4 PID: 6168 Comm: reboot Not tainted 5.15.111-lockdep-19350-g1d624fe6758f #1 010b9b233ab055c27c6dc88efb0be2f4e9e86f51
> Hardware name: Google Kingoftown (DT)
> ...
> pc : ath10k_snoc_read32+0x50/0x74 [ath10k_snoc]
> lr : ath10k_snoc_read32+0x24/0x74 [ath10k_snoc]
> ...
> Call trace:
> ath10k_snoc_read32+0x50/0x74 [ath10k_snoc ...]
> ath10k_ce_disable_interrupt+0x190/0x65c [ath10k_core ...]
> ath10k_ce_disable_interrupts+0x8c/0x120 [ath10k_core ...]
> ath10k_snoc_hif_stop+0x78/0x660 [ath10k_snoc ...]
> ath10k_core_stop+0x13c/0x1ec [ath10k_core ...]
> ath10k_halt+0x398/0x5b0 [ath10k_core ...]
> ath10k_stop+0xfc/0x1a8 [ath10k_core ...]
> drv_stop+0x148/0x6b4 [mac80211 ...]
> ieee80211_stop_device+0x70/0x80 [mac80211 ...]
> ieee80211_do_stop+0x10d8/0x15b0 [mac80211 ...]
> ieee80211_stop+0x144/0x1a0 [mac80211 ...]
> __dev_close_many+0x1e8/0x2c0
> dev_close_many+0x198/0x33c
> dev_close+0x140/0x210
> cfg80211_shutdown_all_interfaces+0xc8/0x1e0 [cfg80211 ...]
> ieee80211_remove_interfaces+0x118/0x5c4 [mac80211 ...]
> ieee80211_unregister_hw+0x64/0x1f4 [mac80211 ...]
> ath10k_mac_unregister+0x4c/0xf0 [ath10k_core ...]
> ath10k_core_unregister+0x80/0xb0 [ath10k_core ...]
> ath10k_snoc_free_resources+0xb8/0x1ec [ath10k_snoc ...]
> ath10k_snoc_shutdown+0x98/0xd0 [ath10k_snoc ...]
> platform_shutdown+0x7c/0xa0
> device_shutdown+0x3e0/0x58c
> kernel_restart_prepare+0x68/0xa0
> kernel_restart+0x28/0x7c
>
> Though there's no known way to reproduce the problem, it makes sense
> that it would be the same issue where we're trying to access copy
> engine registers when it's not allowed.
>
> Let's fix this by changing how we "disable" the interrupts. Instead of
> tweaking the copy engine registers we'll just use disable_irq() and
> enable_irq(). Then we'll configure the interrupts once at power up
> time.
>
> Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.2.2.c10-00754-QCAHLSWMTPL-1
>
> Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
> ---
>
> drivers/net/wireless/ath/ath10k/snoc.c | 18 ++++++++++++++----
> 1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
> index 26214c00cd0d..2c39bad7ebfb 100644
> --- a/drivers/net/wireless/ath/ath10k/snoc.c
> +++ b/drivers/net/wireless/ath/ath10k/snoc.c
> @@ -828,12 +828,20 @@ static void ath10k_snoc_hif_get_default_pipe(struct ath10k *ar,
>
> static inline void ath10k_snoc_irq_disable(struct ath10k *ar)
> {
> - ath10k_ce_disable_interrupts(ar);
> + struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
> + int id;
> +
> + for (id = 0; id < CE_COUNT_MAX; id++)
> + disable_irq(ar_snoc->ce_irqs[id].irq_line);
> }
>
> static inline void ath10k_snoc_irq_enable(struct ath10k *ar)
> {
> - ath10k_ce_enable_interrupts(ar);
> + struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
> + int id;
> +
> + for (id = 0; id < CE_COUNT_MAX; id++)
> + enable_irq(ar_snoc->ce_irqs[id].irq_line);
> }
>
> static void ath10k_snoc_rx_pipe_cleanup(struct ath10k_snoc_pipe *snoc_pipe)
> @@ -1090,6 +1098,8 @@ static int ath10k_snoc_hif_power_up(struct ath10k *ar,
> goto err_free_rri;
> }
>
> + ath10k_ce_enable_interrupts(ar);
> +
> return 0;
>
> err_free_rri:
> @@ -1253,8 +1263,8 @@ static int ath10k_snoc_request_irq(struct ath10k *ar)
>
> for (id = 0; id < CE_COUNT_MAX; id++) {
> ret = request_irq(ar_snoc->ce_irqs[id].irq_line,
> - ath10k_snoc_per_engine_handler, 0,
> - ce_name[id], ar);
> + ath10k_snoc_per_engine_handler,
> + IRQF_NO_AUTOEN, ce_name[id], ar);
> if (ret) {
> ath10k_err(ar,
> "failed to register IRQ handler for CE %d: %d\n",
> --
> 2.41.0.255.g8b1d071c50-goog
>