Re: [PATCH v2 1/2] ath10k: Keep track of which interrupts fired, don't poll them

From: Doug Anderson
Date: Fri Aug 21 2020 - 17:26:19 EST


Kalle,

On Thu, Jul 9, 2020 at 8:22 AM Douglas Anderson <dianders@xxxxxxxxxxxx> wrote:
>
> If we have a per CE (Copy Engine) IRQ then we have no summary
> register. Right now the code generates a summary register by
> iterating over all copy engines and seeing if they have an interrupt
> pending.
>
> This has a problem. Specifically if _none_ of the Copy Engines have
> an interrupt pending then they might go into low power mode and
> reading from their address space will cause a full system crash. This
> was seen to happen when two interrupts went off at nearly the same
> time. Both were handled by a single call of ath10k_snoc_napi_poll()
> but, because there were two interrupts handled and thus two calls to
> napi_schedule() there was still a second call to
> ath10k_snoc_napi_poll() which ran with no interrupts pending.
>
> Instead of iterating over all the copy engines, let's just keep track
> of the IRQs that fire. Then we can effectively generate our own
> summary without ever needing to read the Copy Engines.
>
> Tested-on: WCN3990 SNOC WLAN.HL.3.2.2-00490-QCAHLSWMTPL-1
>
> Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
> Reviewed-by: Rakesh Pillai <pillair@xxxxxxxxxxxxxx>
> Reviewed-by: Brian Norris <briannorris@xxxxxxxxxxxx>
> ---
> This patch continues work to try to squash all instances of the crash
> we've been seeing while reading CE registers and hopefully this patch
> addresses the true root of the issue.
>
> The first patch that attempted to address these problems landed as
> commit 8f9ed93d09a9 ("ath10k: Wait until copy complete is actually
> done before completing"). After that Rakesh Pillai posted ("ath10k:
> Add interrupt summary based CE processing") [1] and this patch is
> based atop that one. Both of those patches significantly reduced the
> instances of problems but didn't fully eliminate them. Crossing my
> fingers that they're all gone now.
>
> [1] https://lore.kernel.org/r/1593193967-29897-1-git-send-email-pillair@xxxxxxxxxxxxxx
>
> Changes in v2:
> - Add bitmap_clear() in ath10k_snoc_hif_start().
>
> drivers/net/wireless/ath/ath10k/ce.c | 84 ++++++++++----------------
> drivers/net/wireless/ath/ath10k/ce.h | 14 ++---
> drivers/net/wireless/ath/ath10k/snoc.c | 19 ++++--
> drivers/net/wireless/ath/ath10k/snoc.h | 1 +
> 4 files changed, 52 insertions(+), 66 deletions(-)
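
For anyone picking this thread up cold, here is a rough userspace sketch
of the idea in the description above. It is not the driver code: the
names (pending_irqs, ce_irq_fired, ce_service, napi_poll) and the
CE_COUNT value are made up for illustration, and a C11 atomic word
stands in for the kernel bitmap the patch uses. The point it tries to
show is that the poll path builds its summary from recorded IRQs, so a
poll that runs with nothing pending never touches a Copy Engine.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

#define CE_COUNT 12 /* hypothetical number of copy engines */

/* One bit per copy engine whose IRQ fired and has not been serviced yet. */
static atomic_uint pending_irqs;

/* Counts simulated CE register accesses so the effect is observable. */
static int ce_register_reads;

/* Simulated per-CE interrupt handler: record which IRQ fired, don't poll. */
static void ce_irq_fired(unsigned int ce_id)
{
    assert(ce_id < CE_COUNT);
    atomic_fetch_or(&pending_irqs, 1u << ce_id);
}

/* Stand-in for servicing one copy engine, which reads its registers. */
static void ce_service(unsigned int ce_id)
{
    ce_register_reads++;
    printf("servicing CE %u\n", ce_id);
}

/* Simulated NAPI poll: returns how many copy engines were serviced. */
static int napi_poll(void)
{
    /* Fetch and clear the software summary in one atomic step. */
    unsigned int summary = atomic_exchange(&pending_irqs, 0);
    int serviced = 0;

    for (unsigned int ce_id = 0; ce_id < CE_COUNT; ce_id++) {
        if (summary & (1u << ce_id)) {
            ce_service(ce_id);
            serviced++;
        }
    }
    return serviced;
}

/*
 * Replays the scenario from the description: two IRQs fire close
 * together, so two polls get scheduled. Returns the number of register
 * reads made by the second, "empty" poll.
 */
static int replay_double_irq(void)
{
    atomic_store(&pending_irqs, 0);
    ce_register_reads = 0;

    ce_irq_fired(2);
    ce_irq_fired(5);

    napi_poll();                      /* handles both CEs */
    int reads_before = ce_register_reads;
    napi_poll();                      /* nothing pending */
    return ce_register_reads - reads_before;
}
```

Calling replay_double_irq() should return 0 here, since the second poll
finds an empty summary and skips every engine. That is the property we
care about: no CE register access unless an IRQ for that CE was seen.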

I'm wondering if there's anything else you're looking for here. If I
just need to sit tight that's fine, but I want to make sure this patch
isn't lost and you're not waiting for any actions on my part. The
patch it depends on from Rakesh (see above or patchwork ID 11628289)
is also still marked as "Under Review".

We have been using this patch for the last few months and we haven't
hit a single crash like we were getting before. At the same time, we
haven't found any regressions that have been attributed to this patch.

Anyway, just figured I'd check in. Thanks!

-Doug