Re: ath9k: panic on tip/master

From: Luis R. Rodriguez
Date: Fri Oct 03 2008 - 14:53:51 EST


On Fri, Oct 03, 2008 at 11:09:31AM -0700, John W. Linville wrote:
> On Fri, Oct 03, 2008 at 11:35:23AM -0400, John W. Linville wrote:
> > On Fri, Oct 03, 2008 at 12:02:11PM +0200, Ingo Molnar wrote:
> > >
> > > * Steven Noonan <steven@xxxxxxxxxxxxxx> wrote:
> > >
> > > > Hey folks,
> > > >
> > > > Just got a panic on tip. According to the stack trace, ath9k is what
> > > > decided to bomb.
> > > >
> > > > http://www.uplinklabs.net/~tycho/linux/ath9k_panic_tip_10.3.2008.jpg
> > > >
> > > > Note: Although it says 'sudo modprobe radeon' on the bash prompt above
> > > > the panic, I never got to hit 'enter' on that command before the panic
> > > > occurred.
> > >
> > > it appears to me that ath9k's ath_rx_input() takes a spinlock that is
> > > not initialized (or already destroyed by the allocator).
> >
> > Seems reasonable...
> >
> > > this would be consistent with an IRQ storm hitting some race in the
> > > ath9k driver init sequence. For example if request_irq() is done before
> > > all structures that the IRQ handler relies on are properly initialized.
> > >
> > > i.e. this has the signature of a genuine ath9k bug.
> >
> > Agreed, although I don't see anything specifically relating to
> > request_irq or the like.
> >
> > I think the spin_lock call may actually be in ath_ampdu_input (called
> > from ath_rx_input), which perhaps is getting called simultaneously
> > with ath_rx_node_init still running? With no locks in between them,
> > it seems like this could be the culprit?
> >
> > Sorry to not be more immediately helpful, but I'm going to have to
> > run in a few minutes. Perhaps this insight is helpful for someone
> > more familiar with the internals of this driver?
>
> This is probably a dead-end...I don't think the ath_node_find
> in ath__rx_indicate will be able to find the ath_node used
> in ath_ampdu_input unless ath_rx_node_init had already completed.
> Back to square one...

Well Steven, please give this a shot; we think it fixes the culprit.

[PATCH] ath9k: fix oops on trying to hold the wrong spinlock

We were trying to hold the wrong spinlock due to a typo
in IEEE80211_BAR_CTL_TID_S's definition. We use this to
compute the tid number and then hold this tid number's
spinlock during ath_bar_rx().

Signed-off-by: Vasanthakumar Thiagarajan <vasanth@xxxxxxxxxxx>
Signed-off-by: Sujith <Sujith.Manoharan@xxxxxxxxxxx>
Signed-off-by: Luis R. Rodriguez <lrodriguez@xxxxxxxxxxx>
---
drivers/net/wireless/ath9k/core.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/wireless/ath9k/core.h b/drivers/net/wireless/ath9k/core.h
index 2f84093..88f4cc3 100644
--- a/drivers/net/wireless/ath9k/core.h
+++ b/drivers/net/wireless/ath9k/core.h
@@ -316,7 +316,7 @@ void ath_descdma_cleanup(struct ath_softc *sc,
 #define ATH_RX_TIMEOUT 40 /* 40 milliseconds */
 #define WME_NUM_TID 16
 #define IEEE80211_BAR_CTL_TID_M 0xF000 /* tid mask */
-#define IEEE80211_BAR_CTL_TID_S 2 /* tid shift */
+#define IEEE80211_BAR_CTL_TID_S 12 /* tid shift */

 enum ATH_RX_TYPE {
 ATH_RX_NON_CONSUMED = 0,
--
1.5.6.3
