Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts

From: Eric W. Biederman
Date: Tue Aug 25 2009 - 17:37:38 EST


David Dillow <dave@xxxxxxxxxxxxxx> writes:

> On Mon, 2009-08-24 at 17:51 -0700, Eric W. Biederman wrote:
>> When I decode the bits in status they are TxOK, RxOK and TxDescUnavail so it looks
>> there is some bidirectional communication going on.
>>
>> Do we really want to loop when those bits are set?
>
> Maybe not when only those bits are set, but I worry that we would trade
> one race for another where we stop getting interrupts from the card.
>
>> Perhaps we want to remove them from rtl_cfg_infos for the part?
>
> Then you'd never get an interrupt for them in the first place, I think.
>
> I'm not real happy with the interrupt handling in the driver; it makes a
> certain amount of sense to split the MSI vs non-MSI interrupt cases out.
> It also means another pass through re-auditing things against the vendor
> driver. That's more work than I'm able to commit to at the moment.
>
> I've not been able to reproduce it locally on my r8169d, running for ~30
> minutes straight at full speed. I've not tried running it in UP, though.
> Perhaps I can do that tomorrow.
>
> Here's a possible patch to mask the NAPI events while we're running in
> NAPI mode. I'm not sure it is going to help, since the intr_mask was
> 0xffff when you hit the loop guard, so I left it in for now.

Ok. I now get what your patch was trying to do and that does look like
a reasonable test.

I prefer:
while ((status != 0xffff) && (status & tp->intr_mask))

The presence of TxDescUnavail suggests as is usually the case
that the interrupt status bits are independent of which interrupts
are actually enabled to fire.

I will take a moment and give that a try.

I still like the idea of masking everything having poll do all
of the work and then unmasking everything. That seems a little less
fragile to me.

Eric

> diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
> index b82780d..12755b7 100644
> --- a/drivers/net/r8169.c
> +++ b/drivers/net/r8169.c
> @@ -3556,6 +3556,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
> void __iomem *ioaddr = tp->mmio_addr;
> int handled = 0;
> int status;
> + int count = 0;
>
> /* loop handling interrupts until we have no new ones or
> * we hit a invalid/hotplug case.
> @@ -3564,6 +3565,15 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
> while (status && status != 0xffff) {
> handled = 1;
>
> + if (count++ > 100) {
> + printk_once("r8169 screaming irq status %08x "
> + "mask %08x event %08x napi %08x\n",
> + status, tp->intr_mask, tp->intr_event,
> + tp->napi_event);
> + break;
> + }
> +
> +
> /* Handle all of the error cases first. These will reset
> * the chip, so just exit the loop.
> */
> @@ -3613,6 +3623,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
> RTL_W16(IntrStatus,
> (status & RxFIFOOver) ? (status | RxOverflow) : status);
> status = RTL_R16(IntrStatus);
> + status &= tp->intr_mask;
> }
>
> return IRQ_RETVAL(handled);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/