Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts

From: Eric W. Biederman
Date: Tue Aug 25 2009 - 16:22:33 EST


David Dillow <dave@xxxxxxxxxxxxxx> writes:

> On Mon, 2009-08-24 at 17:51 -0700, Eric W. Biederman wrote:
>> When I decode the bits in status they are TxOK, RxOK and TxDescUnavail, so it looks
>> like there is some bidirectional communication going on.
>>
>> Do we really want to loop when those bits are set?
>
> Maybe not when only those bits are set, but I worry that we would trade
> one race for another where we stop getting interrupts from the card.
>
>> Perhaps we want to remove them from rtl_cfg_infos for the part?
>
> Then you'd never get an interrupt for them in the first place, I think.
>
> I'm not real happy with the interrupt handling in the driver; it makes a
> certain amount of sense to split the MSI vs non-MSI interrupt cases out.
> It also means another pass through re-auditing things against the vendor
> driver. That's more work than I'm able to commit to at the moment.
>
> I've not been able to reproduce it locally on my r8169d, running for ~30
> minutes straight at full speed. I've not tried running it in UP, though.
> Perhaps I can do that tomorrow.
>
> Here's a possible patch to mask the NAPI events while we're running in
> NAPI mode. I'm not sure it is going to help, since the intr_mask was
> 0xffff when you hit the loop guard, so I left it in for now.

Interesting.

If I understand this correctly, the situation is that the chip has
correct logic for a level-triggered interrupt, and the MSI logic sits
on top of it and sends an event when the interrupt signal goes high.
But when we acknowledge some bits and not all of them, the signal
never goes low, so it does not send another interrupt.
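
The behaviour described above can be sketched as a small user-space
model. This is only an illustration under stated assumptions: struct
sim_nic, the sim_* helpers and the bit values are hypothetical and are
not taken from the r8169 driver or the chip's datasheet. The model
sends an MSI only on a low-to-high transition of the internal
level-style signal, so a partial acknowledge that leaves any enabled
bit set produces no further message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status bits named after those mentioned in the thread;
 * the values are illustrative only. */
enum {
    SIM_RX_OK           = 1u << 0,
    SIM_TX_OK           = 1u << 2,
    SIM_TX_DESC_UNAVAIL = 1u << 7,
};

/* Hypothetical model of the card, not the real register layout. */
struct sim_nic {
    uint16_t status;    /* pending event bits (level-style) */
    uint16_t mask;      /* enabled event bits */
    unsigned msi_count; /* number of MSI messages sent so far */
};

/* The internal level signal is high while any enabled bit is pending. */
static bool sim_line_high(const struct sim_nic *nic)
{
    assert(nic);
    return (nic->status & nic->mask) != 0;
}

/* The device raises events. An MSI is sent only when the level signal
 * goes from low to high. */
static void sim_raise(struct sim_nic *nic, uint16_t bits)
{
    bool was_high = sim_line_high(nic);

    nic->status |= bits;
    if (!was_high && sim_line_high(nic))
        nic->msi_count++;
}

/* The driver acknowledges events (write-1-to-clear style). If other
 * enabled bits stay set, the signal stays high and no new MSI follows. */
static void sim_ack(struct sim_nic *nic, uint16_t bits)
{
    assert(nic);
    nic->status &= (uint16_t)~bits;
}
```

In this model, raising SIM_RX_OK and then SIM_TX_OK yields one MSI;
acknowledging only SIM_RX_OK leaves the signal high, and later events
are never announced until every pending bit has been cleared.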

Barring playing games with which versions of the card have working
logic and which do not, we seem to have two simple choices (if we
don't want to loop, possibly forever):
- Don't use the msi logic on this card.
- Move all of the logic into rtl8169_poll and only come out of NAPI
mode when we have caught up with all of the interrupt work.
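
The second option can be sketched roughly as follows. This is a
user-space model, not the real rtl8169_poll: struct sim_dev, sim_poll
and the in_poll flag are hypothetical names, and one unit of budget is
assumed to handle one pending status bit. The point it illustrates is
that the poll routine leaves polling mode only when it finished under
budget and no status bit remains pending, so it never depends on a
further MSI arriving for work that is already outstanding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model; the fields are assumptions for this sketch. */
struct sim_dev {
    uint16_t status; /* pending event bits */
    bool in_poll;    /* true while interrupts are masked and we poll */
};

/* Handle up to 'budget' pending events, lowest bit first. Returns the
 * amount of work done. Polling mode ends only when all pending bits
 * have been handled and the budget was not exhausted. */
static int sim_poll(struct sim_dev *dev, int budget)
{
    int work = 0;

    while (work < budget && dev->status != 0) {
        unsigned bit = 1;

        /* Find the lowest pending bit; status != 0 guarantees one. */
        while (!(dev->status & bit))
            bit <<= 1;

        /* "Handle" the event and acknowledge just that bit. */
        dev->status &= (uint16_t)~bit;
        work++;
    }

    /* Caught up: safe to unmask interrupts and leave polling mode. */
    if (work < budget && dev->status == 0)
        dev->in_poll = false;

    return work;
}
```

With status 0x85 and a budget of 2, the first call handles two events
and stays in polling mode; a second call with a larger budget handles
the remaining one and only then leaves polling mode.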

Is that how you understand the hardware issue you are trying to work
around?

Eric
