Re: [patch] revert: [NET]: Fix races in net_rx_action vs netpoll

From: Olaf Kirch
Date: Thu Jul 19 2007 - 05:46:02 EST


On Thursday 19 July 2007 11:09, Ingo Molnar wrote:
> the e1000 in this laptop is historically pretty robust. The only problem
> i ever had with it were some rx/tx hw-engine latency problems [pings
> from the outside took up to 1 second to propagate] that were quickly
> fixed by the e1000 driver guys. Maybe that's related. (although it never
> caused total unavailability of networking - it was only latency
> problems)

I've been poring over this code for 3 days now, and I'm facing a blank
wall, mind-wise :-)

- it is pretty clear that net_rx_action is only invoked every once
in a while. netdev watchdog timeouts are a pretty
unmistakable sign of that.

- You say that netconsole output continues to trickle after
the network gets wedged. This could be caused by the
e1000 watchdog, which triggers a NIC interrupt "to ensure
rx ring is cleaned". I assume that this triggers the
regular e1000_intr, which succeeds in putting the NIC on
the poll_list, and net_rx_action calls dev->poll once.

If this assumption is true, this means that
- once an interrupt gets through, NAPI is working
as designed
- no other interrupts are arriving (Rx, Tx-completion)
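
To make that reasoning concrete, here is a toy model of the interrupt ->
poll_list -> poll sequence. It is only a sketch and not kernel code: the
ToyNic and ToySoftnet classes, their fields, and the budget of 64 are all
made up for illustration. It shows that a device is polled only after an
interrupt has scheduled it, which would match the "one interrupt, one poll"
behaviour assumed above.

```python
from collections import deque


class ToyNic:
    """Hypothetical stand-in for a NAPI-style NIC; not kernel code."""

    def __init__(self):
        self.rx_ring = deque()    # packets waiting in the receive ring
        self.irq_enabled = True   # the IRQ is masked while the device is polled
        self.polls = 0            # how many times poll() has run

    def poll(self, budget):
        """Clean up to `budget` packets; return True once the ring is empty."""
        self.polls += 1
        for _ in range(min(budget, len(self.rx_ring))):
            self.rx_ring.popleft()
        return not self.rx_ring


class ToySoftnet:
    """Hypothetical per-CPU poll list with a net_rx_action-like loop."""

    def __init__(self):
        self.poll_list = deque()

    def interrupt(self, nic):
        # Models the interrupt handler: mask the IRQ and schedule the device.
        if nic.irq_enabled:
            nic.irq_enabled = False
            self.poll_list.append(nic)

    def rx_action(self, budget=64):
        # Models one softirq pass: poll each scheduled device once.
        for _ in range(len(self.poll_list)):
            nic = self.poll_list.popleft()
            if nic.poll(budget):
                nic.irq_enabled = True      # ring clean: re-enable interrupts
            else:
                self.poll_list.append(nic)  # more work left: stay scheduled
```

In this model, calling rx_action() on a ToySoftnet whose poll_list is empty
does nothing, no matter how many packets sit in the ring. Only a call to
interrupt() gets the device polled again.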

So, can you verify whether there are any interrupts arriving on the
NIC after the network got wedged? You could also try
ethtool -s eth0 msglvl 65535 - would be interesting to see what
dmesg contains. If there's little to no debug output from the
driver, let it run for 10 seconds or so, in order to catch the
e1000 watchdog timer a few times.
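
One way to check for arriving interrupts is to compare two snapshots of
/proc/interrupts taken a few seconds apart. The sketch below assumes the
usual layout (an "<irq>:" label, one counter per CPU, then the controller
and device names) and that the NIC shows up there as "eth0". The helper
names irq_count and irq_delta are made up for this example.

```python
import time


def irq_count(interrupts_text, name):
    """Sum the per-CPU counters of every /proc/interrupts row naming `name`."""
    total = 0
    for line in interrupts_text.splitlines():
        fields = line.split()
        # Data rows start with "<irq>:"; the header row lists CPU names instead.
        if not fields or not fields[0].endswith(":"):
            continue
        # Devices sharing an IRQ are comma-separated, so strip the commas.
        names = [field.rstrip(",") for field in fields[1:]]
        if name not in names:
            continue
        for field in fields[1:]:
            if not field.isdigit():
                break  # the counters end where the controller name begins
            total += int(field)
    return total


def irq_delta(name="eth0", seconds=10, path="/proc/interrupts"):
    """Return how many interrupts were counted for `name` over `seconds`."""
    with open(path) as f:
        before = irq_count(f.read(), name)
    time.sleep(seconds)
    with open(path) as f:
        after = irq_count(f.read(), name)
    return after - before
```

A result of zero or near zero from irq_delta("eth0") while the network is
wedged would suggest that only the watchdog-triggered interrupt is getting
through. If the NIC shares its IRQ line with another device, the counters
include that device's interrupts as well.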

Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
okir@xxxxxx | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax