Re: NIU - Sun Neptune 10g - Transmit timed out reset (2.6.24)

From: Jesper Krogh
Date: Mon May 26 2008 - 16:55:35 EST


David Miller wrote:
From: David Miller <davem@xxxxxxxxxxxxx>
Date: Mon, 26 May 2008 12:33:38 -0700 (PDT)

From: Jesper Krogh <jesper@xxxxxxxx>
Date: Mon, 26 May 2008 21:03:34 +0200

Ok. Now I also hit it in production with the NFS server, so this
is definitely a real bug somewhere in the driver. Should I register it
at bugzilla?
Please feel free to do that.

BTW, I did stare at some of the transmit code of the NIU driver
while flying from Tokyo to Seattle a few hours ago, and I
found one possible theory on the transmit timeouts.

Can you try the patch below and let us know if the symptoms
continue?

[ Note to Matheos: The IRQ marking scheme of the NIU doesn't mesh
well with how things work under Linux. We really need a
"TX queue empty" interrupt status in order to handle all cases
properly. Otherwise we really cannot decide not to mark some TX
descriptors without potentially entering a deadlock condition. ]

diff --git a/drivers/net/niu.c b/drivers/net/niu.c
index 918f802..7ab7f8e 100644
--- a/drivers/net/niu.c
+++ b/drivers/net/niu.c
@@ -6165,7 +6165,7 @@ static int niu_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	rp->tx_buffs[prod].mapping = mapping;
 	mrk = TX_DESC_SOP;
-	if (++rp->mark_counter == rp->mark_freq) {
+	if (1 /*++rp->mark_counter == rp->mark_freq*/) {
 		rp->mark_counter = 0;
 		mrk |= TX_DESC_MARK;
 		rp->mark_pending++;
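
For readers following the thread, here is a toy model of the theory
behind the patch. It is a sketch under assumptions, not the niu.c code:
the struct toy_tx_ring type and the toy_* functions are hypothetical,
and it assumes that only a marked descriptor raises a TX completion
interrupt and that the handler then reclaims everything queued so far.

```c
#include <assert.h>

/* Hypothetical toy model; these are NOT the niu.c data structures. */
struct toy_tx_ring {
	int mark_freq;    /* mark every mark_freq-th packet */
	int mark_counter; /* packets queued since the last marked one */
	int pending;      /* packets queued but not yet reclaimed */
};

/* Queue one packet; returns 1 if its descriptor was marked. */
static int toy_xmit(struct toy_tx_ring *rp)
{
	assert(rp->mark_freq > 0);
	rp->pending++;
	if (++rp->mark_counter == rp->mark_freq) {
		rp->mark_counter = 0;
		return 1;
	}
	return 0;
}

/* Assumed behaviour: a completion interrupt reclaims all queued packets. */
static void toy_tx_complete(struct toy_tx_ring *rp)
{
	rp->pending = 0;
}

/*
 * Send a burst of npkts packets and then go idle. Returns how many
 * packets are still unreclaimed because no marked descriptor followed.
 */
static int toy_send_burst(struct toy_tx_ring *rp, int npkts)
{
	int i;

	for (i = 0; i < npkts; i++) {
		if (toy_xmit(rp))
			toy_tx_complete(rp);
	}
	return rp->pending;
}
```

In this model, a ring with mark_freq = 4 that sends 6 packets and then
goes idle is left with 2 packets that never get a completion interrupt,
while mark_freq = 1 (the effect of the patch above) leaves none. Without
a "TX queue empty" interrupt, nothing cleans up that tail until more
traffic arrives.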

Applied and running.. I've now pushed 400GB of data through it trying to
get it to hit the bug but it is still running.

So without claiming that it solved the problem, it definitely seems so.
2.6.26-rc4 + above patch.

Jesper