Just had to reboot our server after some problems. We got a long run of
"transmit timed out, tx_status 00 status e000" messages (one with status 8000)
interspersed with "Host error, FIFO diag reg [08]400", and then
networking was no more :-)
I rmmod'ed and re-insmod'ed the driver and then got a number of
similar-looking oopses.
I've tried to point out to Donald that there is a questionable race in
his hard_start_xmit() routines.
I saw the same Tx timeouts while getting the 3c59x driver working
on the Cobalt Qube. The dev->tbusy flag can be cleared by quite a few
conditions in the interrupt handler, which makes me very uneasy about
possible corruption in the hard_start_xmit handler. This code also
spins on device registers (waiting for the DownStall command to
complete), and an interrupt handler running in the middle of that
sequence can cause trouble as well.
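A userspace analogue may make the race concrete. This is a hypothetical sketch, not driver code: a POSIX signal handler stands in for the interrupt handler, a two-field struct (fake_desc) stands in for a Tx descriptor, and blocking the signal with sigprocmask() plays the role of disabling interrupts. All names here are made up for illustration. If the "interrupt" arrives between the two stores it sees a half-written descriptor; with the signal blocked, delivery is deferred until the old mask is restored.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical stand-in for a Tx descriptor shared with the "interrupt". */
struct fake_desc {
	volatile sig_atomic_t addr;    /* written first */
	volatile sig_atomic_t length;  /* written second */
};

static struct fake_desc desc;
static volatile sig_atomic_t handler_ran;
static volatile sig_atomic_t handler_saw_torn;

/* Plays the interrupt handler: records whether it caught the descriptor
 * with only one of its two fields filled in. */
static void fake_irq(int sig)
{
	(void)sig;
	handler_ran = 1;
	if ((desc.addr != 0) != (desc.length != 0))
		handler_saw_torn = 1;
}

static void install_fake_irq(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof sa);
	sa.sa_handler = fake_irq;
	sigemptyset(&sa.sa_mask);
	int rc = sigaction(SIGUSR1, &sa, NULL);
	assert(rc == 0);
	(void)rc;
}

static void reset_state(void)
{
	desc.addr = 0;
	desc.length = 0;
	handler_ran = 0;
	handler_saw_torn = 0;
}

/* No protection: raise() runs the handler before it returns, i.e. right
 * between the two stores, so the handler sees a torn descriptor. */
static void update_desc_unprotected(int addr, int length)
{
	desc.addr = addr;
	raise(SIGUSR1);
	desc.length = length;
}

/* Block SIGUSR1 around the update (the analogue of save_flags + cli),
 * then restore the saved mask (the analogue of restore_flags). The signal
 * raised mid-update stays pending and is delivered only once unblocked. */
static void update_desc_protected(int addr, int length)
{
	sigset_t block, saved;

	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);
	sigprocmask(SIG_BLOCK, &block, &saved);

	desc.addr = addr;
	raise(SIGUSR1);
	desc.length = length;

	sigprocmask(SIG_SETMASK, &saved, NULL);
}
```

After install_fake_irq() and reset_state(), update_desc_unprotected() leaves handler_saw_torn set, while update_desc_protected() does not: POSIX requires a pending signal that becomes unblocked to be delivered before sigprocmask() returns, so the handler only ever sees the finished update.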
So in boomerang_start_xmit() I put a save_flags()/cli() ... restore_flags()
pair around the whole stall/link/unstall sequence that hands the new
descriptor to the card. That is, the function is now:
static int
boomerang_start_xmit(struct sk_buff *skb, struct device *dev)
{
	struct vortex_private *vp = (struct vortex_private *)dev->priv;
	int ioaddr = dev->base_addr;

#ifndef final_version
	if (skb == NULL || skb->len <= 0) {
		printk("%s: Obsolete driver layer request made: skbuff==NULL.\n",
		       dev->name);
		dev_tint(dev);
		return 0;
	}
#endif
	/* tbusy acts as the transmitter lock. If it is already set, only
	 * check for a Tx timeout and tell the caller to retry later. */
	if (test_and_set_bit(0, (void*)&dev->tbusy) != 0) {
		if (jiffies - dev->trans_start >= TX_TIMEOUT)
			vortex_tx_timeout(dev);
		return 1;
	} else {
		/* Calculate the next Tx descriptor entry. */
		int entry = vp->cur_tx % TX_RING_SIZE;
		struct boom_tx_desc *prev_entry =
			&vp->tx_ring[(vp->cur_tx-1) % TX_RING_SIZE];
		unsigned long flags;
		int i;

		if (vortex_debug > 3)
			printk("%s: Trying to send a packet, Tx index %d.\n",
			       dev->name, vp->cur_tx);
		if (vp->tx_full) {
			if (vortex_debug > 0)
				printk("%s: Tx Ring full, refusing to send buffer.\n",
				       dev->name);
			return 1;
		}
		/* end change 06/25/97 M. Sievers */
		/* Fill in the new descriptor before the card can see it. */
		vp->tx_skbuff[entry] = skb;
		vp->tx_ring[entry].next = 0;
		vp->tx_ring[entry].addr = virt_to_bus(skb->data);
		vp->tx_ring[entry].length = skb->len | LAST_FRAG;
		vp->tx_ring[entry].status = skb->len | TxIntrUploaded;

		/* Interrupts stay off from DownStall to DownUnstall so the
		 * interrupt handler cannot touch the ring or the command
		 * register while we link in the new descriptor. */
		save_flags(flags);
		cli();
		outw(DownStall, ioaddr + EL3_CMD);
		/* Wait for the stall to complete. */
		for (i = 60; i >= 0 ; i--)
			if ( (inw(ioaddr + EL3_STATUS) & CmdInProgress) == 0)
				break;
		prev_entry->next = virt_to_bus(&vp->tx_ring[entry]);
		/* If the download engine has run off the end of the list,
		 * point it at the new descriptor. */
		if (inl(ioaddr + DownListPtr) == 0) {
			outl(virt_to_bus(&vp->tx_ring[entry]), ioaddr + DownListPtr);
			queued_packet++;
		}
		outw(DownUnstall, ioaddr + EL3_CMD);
		restore_flags(flags);

		vp->cur_tx++;
		if (vp->cur_tx - vp->dirty_tx > TX_RING_SIZE - 1)
			vp->tx_full = 1;
		else { /* Clear previous interrupt enable. */
			prev_entry->status &= ~TxIntrUploaded;
			dev->tbusy = 0;
		}
		dev->trans_start = jiffies;
		return 0;
	}
}
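The ring bookkeeping in the function above relies on free-running counters: cur_tx and dirty_tx only ever increase, and the slot is the counter modulo TX_RING_SIZE. Below is a minimal model of that arithmetic, assuming the counters are unsigned and TX_RING_SIZE is a power of two (16 here, an illustrative value). With those assumptions (cur_tx - 1) % TX_RING_SIZE wraps correctly to the last slot when cur_tx is 0; with a signed counter it would be -1. The struct and function names are hypothetical, and reclaim_one() is only a guess at what the interrupt handler does when it frees a descriptor, not the driver's actual code.

```c
#include <assert.h>

#define TX_RING_SIZE 16   /* illustrative value; must be a power of two */

/* Hypothetical model of the driver's Tx ring counters. */
struct tx_ring_model {
	unsigned int cur_tx;    /* descriptors queued so far */
	unsigned int dirty_tx;  /* descriptors reclaimed so far */
	int tx_full;
};

static unsigned int next_entry(const struct tx_ring_model *r)
{
	return r->cur_tx % TX_RING_SIZE;
}

/* Unsigned wraparound makes this TX_RING_SIZE - 1 when cur_tx is 0. */
static unsigned int prev_entry(const struct tx_ring_model *r)
{
	return (r->cur_tx - 1) % TX_RING_SIZE;
}

/* Queue one descriptor: 0 on success, 1 if the ring is full, mirroring
 * the "return 1" the transmit routine gives the network layer. */
static int queue_one(struct tx_ring_model *r)
{
	if (r->tx_full)
		return 1;
	r->cur_tx++;
	if (r->cur_tx - r->dirty_tx > TX_RING_SIZE - 1)
		r->tx_full = 1;
	return 0;
}

/* Assumed reclaim step: free one completed descriptor and clear tx_full
 * once the in-flight count is back under the limit used above. */
static void reclaim_one(struct tx_ring_model *r)
{
	assert(r->dirty_tx != r->cur_tx);
	r->dirty_tx++;
	if (r->tx_full && r->cur_tx - r->dirty_tx <= TX_RING_SIZE - 1)
		r->tx_full = 0;
}
```

Starting from an all-zero tx_ring_model, TX_RING_SIZE calls to queue_one() succeed and the last one sets tx_full; the next call returns 1 until reclaim_one() frees a slot.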
With this change I have yet to see a Tx timeout, no matter how much load,
stress, or disk activity I put on the machine. I still have not heard back
from Donald on any of the mails I sent him about this problem, but since
I have a fix, I figured I'd share it with everyone else.
Later,
David S. Miller
davem@dm.cobaltmicro.com