Re: ISSUE: DFE530-TX REV-A3-1 times out on transmit

From: Urban Widmark (urban@teststation.com)
Date: Sat Aug 25 2001 - 12:05:26 EST


On Fri, 24 Aug 2001, David Schmitt wrote:

> Aug 24 11:15:07 cheesy kernel: NETDEV WATCHDOG: eth0: transmit timed out
> Aug 24 11:15:07 cheesy kernel: eth0: Transmit timed out, status 0000, PHY status 782d, resetting...
> Aug 24 11:15:07 cheesy kernel: eth0: reset did not complete in 10 ms.
> Aug 24 11:15:07 cheesy kernel: eth0: reset finished after 10005 microseconds.
> Aug 24 11:15:07 cheesy kernel: eth0: Transmit frame #1 queued in slot 0.
[snip]
> Aug 24 11:15:07 cheesy kernel: eth0: Transmit frame #10 queued in slot 9.
> Aug 24 11:15:09 cheesy kernel: eth0: VIA Rhine monitor tick, status 0000.
> Aug 24 11:15:11 cheesy kernel: NETDEV WATCHDOG: eth0: transmit timed out
> Aug 24 11:15:11 cheesy kernel: eth0: Transmit timed out, status 0000, PHY status 782d, resetting...
> Aug 24 11:15:11 cheesy kernel: eth0: reset did not complete in 10 ms.
> Aug 24 11:15:11 cheesy kernel: eth0: reset finished after 10005 microseconds.
>
> Reloading the module doesn't help either. Only a reboot
> reenables network connectivity.

There is a patch in the 2.4.8-acX kernels that fixes a problem with
resetting the card when it is first used. I can't say whether it fixes
what you are seeing, but it could be worth trying.

Did this start with recent versions, or have you never run older kernels
on this hw?

To the hardware, reloading the module is about the same as the watchdog
reset.

Rebooting obviously triggers something else too ... perhaps the BIOS talks
some sense to the card.

> [6.] A small shell script or example program which triggers the
> problem (if possible)
>
> Downloading amounts of data (>50MB) will eventually trigger
> the problem. Transmitting data at less than full speed will
> not trigger it (or at least I haven't waited long enough?)

What do you use to download? From a server on the LAN or something remote?
And how do you slow down the speed of your transmission? How fast is it
when it is fast, and how much do you slow it down?

My other machine does not have anything useful installed, but it did have
chargen and discard open.

nc other.machine chargen > /dev/null
        iptraf says about 64Mbps
nc other.machine discard < /dev/zero
        iptraf says about 44Mbps

I sent about 1.5G in both directions without problems. I used to have a
netperf setup, and that would (more or less) fill the 100Mbps.
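
If nc or iptraf is not at hand, the discard test can be approximated with a
few lines of Python. This is only a rough sketch: send_zeros and to_mbps are
names made up for illustration, and the rate is measured on the sending side,
so it reflects how fast the data was handed to the socket rather than what
iptraf would report on the wire.

```python
import socket
import time


def to_mbps(nbytes, seconds):
    """Convert a byte count and a duration into megabits per second."""
    return nbytes * 8 / seconds / 1e6


def send_zeros(host, port, total_bytes, chunk=64 * 1024):
    """Stream total_bytes of zeros to host:port and return the rate in Mbps.

    Roughly what "nc host discard < /dev/zero" does, but with a fixed
    amount of data and a timer around it.
    """
    buf = bytes(chunk)  # a block of zero bytes, reused for every send
    sent = 0
    start = time.monotonic()
    with socket.create_connection((host, port)) as s:
        while sent < total_bytes:
            n = min(chunk, total_bytes - sent)
            s.sendall(buf[:n])
            sent += n
    return to_mbps(sent, time.monotonic() - start)
```

Something like send_zeros("other.machine", 9, 1500 * 10**6) would push about
1.5G at the discard service (TCP port 9), assuming the other machine still
has it open.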

> [X.] Other notes, patches, fixes, workarounds
>
> Further information from lspci, via-diag and ifconfig output as well
> as well as complete kernel syslog from boot to network-lock can be
> found on http://www.heureka.co.at/~david/dfe530tx/

The syslog gives a few hints that something is wrong ...

eth0: Transmit error, Tx status 00008100.
        8 - transmit error
        1 - transmit aborted after excessive collisions

but at the same time the trailing 00 (the low byte) means that the
"collision retry count" is 0 and that it hasn't set the flag saying it
"experienced collisions in this transmit event".
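
To make the decoding above concrete, here is a small sketch. The 0x8000 and
0x0100 masks follow directly from the "8" and "1" digits in the logged
status. Where exactly the retry count and the collision flag sit inside the
low byte is an assumption on my part (low four bits and bit 4), so check it
against the datasheet before relying on it. decode_tx_status is just an
illustrative name, not something from the driver.

```python
TX_ERROR = 0x8000       # the "8" digit: transmit error
TX_ABORTED = 0x0100     # the "1" digit: aborted after excessive collisions
TX_COLLIDED = 0x0010    # ASSUMED bit for "experienced collisions" flag
TX_RETRY_MASK = 0x000F  # ASSUMED bits for the collision retry count


def decode_tx_status(status):
    """Split a logged Tx status word into the fields discussed above."""
    return {
        "error": bool(status & TX_ERROR),
        "aborted": bool(status & TX_ABORTED),
        "collided": bool(status & TX_COLLIDED),
        "retries": status & TX_RETRY_MASK,
    }


if __name__ == "__main__":
    # The value from the syslog line "Tx status 00008100".
    print(decode_tx_status(0x00008100))
```

For 0x00008100 both error bits come out set while the collision flag and the
retry count are clear, which is the odd combination: an abort blamed on
excessive collisions with no collisions recorded.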

I think there were 3 of these, and from all but the last it recovers by
itself. Perhaps the collisions (or whatever it is that the card sees as
collisions) continued for a longer period.

It ends up in "eth0: transmit timed out" and the driver tries to reset the
card. That does not appear to work at all.

It's a nice report; I wish I had something more useful to reply with.

The driver source has links to some datasheets. They might be useful in
improving the reset code.
(Hmm, the tx_timeout code does: reset -> initialise ring -> wait for hw
 but initialise ring talks to the hw, perhaps it should wait for hw first
 ...)
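
The ordering I have in mind, as a toy model rather than driver code. FakeNic
and all of its methods are invented for illustration and have nothing to do
with the real register interface; the only point is that the ring is not
touched until the card reports that the reset has finished, and that we give
up instead of carrying on if it never does.

```python
class FakeNic:
    """Hypothetical stand-in for the card, for illustration only."""

    def __init__(self, polls_until_ready):
        self.polls_left = polls_until_ready
        self.log = []

    def start_reset(self):
        self.log.append("reset")

    def reset_done(self):
        # Pretend the reset bit clears after a fixed number of polls.
        if self.polls_left > 0:
            self.polls_left -= 1
            return False
        return True

    def init_ring(self):
        self.log.append("init_ring")


def reset_then_init(nic, max_polls=1000):
    """Reset, wait for the hardware, and only then initialise the ring."""
    nic.start_reset()
    for _ in range(max_polls):
        if nic.reset_done():
            break
    else:
        # Reset never completed: leave the ring alone and report failure.
        return False
    nic.log.append("ready")
    nic.init_ring()
    return True
```

With a card that never comes out of reset, reset_then_init returns False and
the log shows only "reset", so nothing has talked to hardware that is still
in an unknown state.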

/Urban



