Re: via-rhine: NETDEV WATCHDOG: eth0: transmit timed out

From: Marco Colombo (marco@esi.it)
Date: Mon Jun 05 2000 - 05:23:16 EST


On Sun, 4 Jun 2000, Urban Widmark wrote:

> On Sat, 3 Jun 2000, Marco Colombo wrote:
>
> > While testing 2.4.0-test1-ac7 I got a similar problem with a
> > D-Link 530TX (via-rhine driver). But I think this is an old bug
> > because I can reproduce it with 2.2.15 (and RHL 2.2.14-12),
> > driver versions v1.01 2/27/99 and v1.05 4/08/2000.
> > With 2.4.0-test1-ac7 I'm using the included 1.05-LK1.1.5 5/2/2000.
>
> Do you get this problem with other 2.3/2.4.0-test versions too?
> (Since you can make it happen with 2.2 I guess you probably can)

I'll try, though I'm not really into 2.3 testing. I tested 2.4.0-test1
because I saw the driver had been updated.

> However, looking at the fix for the tulip problem, I don't think it's the
> same problem, just the same message.
>
> > via-rhine.c:v1.05-LK1.1.5 5/2/2000 Written by Donald Becker
> > http://www.scyld.com/network/via-rhine.html
> > eth0: VIA VT3043 Rhine at 0xa400, 00:50:ba:c1:e8:93, IRQ 10.
> > eth0: MII PHY found at address 8, status 0x782d advertising 05e1 Link 41e1.
> > eth0: Setting full-duplex based on MII #8 link partner capability of 41e1.
> >
> > NETDEV WATCHDOG: eth0: transmit timed out
> > eth0: Transmit timed out, status 0000, PHY status 782d, resetting...
> > NETDEV WATCHDOG: eth0: transmit timed out
> > eth0: Transmit timed out, status 0000, PHY status 782d, resetting...
>
> Does it send anything at all? (/sbin/ifconfig, TX packets)
> Do you get any interrupts counted in /proc/interrupts?

Sorry for being unclear. The empty line stands for quite a long time.
It usually works fine under low load. I've actually used it for
a week or so without problems. A few tests with Samba
(as a server) were also fine. The card works both at 100BaseT
full duplex (connected to a D-Link switch) and at 10BaseT
(connected to a hub). Let me give more details of my setup:

I ran into this problem a few months ago, I believe on a K6, Asus P5A
system. It worked fine, but large FTP transfers triggered the same
problem. I tried replacing the hub and the cables, with no luck. Since I
had little time to test it, I finally replaced the card.
Recently I put it into an Athlon, Asus K7V system. Knowing of the possible
problem, I tested both FTP and Samba workloads, and all went fine
(under 2.2.xx). But playing with X made it hang. See below for details...
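The PHY lines in the dmesg output quoted above are raw MII register values: in Becker-style drivers "status" is the basic status register (register 1), "advertising" is the local autonegotiation advertisement (register 4), and "Link" is the link partner ability (register 5). A minimal sketch for reading them, assuming the standard IEEE 802.3 clause 22 bit layout (the names `decode`, `resolve`, `BMSR_BITS` and `ABILITY_BITS` are illustrative, not taken from via-rhine.c), shows that 0x782d means link up with autonegotiation complete, and that 05e1 against 41e1 resolves to 100BASE-TX full duplex, consistent with the "Setting full-duplex" message.

```python
# Bit names assume the IEEE 802.3 clause 22 MII register layout.
BMSR_BITS = {  # register 1: basic mode status
    15: "100BASE-T4",
    14: "100BASE-TX full duplex",
    13: "100BASE-TX half duplex",
    12: "10BASE-T full duplex",
    11: "10BASE-T half duplex",
    5: "autonegotiation complete",
    4: "remote fault",
    3: "autonegotiation capable",
    2: "link up",
    1: "jabber detected",
    0: "extended register capability",
}

ABILITY_BITS = {  # registers 4 (advertise) and 5 (link partner ability)
    8: "100BASE-TX full duplex",
    7: "100BASE-TX half duplex",
    6: "10BASE-T full duplex",
    5: "10BASE-T half duplex",
}

# Highest priority first (100BASE-T4 left out for simplicity).
PRIORITY = [8, 7, 6, 5]


def decode(value, table):
    """Return the names of all bits in `table` that are set in `value`."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if value & (1 << bit)]


def resolve(advertise, partner):
    """Pick the best mode offered by both ends, or None if none is shared."""
    common = advertise & partner
    for bit in PRIORITY:
        if common & (1 << bit):
            return ABILITY_BITS[bit]
    return None


if __name__ == "__main__":
    print(decode(0x782D, BMSR_BITS))   # Marco's PHY status
    print(resolve(0x05E1, 0x41E1))     # advertising vs. Link
```

Fed Urban's idle-link values (status 0x7809, Link 0000), `decode` no longer reports "link up" and `resolve` returns None, which fits his note that Link only became 41e1 once the machine at the other end was switched on.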

> There is an old thread on lkml (named '[2.3.51] via-rhine died' around
> March 11, links to archives are of course at http://www.tux.org/lkml)
> where some, myself included, started getting these messages. It turned out
> to be a misaligned buffer that prevented the card from sending anything.

Found it.
I've tried the patch, but nothing changed.

> For what it's worth, things are working fine for me. Here is my dmesg for
> 2.4.0-test1-ac7.
>
> via-rhine.c:v1.05-LK1.1.5 5/2/2000 Written by Donald Becker
> http://www.scyld.com/network/via-rhine.html
> eth1: VIA VT3043 Rhine at 0xd400, 00:50:ba:a4:15:86, IRQ 19.
> eth1: MII PHY found at address 8, status 0x7809 advertising 05e1 Link 0000.
> (and Link becomes 41e1 when I turn on the machine "at the other end")
>
>
> > Under 2.2.1[45] the messages were slightly different:
> >
> > via-rhine.c:v1.01 2/27/99 Written by Donald Becker
> > http://cesdis.gsfc.nasa.gov/linux/drivers/via-rhine.html
> > eth0: VIA VT3043 Rhine at 0xa400, 00:50:ba:c1:e8:93, IRQ 10.
> > eth0: MII PHY found at address 8, status 0x782d advertising 05e1 Link 41e1.
> > eth0: Setting full-duplex based on MII #8 link partner capability of 41e1.
> >
> > eth0: Something Wicked happened! 001a.
> > last message repeated 2 times
>
> These are easy to generate with the 2.2 driver, using 'ping -f', netperf,
> apache bench (ab), or similar ...
>
> > eth0: Transmit timed out, status 0000, PHY status 782d, resetting...
> > last message repeated 7 times
>
> ... but I have never managed to get this with 2.2. Hmm.
>
> What do you have to do in 2.2 to generate these?

It happens with X traffic. I just open a gnome-terminal with
DISPLAY=some.host:0, choose Preference, Colors, fire up the color picker,
and start moving the color point around with the mouse button held down
(this makes all the color indicators and the bars move). After a few
seconds the card hangs. I've also managed to trigger it while playing with
the Gimp. Sometimes just a find / in the terminal causes it. The "color
picker method" is just the fastest (and most reliable) way to make it happen.
The host on the other side (the one running the X server) has a
10Mbps card (it's a Sun Ultra 1 running Linux, BTW).

I've made a few other tests while writing this message. Oddly enough,
the same "color picker method" also triggers the hang if the X server runs
on a PC running Linux, but I wasn't able to reproduce the problem with a
Solaris box.

Now I'll swap the Asus MB with a GA-71XE.

> Oh, and which compiler do you use? Does it go away if you switch to
> something like "good old" gcc 2.7.2.3? (using egcs-1.1.2 myself). Some
> more experimental/recent gcc's have miscompiled some via-rhine versions.

The behavior is just the same with the (precompiled) RedHat 2.2.14-12
kernel. Anyway:

# gcc -v
gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)

>
> /Urban
>
>

.TM.

-- 
      ____/  ____/   /
     /      /       /			Marco Colombo
    ___/  ___  /   /		      Technical Manager
   /          /   /			 ESI s.r.l.
 _____/ _____/  _/		       Colombo@ESI.it
