Re: via-rhine: NETDEV WATCHDOG: eth0: transmit timed out

From: Marco Colombo (marco@esi.it)
Date: Fri Jun 09 2000 - 04:15:45 EST


On Thu, 8 Jun 2000, Urban Widmark wrote:

> On Thu, 8 Jun 2000, Marco Colombo wrote:
>
> > > 001a is: transmit buffer underflow, packet transmission aborted because of
> > > excessive collisions, packet transmitted with no errors,
> > > i.e. IntrTxDone | IntrTxAbort | IntrTxUnderrun.
> > >
> > > With debug > 1 you should get "Transmitter underrun" messages too. Do you?
> >
> > Yes, but very few of them. They seem to be unrelated.
>
> You should get as many as you get 001a's (anything with 0x0010 set). If you look at
> via_rhine_error it does:
> if (intr_status & IntrTxUnderrun) {
> and later it prints the "Something wicked" message. But maybe it's not
> always 001a that is the error interrupt status? That would explain it.

No, you're right! There's one "Transmitter underrun" for every
"Something Wicked".
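The bit arithmetic behind this is 0x001a = 0x0010 | 0x0008 | 0x0002. A minimal sketch that decodes a status word and models why each error interrupt with 0x0010 set yields exactly one message of each kind. The bit values are inferred from this thread and should be checked against via-rhine.c; `HANDLED_MASK`, `decode_tx_status()` and `model_tx_error()` are hypothetical names, not the driver's code.

```c
#include <stddef.h>
#include <string.h>

/* Transmit-related interrupt status bits. Values inferred from the thread
 * (0x001a == IntrTxDone | IntrTxAbort | IntrTxUnderrun, underrun == 0x0010);
 * verify them against via-rhine.c for your kernel. */
enum {
    IntrTxDone     = 0x0002, /* packet transmitted with no errors */
    IntrTxAbort    = 0x0008, /* transmission aborted, excessive collisions */
    IntrTxUnderrun = 0x0010  /* transmit buffer underflow */
};

/* Hypothetical: bits the error handler treats as already accounted for.
 * The real mask in via_rhine_error() may differ. */
#define HANDLED_MASK ((unsigned int)IntrTxAbort)

struct err_counts {
    int underrun; /* "Transmitter underrun" messages (debug > 1) */
    int wicked;   /* catch-all "Something Wicked" messages */
};

/* Write e.g. "IntrTxDone | IntrTxAbort | IntrTxUnderrun" into buf. */
void decode_tx_status(unsigned int status, char *buf, size_t len)
{
    static const struct { unsigned int bit; const char *name; } bits[] = {
        { IntrTxDone, "IntrTxDone" },
        { IntrTxAbort, "IntrTxAbort" },
        { IntrTxUnderrun, "IntrTxUnderrun" },
    };

    if (len == 0)
        return;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof bits / sizeof bits[0]; i++) {
        if (!(status & bits[i].bit))
            continue;
        if (buf[0] != '\0')
            strncat(buf, " | ", len - strlen(buf) - 1);
        strncat(buf, bits[i].name, len - strlen(buf) - 1);
    }
}

/* Model of the error path for one error interrupt: an underrun is
 * reported, and any bit outside HANDLED_MASK also hits the catch-all. */
void model_tx_error(unsigned int status, struct err_counts *c)
{
    if (status & IntrTxUnderrun)
        c->underrun++;
    if (status & ~HANDLED_MASK)
        c->wicked++;
}
```

Under this assumed mask, a status that carries the underrun bit always triggers both messages, which would match the one-to-one count; a status of 0x0008 alone would trigger neither.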

> > Yes, the card recovers (I'll try your patch, just let me complete tests
> > with the GA-7IXE). But it's almost useless. If I keep doing the "color
> > picker trick", it stops for 4-5 seconds (tcpdump shows that the card
>
> CmdReset shouldn't be any better, CmdStop is probably just fine. It's the
> same idea and probably a waste of time, unless you have nothing
> better to test.
>
> > on the K7V:
> > - it happens while the X server is running on Linux/Sparc, with 10Mbps eth;
> > - it happens while the X server is running on Linux/i386, with 100Mbps eth
> > (the other card is a DFE530TX, the switch is a D-Link 10/100);
> > - it does NOT happen while the X server is running on Solaris/Sparc,
> > with 10Mbps eth;
>
> eh ... ?

Yes, weird. Somehow it does NOT happen with Solaris... When the K7V
is back I'll also test a Windows X server, just to see. I'll also try to
find some other way to make it fail reproducibly. The only difference
I can think of with Solaris is that it has slightly different timing
when sending out ACK packets... but I need to take a couple of tcpdumps to
confirm that. I've tried it many times, because I could not believe it.
The setup is:

                        +--------+ ----------- Xa
        test ---------- | switch |
                        +--------+ ----------- Xb
                            |
                        +--------+
                        |  hub   | ----------- Xc
                        +--------+

Xa, Xb, Xc are workstations running X11:
- Xa SS4, Solaris 2.5.1
- Xb PC, multi-booting Linux, FreeBSD, Win98 (K6 on P5A, DFE530TX)
- Xc Ultra 1, RHL6.1 + kernel-2.2.15

test the Athlon PC with the DFE530TX, the one with the problem
                when the MB is a K7V. With the GA-7IXE it works fine...

switch is a 5-port D-Link 10/100 switch
hub is a 16-port 10Mbps hub

I run gnome-terminal on test, with DISPLAY=X[abc]:0, and play the
"color picker trick". On Xb and Xc it stops after 1-2 seconds, 100%
reproducible (of course it's 'test' that stops sending). But on Xa
everything's fine. Of course, I've tested different cables, ports, and
so on. I've tried connecting Xc to the switch, or Xb to the hub.
I find this interesting enough to keep digging, even though all this is
costing me a lot more than just replacing the card B-) (replacements are
already on their way, since I also need the job done). I still think a
software solution can be found, and that it's somehow hidden in the
Linux-Solaris interaction.

> > I remember it was a failure). So it can't be (only) a driver bug, I think.
>
> I hope so; a hardware bug is an easy explanation. The problem is how to show
> it's only hw (but you seem to be working hard on it :).
>
> > On the K7V, I've also played a little with setpci:
> >
> > With lspci I saw:
> > # lspci -d 1106:3043 -vv | grep Latency
> > Latency: 118 min, 152 max, 64 set, cache line size 08
>
> Same here.
>
> /Urban
>
>
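The lspci "Latency" line above summarises four bytes of the device's standard PCI configuration header: cache line size (offset 0x0C, in 32-bit words), latency timer (0x0D, in PCI clocks), MIN_GNT (0x3E) and MAX_LAT (0x3F), the last two in units of 250 ns. A minimal sketch that pulls them out of a raw header dump, assuming this lspci version prints the raw register values; `read_pci_header()`, `parse_latency()` and `gnt_lat_to_ns()` are hypothetical helpers, and the /proc path is only an example of where kernels of this era expose the bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Offsets in the standard (type 0) PCI configuration header; the names
 * match those in the kernel's PCI register definitions. */
#define PCI_CACHE_LINE_SIZE 0x0C /* in 32-bit words: 08 -> 32 bytes */
#define PCI_LATENCY_TIMER   0x0D /* in PCI bus clocks */
#define PCI_MIN_GNT         0x3E /* in units of 250 ns */
#define PCI_MAX_LAT         0x3F /* in units of 250 ns */

struct pci_latency_info {
    uint8_t cache_line;  /* "cache line size 08" */
    uint8_t latency_set; /* "64 set" */
    uint8_t min_gnt;     /* "118 min" */
    uint8_t max_lat;     /* "152 max" */
};

/* Hypothetical helper: read the 64-byte header from a file such as
 * /proc/bus/pci/<bus>/<dev.fn>. Returns 0 on success. */
int read_pci_header(const char *path, uint8_t cfg[64])
{
    FILE *f = fopen(path, "rb");
    size_t n;

    if (f == NULL)
        return -1;
    n = fread(cfg, 1, 64, f);
    fclose(f);
    return n == 64 ? 0 : -1;
}

/* Extract the fields summarised on lspci's "Latency:" line. */
int parse_latency(const uint8_t *cfg, size_t len, struct pci_latency_info *out)
{
    if (len < 0x40)
        return -1; /* need the whole standard header */
    out->cache_line  = cfg[PCI_CACHE_LINE_SIZE];
    out->latency_set = cfg[PCI_LATENCY_TIMER];
    out->min_gnt     = cfg[PCI_MIN_GNT];
    out->max_lat     = cfg[PCI_MAX_LAT];
    return 0;
}

/* MIN_GNT and MAX_LAT are encoded in 250 ns units. */
unsigned int gnt_lat_to_ns(uint8_t v)
{
    return (unsigned int)v * 250u;
}
```

With the values shown above, MIN_GNT 118 and MAX_LAT 152 work out to 29500 ns and 38000 ns, while the latency timer (64 clocks) is the byte setpci changes when tuning it.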

.TM.

-- 
      ____/  ____/   /
     /      /       /			Marco Colombo
    ___/  ___  /   /		      Technical Manager
   /          /   /			 ESI s.r.l.
 _____/ _____/  _/		       Colombo@ESI.it

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Jun 15 2000 - 21:00:17 EST