eepro100: wait_for_cmd_done timeout (2.4.19-pre2/8)

From: Paul Jakma (paulj@alphyra.ie)
Date: Tue May 07 2002 - 09:23:36 EST


hi,

i have a problem with a Dell PowerEdge with onboard Intel eepro100 NICs.

The network card basically doesn't work. The system logs are filled
with:

        eepro100: wait_for_cmd_done timeout!

and of course the attendant "last message repeated x times" lines. at
less frequent intervals we get NETDEV watchdog messages:

        NETDEV WATCHDOG: eth0: transmit timed out

always followed by an error message which may be descriptive:

        eth0: Transmit timed out: status 0090 0cf0 at 1370/1430 command 000c0000

the value following "command" is always 000c0000.
the pair of values following "status" varies between:

        0050 0c80
        0050 0cf0
        0090 0c80
        0090 0cf0

distribution of the above (count, then status pair) is:

      5 0050 0c80
    227 0050 0cf0
     22 0090 0c80
    120 0090 0cf0

the xxxxx/yyyyy number after "at" is always different.

lspci of the network interfaces concerned:

00:01.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
        Subsystem: Dell Computer Corporation: Unknown device 00da
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at fe2ff000 (32-bit, non-prefetchable) [size=4K]
        Region 1: I/O ports at ecc0 [size=64]
        Region 2: Memory at fe100000 (32-bit, non-prefetchable) [size=1M]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=2 PME-
00:02.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
        Subsystem: Dell Computer Corporation: Unknown device 00da
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Interrupt: pin A routed to IRQ 17
        Region 0: Memory at fe2fe000 (32-bit, non-prefetchable) [size=4K]
        Region 1: I/O ports at ec80 [size=64]
        Region 2: Memory at fe000000 (32-bit, non-prefetchable) [size=1M]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=2 PME-

kernel version is 2.4.19-pre8; however, the exact same thing occurs with
2.4.19-pre2. (it's running pre8 because we hoped the problem had been
fixed since pre2)

mii-tool -v -v eth0 shows no difference (that i see) between the
interface on the working machine and this "problem" machine:

non-working:

eth0: negotiated 100baseTx-FD flow-control, link ok
  registers for MII PHY 1:
    3000 782d 02a8 0154 05e1 45e1 0001 0000
    0000 0000 0000 0000 0000 0000 0000 0000
    0a03 0000 0001 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000
  product info: Intel 82555 rev 4
  basic mode: autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
  link partner: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control

working machine:

eth0: negotiated 100baseTx-FD flow-control, link ok
  registers for MII PHY 1:
    3000 782d 02a8 0154 05e1 45e1 0001 0000
    0000 0000 0000 0000 0000 0000 0000 0000
    0a03 0000 0001 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000
  product info: Intel 82555 rev 4
  basic mode: autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
  link partner: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control

The strange thing is this machine has a sister machine, an identical
PowerEdge bought at the same time, hooked up to the same switch,
running the same software (the exact same 2.4.19-pre2 kernel this
machine used to run), with the same link negotiated, which does not
have this problem. we have changed the cable, obviously, but this made
no difference.

looking at the code concerned:

static inline void wait_for_cmd_done(long cmd_ioaddr)
{
        int wait = 1000;
        do udelay(1) ;
        while(inb(cmd_ioaddr) && --wait >= 0);
#ifndef final_version
        if (wait < 0)
                printk(KERN_ALERT "eepro100: wait_for_cmd_done timeout!\n");
#endif
}

it seems the driver simply polls a command register on the NIC, waiting
for it to read back as zero, and it never does (after trying 1000 times).

this, along with the fact that an identical machine has no problems,
would suggest to me that i have a hardware problem. Is this a valid
assumption, or are there "funnies" with the eepro100 driver or hardware
that i should be aware of? (eg is it possible the eepro100 has gotten
into some weird state?).

NB: i also tried the intel e100 driver, and curiously it prints a very
similar message to the eepro100 driver (wait_for_exec... in the case
of the intel e100 driver).

NB2: this problem may be multicast related. it started happening after
we installed and ran zebra ospfd, which uses multicast, on the machines.
however, running without ospfd does not cure it.

if anyone needs further info, i can provide it.

regards,

--paulj



