What happened to this NIC? Any ideas?

From: markus reichelt
Date: Sat Jan 24 2009 - 14:25:10 EST


Hi,

I had fun with a remote system (custom vanilla kernel 2.6.24.4), a
NIC sort-of died all of a sudden.

(note: eth4 is udev's idea, not mine - the box has just 2 NICs).

/var/log/messages:
Jan 19 09:25:59 A kernel: NETDEV WATCHDOG: eth4: transmit timed out
Jan 19 09:26:02 A kernel: eth4: link up, 100Mbps, full-duplex, lpa 0xFFFF
Jan 19 09:26:11 A kernel: NETDEV WATCHDOG: eth4: transmit timed out
Jan 19 09:26:14 A kernel: eth4: link up, 100Mbps, full-duplex, lpa 0xFFFF
[goes on for 8 more minutes]

/var/log/debug:
Jan 19 09:26:02 A kernel: eth4: Transmit timeout, status ff ffff ffff media ff.
Jan 19 09:26:02 A kernel: eth4: Tx queue start entry 73297348 dirty entry 73297344.
Jan 19 09:26:02 A kernel: eth4: Tx descriptor 0 is ffffffff. (queue head)
Jan 19 09:26:02 A kernel: eth4: Tx descriptor 1 is ffffffff.
Jan 19 09:26:02 A kernel: eth4: Tx descriptor 2 is ffffffff.
Jan 19 09:26:02 A kernel: eth4: Tx descriptor 3 is ffffffff.
Jan 19 09:26:14 A kernel: eth4: Transmit timeout, status ff ffff ffff media ff.
Jan 19 09:26:14 A kernel: eth4: Tx queue start entry 4 dirty entry 0.
Jan 19 09:26:14 A kernel: eth4: Tx descriptor 0 is ffffffff. (queue head)
Jan 19 09:26:14 A kernel: eth4: Tx descriptor 1 is ffffffff.
Jan 19 09:26:14 A kernel: eth4: Tx descriptor 2 is ffffffff.
Jan 19 09:26:14 A kernel: eth4: Tx descriptor 3 is ffffffff.
[goes on for 8 more minutes]

(FWIW, there also was a BUG with trace info via dmesg but I did not
save that info, nor did it show up in any logfile, doh)


System was rebooted but instead of getting rid of the problem, it
showed eth4 was not usable anymore. It did not show up in dmesg/via
ifconfig and the corresponding lspci -vvv entry states:

00:09.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
Unknown device 8119 (rev 10)
Subsystem: Realtek Semiconductor Co., Ltd. Unknown device 8119
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV-
VGASnoop-ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR+ INTx-
Latency: 32 (16000ns max)
Interrupt: pin A routed to IRQ 12
Region 0: I/O ports at a400 [size=256]
Region 1: Memory at d5800000 (32-bit, non-prefetchable)
[size=256]
[virtual] Expansion ROM at 30000000 [disabled] [size=64K]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
PME(D0-,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

I replaced the NIC and everything works as expected:

00:09.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL-8139/8139C/8139C+ (rev 10)
Subsystem: Allied Telesyn International AT-2500TX/ACPI
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 32 (8000ns min, 16000ns max)
Interrupt: pin A routed to IRQ 12
Region 0: I/O ports at a400 [size=256]
Region 1: Memory at d5800000 (32-bit, non-prefetchable)
[size=256]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME+
Kernel driver in use: 8139too
Kernel modules: 8139too, 8139cp



Just for the fun of it, I plugged the faulty NIC into a sole win
machine, it worked. Doesn't work in the linux machine afterwards,
though. First time that I observe such strange behaviour, any ideas
what could be the cause?

--
left blank, right bald

Attachment: pgp00000.pgp
Description: PGP signature