Re: [PATCH] eepro100.c, kernel 2.4.1

From: Augustin Vidovic (vido@ldh.org)
Date: Thu Feb 08 2001 - 05:41:56 EST


On Wed, Feb 07, 2001 at 11:59:05PM -0800, Ion Badulescu wrote:
> I don't think it fixes *this* bug. However, the bug workaround effectively
> reinitializes the chip, so it might serve as a generic 'reset and try
> again' kind of workaround. In that case, we might as well enable it
> unconditionally... but I don't see it as a good solution. It's a stop-gap
> measure at best.
>
> We need to find out what exactly happens. Until he tells us more about how
> his boxes "were failing before", there really isn't much we can diagnose.

Ok, then let's go into a bit more details.

First, the part of the dmesg concerning the network interfaces:

eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <saw@saw.sw.com.sg> and others
PCI: Found IRQ 5 for device 00:0c.0
PCI: The same IRQ used for device 00:0d.0
eth0: PCI device 8086:1229, 00:D0:B7:00:BE:00, IRQ 5.
  Receiver lock-up bug exists -- enabling work-around.
  Board assembly 000000-000, Physical connectors present: RJ45
  Primary interface chip i82555 PHY #1.
  General self-test: passed.
  Serial sub-system self-test: passed.
  Internal registers self-test: passed.
  ROM checksum self-test: passed (0x04f4518b).
  Receiver lock-up workaround activated.
PCI: Found IRQ 5 for device 00:0d.0
PCI: The same IRQ used for device 00:0c.0
eth1: PCI device 8086:1229, 00:D0:B7:00:BE:01, IRQ 5.
  Receiver lock-up bug exists -- enabling work-around.
  Board assembly 000000-000, Physical connectors present: RJ45
  Primary interface chip i82555 PHY #1.
  General self-test: passed.
  Serial sub-system self-test: passed.
  Internal registers self-test: passed.
  ROM checksum self-test: passed (0x04f4518b).
  Receiver lock-up workaround activated.

Please note: the "Receiver lock-up workaround activated." message is printed
now only since I applied my patch. Before, only the "enabling work-around." part
appeared, which is a bit tricky.

Second, attached to this mail is an mrtg graph png. Beware that the timeline goes
from right to left. This covers the past week. Every day the big peak is the
midnight "masturbation rush" when nearly everyone connects at the same time to
browse pr0n sites. You'll notice that the midnight peak is castrated suddenly
last friday. This accident happened 3 times the previous week. Kind of frustrating.

You can see a kind of sudden blackout which lasts about 3 hours, and then the
situation resumes to normality.

At the same time, the /var/log/messages receives thousands of messages from the
NET: subsystem.

A rather long research on the various mailing lists and newsgroups about networking
shows that this behavior is shown the same way on systems using a bugged Intel EtherExpress
Pro 100 network card.

Since the dmesg of the kernel tells about a work-around for such a bug, I was assuming
that the work around was activated, but I had a doubt and after looking at the source,
I discovered that it wasn't.

On saturday I patched the kernels, and since the midnight peaks are no longer
broken, there is no more desperate messages from the NET subsystem in the logs,
so maybe the problem has been fixed.

Now, as Ion says, maybe it is not the "receiver lock-up bug" itself which is
worked-around, frankly I don't know.

-- 
Augustin Vidovic                   http://www.vidovic.org/augustin/
"Nous sommes tous quelque chose de naissance, musicien ou assassin,
 mais il faut apprendre le maniement de la harpe ou du couteau."


- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Feb 15 2001 - 21:00:11 EST