RE: [2.6.0-mm2] e100 driver hangs after period of moderate receive load

From: Feldman, Scott
Date: Sun Jan 04 2004 - 19:28:19 EST


Lennert Buytenhek wrote:
> Make sure you have slab debugging enabled to see if you also
> get the slab corruption messages, and then hit the machine
> with anything above 50000 packets per second. pktgen from a
> different machine on the same subnet works nicely for that.
> I doubt that downloading a Red Hat iso would give you a load
> anywhere near that.
>
> Oh, do you have an SMP box? This was on a 2-way (4-way HT)
> SMP box. Not sure if that matters here.
>
> I'm just about to try 2.6.0-mm2 without NAPI.

Ok, I've repro'd this (w/ and w/o NAPI, but w/o is much harder). I
wasted a bunch of time having both page alloc debugging and slab
debugging on. Seems one masks the other. (Jeff warned me!) In any
case, what I know so far is the problem happens when HW runs out of Rx
resources, and SW tries to resume the receiver after supplying new
resources. Somehow HW is scribbling on resources already given back to
the OS. It's something about the list management in this new e100.
eepro100 and legacy e100 work fine. Investigation continuing...

-scott
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/