Re: [E1000-devel] 2.6.36 abrupt total e1000e carrier loss (cured by reboot)

From: Nix
Date: Sun Nov 14 2010 - 12:10:32 EST


On 8 Nov 2010, nix@xxxxxxxxxxxxx stated:

> On 8 Nov 2010, Emil S. Tantilov verbalised:
>
>> Nix wrote:
>>> For the record, cherry-picking
>>> ff10e13cd06f3dbe90e9fffc3c2dd2057a116e4b (the periodic
>>> phy-crash-and-reset check) atop 2.6.36 seems to have fixed it: at
>>> least, the machine has been up for a day now without trouble. This
>>> commit doesn't seem to be in Greg's stable-queue yet, but seems like
>>> a good candidate.
>>
>> This patch should have no effect on your issue if it is indeed ASPM related.
>
> Interesting. I just noticed that it was testing for exactly the same
> symptoms as I was observing (registers suddenly filled with 0xff) and
> resetting the card, and thought it might help (plus it's easier than
> installing an out-of-tree module and I'm lazy so I tried it first).

It didn't help. Unfortunately, neither did the upstream e1000e-1.2.17
module. I have now seen this network-dead bug with basic 2.6.36, with
2.6.36 plus the commit named above, and with 2.6.36 plus e1000e-1.2.17.
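
For reference, the symptom mentioned in the quoted text above (registers
suddenly filled with 0xff) can be tested for along these lines. This is only
a minimal sketch, not the code from the commit named above: the helper name
regs_look_dead and the idea of passing in an array of already-sampled
32-bit register values are assumptions made purely for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the e1000e driver.
 * Returns true when every sampled 32-bit register value is all ones
 * (0xffffffff), i.e. every byte reads back as 0xff, which is the
 * symptom described in the thread.
 */
static bool regs_look_dead(const uint32_t *regs, size_t count)
{
    if (count == 0)
        return false;           /* no samples, so no verdict */

    for (size_t i = 0; i < count; i++) {
        if (regs[i] != UINT32_MAX)
            return false;       /* at least one register holds real data */
    }
    return true;
}
```

A caller would sample a few registers from the device and treat a true
result as a hint that a reset might be worth trying; as noted above, a reset
on that basis did not cure the problem here.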

Any debugging I can do, just drop me a line. I'm really quite used to
rebooting this system now, what with this *and* the NFS rpc.mountd-
imploding-on-bootup bug biting simultaneously.