Re: [E1000-devel] e1000: "eeprom checksum is not valid" after kexec

From: Rafael J. Wysocki
Date: Fri Apr 24 2009 - 12:11:18 EST


On Thursday 23 April 2009, Thadeu Lima de Souza Cascardo wrote:
> On Thu, Apr 23, 2009 at 10:40:14PM +0200, Jiri Slaby wrote:
> > On 04/23/2009 04:41 PM, Thadeu Lima de Souza Cascardo wrote:
> > > On Thu, Apr 23, 2009 at 04:30:01PM +0200, Jiri Slaby wrote:
> > >> On 04/23/2009 04:10 PM, Thadeu Lima de Souza Cascardo wrote:
> > >>> Have you tried b43fcd7dc7b, found in v2.6.30-rc3?
> > >> I've tried 2.6.30-rc3-next-20090423 without success.
> > >
> > > You mean next-20090423. The patch is really found there.
> > >
> > > But, then, I realize you mean reverting these patches for the kernel
> > > that is running or the kernel that is being kexec'd?
> >
> > The latter.
> >
> > > If b43fcd7dc7b is applied to the running kernel, it fixes the shutdown
> > > issue, and the next loaded kernel probes e1000 fine.
> >
> > Makes sense.
> >
> > > If you are reverting 4a865905f in the kexec'd kernel and the running
> > > kernel does not have b43fcd7dc7b, then I'd like to test the revert for
> > > my case here, which is e100.
> >
> > To make things clear: on that machine, there was stock opensuse 11.1
> > distro kernel which is 2.6.27-based (no b43fcd7dc7b). I needed to debug
> > a wireless bug, so I kexec'ed wireless-testing (contains 4a865905f already).
> >
> > So in fact, 4a865905f from the testing kernel triggered a bug that was
> > fixed recently by b43fcd7dc7b.
> >
> > Did the other two e100* drivers suffer from the same and were fixed
> > recently? It would render kexec pretty unusable from the older kernels
> > if this is not going to be fixed anyhow :(.
>
> Yes, as well as some other network drivers, it seems. My fix for e100
> should be in Jeffrey Kirsher's tree by now and go into netdev and rc4
> soon, I expect.
>
> But, since I also thought that it would be good to fix that and allow
> people to kexec from earlier kernels, I did a followup to e100-devel,
> linux-pci, netdev and Rafael Wysocki. I didn't include linux-kernel,
> which I have just fixed, bouncing the message (oops!). I may bounce it
> to you too, if you want that.
>
> Your findings shed light on that problem. But I could find it in
> very early kernels too for some configurations, and these commits you
> are reverting may only fix the issue for the most common configurations
> out there. That is, it was very easy to trigger the shutdown bug with
> these patches. But I think there are some other bugs out there that will
> trigger it, and they are not that easy to bisect, it seems, since only
> some very particular configurations trigger it.
>
> I will do some tests with the commits you mention, reproduce the
> problem using kernels as early as I can, and send the config.

Cascardo, Jiri, can you tell me please what the status here is?

My understanding is that the commit pointed to by Jiri caused a problem
when the current mainline kernel was kexec'ed from an older kernel (2.6.27.x
from openSUSE 11.1 in this particular case), because the older kernel didn't
have the recent network driver fixes applied. Is this correct?

Also, I'm still interested in whether or not removing the following three lines:

	/* Check if we're already there */
	if (dev->current_state == state)
		return 0;

from pci_set_power_state() in the current mainline kernel fixes the problem
in the configuration where it is readily reproducible.
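
To illustrate why that check might matter, here is a minimal user-space
sketch, not kernel code. It assumes (and this thread has not confirmed it)
that after kexec the cached state can disagree with what the hardware is
really in. The names toy_dev, toy_set_power_state and skip_if_cached are
hypothetical and exist only for this example.

```c
/* Hypothetical, simplified power states; not the kernel's pci_power_t. */
enum power_state { STATE_D0 = 0, STATE_D3 = 3 };

/* Toy device: the cached state versus the state the hardware is in. */
struct toy_dev {
	enum power_state current_state;	/* what the core believes */
	enum power_state hw_state;	/* what the device is really in (modelled) */
	int hw_writes;			/* number of times the "hardware" was touched */
};

/*
 * Model of the early return under discussion. With skip_if_cached != 0 the
 * function behaves like the code with the three lines present; with
 * skip_if_cached == 0 it behaves as if they had been removed.
 */
static int toy_set_power_state(struct toy_dev *dev, enum power_state state,
			       int skip_if_cached)
{
	/* Check if we're already there */
	if (skip_if_cached && dev->current_state == state)
		return 0;

	/* Stand-in for programming the device's power management registers. */
	dev->hw_state = state;
	dev->hw_writes++;
	dev->current_state = state;
	return 0;
}
```

In this model, a device whose cached state says D0 while the hardware was
left in D3 stays in D3 when D0 is requested with the check in place, and is
brought to D0 once the check is skipped. Whether the real failure follows
this pattern is exactly what the test above would tell us.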

Thanks,
Rafael