Re: 2.6.29-rc3: tg3 dead after resume

From: Linus Torvalds
Date: Wed Jan 28 2009 - 20:10:03 EST




On Wed, 28 Jan 2009, Parag Warudkar wrote:
>
> This is similar to the issue reported back in Jul 2007 -
> http://kerneltrap.org/mailarchive/linux-kernel/2007/8/1/154073/thread
> which was fixed with a patch to unconditionally save/restore pci config
> space - that one is still in tg3.c.

In fact, the new PCI suspend/restore code should have made that
unnecessary, since the PCI layer now makes sure that a save/restore is
done even if the driver hadn't done it.

At the same time, still having the driver do it certainly shouldn't
have _hurt_ anything either. It's quite possible, though, that tg3 is
very sensitive to the exact order in which things happen - there are a
lot of comments about bugs in there ;)

> After resume tg3 complains that no firmware is running and eth0 is
> non-existent. Rmmoding and modprobing tg3 again causes some timeouts and
> errors from tg3 and the link still doesn't work.

That seems to imply that even the reset failed, which is interesting.

But it also possibly means that the problem is not necessarily the driver
itself, but some cached state that we keep around in "struct pci_dev" even
across a module load/unload.

For example, if we get the "dev->current_state" cache wrong, then we may
not actually end up changing the device's power state when we should,
because we think we already match the target state. I don't _think_ that
is it, but that's the kind of thing that could happen.
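
To make the stale-cache scenario concrete, here is a toy model in C. It
is only a sketch: "struct fake_pci_dev", "struct fake_hw" and
"fake_set_power_state()" are hypothetical stand-ins invented for
illustration, not the kernel's definitions, and the real PCI code does
considerably more. The only thing it shows is how a cached state that
already equals the target turns the transition into a no-op.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for illustration; NOT the kernel's definitions. */
enum power_state { STATE_D0 = 0, STATE_D3HOT = 3 };

struct fake_hw {
    enum power_state real_state;   /* state the device is really in */
    int writes;                    /* number of hardware writes issued */
};

struct fake_pci_dev {
    enum power_state current_state; /* cached copy kept by the bus layer */
    struct fake_hw *hw;
};

/*
 * Move the device to 'target'. Returns 1 if the hardware was touched,
 * 0 if the call was skipped because the cache already matched.
 */
static int fake_set_power_state(struct fake_pci_dev *dev,
                                enum power_state target)
{
    assert(dev != NULL && dev->hw != NULL);

    if (dev->current_state == target)
        return 0;               /* cache says we're already there */

    dev->hw->real_state = target;
    dev->hw->writes++;
    dev->current_state = target;
    return 1;
}
```

If the cache says STATE_D0 while the hardware is really in STATE_D3HOT,
a request for STATE_D0 returns 0 and the device stays powered down. In
this model the cache lives in the device structure rather than in the
driver, so reloading the driver would not correct it.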

Can you run

lspci -vvxxx -s [tg3-device]

once before suspend and once after resume? Is there some state that
looks like it got corrupted?
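
As a rough illustration of what to look for in the two dumps, the sketch
below compares two 256-byte config-space snapshots byte by byte. It
assumes the hex rows have already been read into byte arrays;
"cfg_diff()" and "cfg_all_ones()" are hypothetical helper names, not
part of lspci or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define CFG_SIZE 256   /* bytes of standard PCI config space */

/* Print every byte that differs and return how many bytes changed. */
static size_t cfg_diff(const unsigned char *before,
                       const unsigned char *after)
{
    size_t i, changed = 0;

    assert(before != NULL && after != NULL);

    for (i = 0; i < CFG_SIZE; i++) {
        if (before[i] != after[i]) {
            printf("offset 0x%02zx: %02x -> %02x\n", i,
                   (unsigned)before[i], (unsigned)after[i]);
            changed++;
        }
    }
    return changed;
}

/*
 * Return 1 if every byte reads 0xff, which is what config reads return
 * when no device answers them.
 */
static int cfg_all_ones(const unsigned char *cfg)
{
    size_t i;

    assert(cfg != NULL);

    for (i = 0; i < CFG_SIZE; i++)
        if (cfg[i] != 0xff)
            return 0;
    return 1;
}
```

A command register (offset 0x04) that reads 0x06 before suspend and 0x00
after resume, for instance, would mean memory decoding and bus mastering
were not re-enabled. Which other offsets matter is device-specific.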

Linus