Re: [PATCH 2/5] e1000e: fix pci device enable counter balance

From: Konstantin Khlebnikov
Date: Tue Jan 29 2013 - 01:45:27 EST


Bjorn Helgaas wrote:
[+cc Rafael @sisk.pl]

On Mon, Jan 28, 2013 at 4:09 PM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote:
[+cc Rafael]

On Fri, Jan 18, 2013 at 4:42 AM, Konstantin Khlebnikov
<khlebnikov@xxxxxxxxxx> wrote:
__e1000_shutdown() calls pci_disable_device() at the end, thus __e1000_resume()
should call pci_enable_device_mem() to keep enable counter in balance.

Bug was introduced in commit 23606cf5d1192c2b17912cb2ef6e62f9b11de133
("e1000e / PCI / PM: Add basic runtime PM support (rev. 4)") in v2.6.35

Signed-off-by: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxx>
Cc: e1000-devel@xxxxxxxxxxxxxxxxxxxxx
Cc: Jeff Kirsher <jeffrey.t.kirsher@xxxxxxxxx>
Cc: Bruce Allan <bruce.w.allan@xxxxxxxxx>
---
drivers/net/ethernet/intel/e1000e/netdev.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 2853c11..6bce796 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -5598,6 +5598,13 @@ static int __e1000_resume(struct pci_dev *pdev)
 	pci_restore_state(pdev);
 	pci_save_state(pdev);

+	err = pci_enable_device_mem(pdev);
+	if (err) {
+		dev_err(&pdev->dev,
+			"Cannot re-enable PCI device after suspend.\n");
+		return err;
+	}

Reviewed-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>

Seems right to me, and the other users I looked at (igb, azx,
virtio_pci) call pci_disable_device() in .suspend() and call
pci_enable_device() in .resume() as you propose to do here.

I assume the e1000 folks will handle this patch (and the previous one).

+
 	e1000e_set_interrupt_capability(adapter);
 	if (netif_running(netdev)) {
 		err = e1000_request_irq(adapter);


I'm still missing something. In your original report
(https://lkml.org/lkml/2013/1/1/25), you noticed that "enable_cnt ==
0" immediately after boot, after e1000e had claimed the device:

Yep, that raises the counter from 0 to 1, and runtime suspend immediately
decrements it back to 0.


Right after boot it looks like this:

root@zurg:/sys/bus/pci/devices# lspci
...
00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 04)
...
root@zurg:/sys/bus/pci/devices# cat 0000\:00\:19.0/enable
0
This should be '1', not '0'.

But these patches only change the e1000e suspend/resume path. How
could they change the enable_cnt before you've even done a suspend?

The suspend/resume and runtime_suspend/runtime_resume callbacks all call the
same pair of functions: __e1000_shutdown() / __e1000_resume()

Any suspend-resume cycle breaks the enable_cnt balance.
Thus, right after boot and the first runtime suspend, the device cannot wake
up due to the first sort of bug, and after the first s2ram suspend-resume
cycle the driver breaks its enable_cnt and the device can no longer sleep due
to the second sort of bug.
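
To illustrate the counter drift, here is a minimal userspace sketch (not
kernel code). It assumes the enable counter can be modeled as a plain int that
probe sets to 1; the real counter lives in struct pci_dev and other code paths
touch it too. All toy_* names are hypothetical and exist only for this
example. toy_shutdown() stands in for the pci_disable_device() call at the end
of __e1000_shutdown(), and toy_resume() stands in for __e1000_resume() with or
without the added pci_enable_device_mem() call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct pci_dev: only the counter is modeled. */
struct toy_pci_dev {
	int enable_cnt;
};

/* Models the pci_disable_device() at the end of __e1000_shutdown(). */
static void toy_shutdown(struct toy_pci_dev *pdev)
{
	pdev->enable_cnt--;
}

/*
 * Models __e1000_resume(). With reenable == true it takes the counter
 * back up, as the patch does via pci_enable_device_mem(); with
 * reenable == false it leaves the counter alone, as before the patch.
 */
static void toy_resume(struct toy_pci_dev *pdev, bool reenable)
{
	if (reenable)
		pdev->enable_cnt++;
}

/* Run a number of suspend/resume cycles and return the final counter. */
static int toy_count_after_cycles(int cycles, bool reenable)
{
	/* Assumption: probe has already enabled the device once. */
	struct toy_pci_dev pdev = { .enable_cnt = 1 };
	int i;

	for (i = 0; i < cycles; i++) {
		toy_shutdown(&pdev);
		toy_resume(&pdev, reenable);
	}
	return pdev.enable_cnt;
}
```

In this model a balanced resume leaves the counter where probe put it after
any number of cycles, while the unbalanced resume loses one count per cycle.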


Bjorn
