Re: [patch] MSI-X: fix resume crash

From: Eric W. Biederman
Date: Thu Mar 29 2007 - 01:04:30 EST


Len Brown <lenb@xxxxxxxxxx> writes:

>> Tony, Len the way pci_disable_device is being used in a suspend/resume
>> path by a few drivers is completely incompatible with the way irqs are
>> allocated on ia64. In particular, the following sequence occurs
>> in several drivers.
>>
>> probe:
>>         pci_enable_device(pdev);
>>         request_irq(pdev->irq);
>> suspend:
>>         pci_disable_device(pdev);
>> resume:
>>         pci_enable_device(pdev);
>> remove:
>>         free_irq(pdev->irq);
>>         pci_disable_device(pdev);
>
> There are no IA64 machines that support system suspend/resume today --
> so you have 0 chance of breaking the IA64 suspend/resume installed base.

Ok. So that is why the inconsistency persists...
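
To make the failure mode in the quoted sequence concrete, here is a
minimal userspace sketch, not kernel code. Every name in it (toy_dev,
toy_enable_device and so on) is hypothetical, and the allocator that
hands out a fresh number on each enable is an assumption standing in
for irq allocation done inside pci_enable_device.

```c
#include <assert.h>

#define TOY_NR_IRQS 8

/* Which irq numbers currently have a handler attached. */
static int toy_handler_bound[TOY_NR_IRQS];
/* Next number the toy allocator hands out (assumption: never reused). */
static int toy_next_irq = 1;

struct toy_dev {
    int irq;                    /* plays the role of pdev->irq; 0 = none */
};

struct toy_result {
    int irq_at_probe;
    int irq_at_resume;
    int handler_on_resumed_irq;
};

/* Assumption: enabling the device is what allocates the irq number. */
static void toy_enable_device(struct toy_dev *d)
{
    assert(toy_next_irq < TOY_NR_IRQS);
    d->irq = toy_next_irq++;
}

/* Assumption: disabling the device drops the irq number, handler or not. */
static void toy_disable_device(struct toy_dev *d)
{
    d->irq = 0;
}

static void toy_request_irq(int irq) { toy_handler_bound[irq] = 1; }
static void toy_free_irq(int irq)    { toy_handler_bound[irq] = 0; }

/* Runs probe, suspend, resume, remove in the order the drivers use. */
static struct toy_result toy_run_sequence(void)
{
    struct toy_dev d = { 0 };
    struct toy_result r;

    toy_enable_device(&d);              /* probe */
    toy_request_irq(d.irq);
    r.irq_at_probe = d.irq;

    toy_disable_device(&d);             /* suspend: irq number is dropped */

    toy_enable_device(&d);              /* resume: a new number comes back */
    r.irq_at_resume = d.irq;
    r.handler_on_resumed_irq = toy_handler_bound[d.irq];

    toy_free_irq(d.irq);                /* remove: frees the wrong number */
    toy_disable_device(&d);
    return r;
}
```

In this toy model the handler stays attached to the number handed out at
probe time, the device comes back from resume with a different number,
and the free_irq in remove never releases the original one.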

> My understanding is that Luming Yu has cobbled IA64 S4 support
> together for a future release though.
>
>> What I'm proposing we do is move the irq allocation code out of
>> pci_enable_device and the irq freeing code out of pci_disable_device in
>> the future. If we move ia64 to a model where the irq number equals the
>> gsi, as we have for x86_64 and are in the middle of doing for i386, that
>> should be pretty straightforward. It would even be relatively simple
>> to delay vector allocation in that context until request_irq, if we
>> needed the delayed allocation benefit. Do you two have any problems
>> with moving in that direction?
>
> I think consistency here would be _wonderful_.
> Of course the beauty of having identity GSI=IRQ and a /proc/interrupts
> that tells you what IOAPIC pin you are using becomes moot with MSI --
> but hey, showing the IRQ number rather than the vector number
> is consistent and makes sense.

Yes. It also allows for bigger machines. And I can get a consistent
number out of MSI if we allocate irq numbers in a sufficiently non-sparse
way. Something like bus|device|func|irq which is 8+5+3+12 or 28 bits...
I'll never get there, though, if I keep unearthing these long-standing bugs.
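
As a rough illustration of the bus|device|func|irq idea, here is a small
sketch of packing those four fields into one 28-bit number. The helper
names are hypothetical, and treating the 12-bit "irq" field as a
per-function index is an assumption; only the field widths (8+5+3+12)
come from the paragraph above.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths from the text: 8 + 5 + 3 + 12 = 28 bits. */
enum {
    TOY_BUS_BITS  = 8,
    TOY_DEV_BITS  = 5,
    TOY_FUNC_BITS = 3,
    TOY_IRQ_BITS  = 12
};

/* Pack bus|device|func|irq, most significant field first. */
static uint32_t toy_msi_irq_pack(unsigned bus, unsigned dev,
                                 unsigned func, unsigned irq)
{
    assert(bus  < (1u << TOY_BUS_BITS));
    assert(dev  < (1u << TOY_DEV_BITS));
    assert(func < (1u << TOY_FUNC_BITS));
    assert(irq  < (1u << TOY_IRQ_BITS));

    return ((uint32_t)bus  << (TOY_DEV_BITS + TOY_FUNC_BITS + TOY_IRQ_BITS)) |
           ((uint32_t)dev  << (TOY_FUNC_BITS + TOY_IRQ_BITS)) |
           ((uint32_t)func << TOY_IRQ_BITS) |
           (uint32_t)irq;
}

/* Accessors that recover each field from the packed number. */
static unsigned toy_msi_irq_bus(uint32_t n)
{
    return (n >> (TOY_DEV_BITS + TOY_FUNC_BITS + TOY_IRQ_BITS)) &
           ((1u << TOY_BUS_BITS) - 1);
}

static unsigned toy_msi_irq_dev(uint32_t n)
{
    return (n >> (TOY_FUNC_BITS + TOY_IRQ_BITS)) & ((1u << TOY_DEV_BITS) - 1);
}

static unsigned toy_msi_irq_func(uint32_t n)
{
    return (n >> TOY_IRQ_BITS) & ((1u << TOY_FUNC_BITS) - 1);
}

static unsigned toy_msi_irq_index(uint32_t n)
{
    return n & ((1u << TOY_IRQ_BITS) - 1);
}
```

With this layout the same device function always maps to the same
number, which is the consistency being asked for.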

>> If fixing the arch code is unacceptable for some reason I'm not aware of
>> we need to audit the 10-20 drivers that call pci_disable_device in their
>> suspend/resume processing and ensure that they have freed all of the
>> irqs before that point. Given that I have bug reports on the msi path I
>> know that isn't true.
>
> I think the suspend/resume interrupt logic needs some serious attention.
> We've had several schemes for suspend/resume of interrupts, several
> changes in strategy, and right now I think we are inconsistent,
> and frankly, I'm amazed it works at all.

What I have been doing lately is to aim at consistency in how a function
is called (and thus how it is expected to be used) and how it is actually
implemented. When I have a choice I try to pick a forgiving implementation
so that driver writers don't have to follow a magic correct path for
things to work correctly.

Removing the irq assignment from pci_enable_device is something that
matches implementation with use.

As for the rest it seems reasonable to me to allow an irq to be held
requested over suspend/resume and to save and restore apic and msi
capability state. Especially since irq numbers are a kernel
abstraction we should be able to do with them what we need to.
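
A minimal userspace sketch of that forgiving model, again not kernel
code: the irq number is fixed to the gsi, enable and disable leave it
alone, and suspend/resume only save and restore the routing state. All
names are hypothetical, and the single hw_route word is an assumption
standing in for the apic and msi capability state.

```c
#include <assert.h>

#define TOY_NR_IRQS 8

/* Which irq numbers currently have a handler attached. */
static int toy_handler_bound[TOY_NR_IRQS];

struct toy_dev {
    int gsi;                /* fixed by the platform */
    int irq;                /* kernel-side number, always equal to gsi here */
    int enabled;
    unsigned hw_route;      /* stand-in for apic/msi state held in hardware */
    unsigned saved_route;   /* copy kept across suspend */
};

static void toy_dev_init(struct toy_dev *d, int gsi)
{
    assert(gsi > 0 && gsi < TOY_NR_IRQS);
    d->gsi = gsi;
    d->irq = gsi;           /* identity mapping: irq number equals gsi */
    d->enabled = 0;
    d->hw_route = 0;
    d->saved_route = 0;
}

/* Enable and disable no longer allocate or free the irq number. */
static void toy_enable_device(struct toy_dev *d)  { d->enabled = 1; }
static void toy_disable_device(struct toy_dev *d) { d->enabled = 0; }

/* Routing is programmed only when a handler is actually requested. */
static void toy_request_irq(struct toy_dev *d)
{
    toy_handler_bound[d->irq] = 1;
    d->hw_route = 0x40u + (unsigned)d->irq;   /* arbitrary toy value */
}

static void toy_free_irq(struct toy_dev *d)
{
    toy_handler_bound[d->irq] = 0;
    d->hw_route = 0;
}

static void toy_suspend(struct toy_dev *d)
{
    d->saved_route = d->hw_route;   /* save state before power is lost */
    toy_disable_device(d);
    d->hw_route = 0;                /* assumption: hardware state is lost */
}

static void toy_resume(struct toy_dev *d)
{
    toy_enable_device(d);
    d->hw_route = d->saved_route;   /* restore; irq stays requested */
}
```

Here a driver can call enable and disable in any order around
suspend/resume without losing its handler, because the number it
requested never changes.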

Honestly, the whole suspend/resume thing is beyond me at this point; I'm
laptop free... But I do know how to make code consistent with itself.

Eric