Re: [linux-usb-devel] OHCI problems with suspend/resume

From: David Brownell (david-b@pacbell.net)
Date: Thu Jul 24 2003 - 12:10:16 EST


Pavel Machek wrote:
> Hi!

Good morning to you too!

>>>In 2.6.0-test1, OHCI is non-functional after first suspend/resume, and
>>>kills machine during second suspend/resume cycle.
>>
>>Hmm, last time I tested suspend/resume it worked fine.
>>That was 2.5.67, but the OHCI code hasn't had any
>>relevant changes since then.
>
>
>>Evidently your system used different suspend/resume paths
>>than mine did ... :)
>
>
> Can you try echo 4 > /proc/acpi/sleep? echo 3 breaks it, too, but that
> is a little harder to set up.

I usually test with "apm -s" ... since I've yet to come up with
an ACPI configuration that works properly. IRQ misconfiguration
for USB is still a blocking issue for many people (not just me).

Going through ACPI would certainly explain some breakage; it's
been sufficiently troublesome with USB that it's not gotten much
testing at all. I happened to notice this morning that ACPI's
USB IRQ problems are one of the longest-standing open 2.6 bugs:
http://bugme.osdl.org/show_bug.cgi?id=10 ... and it's now been
migrated into the 2.4.22-pre series (sigh).

Could you try reproducing this failure using just APM? I could
believe there's a generic PM issue (I've been expecting 2.6-test
to eventually start shaking PM out); but given the amount of
trouble ACPI has caused, we should first rule that factor out.

>>>What happens is that ohci_irq gets ohci->hcca == NULL, and kills
>>>machine. Why is ohci->hcca == NULL? ohci_stop was called from
>>>hcd_panic() and freed ohci->hcca.
>>
>>Then the problem is that an IRQ is still coming in after the
>>HCD panicked.
>
>
> Actually, as PCI interrupts are shared, I do not find that too
> surprising.

I do. Sharing is irrelevant. If it's been cleaned up, then
the IRQ should no longer be bound to that device.
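
To make the ordering I mean concrete, here is a minimal user-space
sketch. Everything named fake_* is a hypothetical stand-in, not the
actual ohci-hcd code or the kernel IRQ API; it only models "unbind the
handler from the shared line first, then free the HCCA":

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; NOT the real ohci-hcd structures. */
struct fake_hcca { int done_head; };

struct fake_ohci {
    struct fake_hcca *hcca;
    int irq_bound;  /* 1 while our handler is attached to the shared line */
    int handled;    /* how many interrupts our handler processed */
};

/* Our handler: may only run while hcca is still valid. */
static int fake_ohci_irq(struct fake_ohci *ohci)
{
    assert(ohci->hcca != NULL);
    ohci->handled += 1 + ohci->hcca->done_head;
    return 1;       /* "handled" */
}

/* Model of the shared line firing: only bound handlers get called. */
static int fake_line_fires(struct fake_ohci *ohci)
{
    if (!ohci->irq_bound)
        return 0;   /* belongs to some other device sharing the line */
    return fake_ohci_irq(ohci);
}

static int fake_start(struct fake_ohci *ohci)
{
    ohci->hcca = calloc(1, sizeof(*ohci->hcca));
    if (!ohci->hcca)
        return -1;
    ohci->handled = 0;
    ohci->irq_bound = 1;
    return 0;
}

/* Teardown: detach from the line first, free the HCCA afterwards. */
static void fake_stop(struct fake_ohci *ohci)
{
    ohci->irq_bound = 0;
    free(ohci->hcca);
    ohci->hcca = NULL;
}
```

With that ordering, an interrupt from another device on the same line
after fake_stop() never reaches our handler, so the NULL hcca is never
touched. Freeing first and unbinding later (or never) would be the bug.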

>>>I believe that we should
>>>
>>>1) not free ohci->hcca so that system has better chance surviving
>>>hcd_panic()
>>
>>Not ever????
>>
>>It's freed in exactly one place, after everything should be
>>shut down. If it wasn't shut down, that was the problem.
>>
>>Could you instead figure out why it wasn't shut down?
>
>
> In case of hcd_panic(), where is interrupt deallocated?

I'll have to check that out, but I noticed that one of my
usual development machines (on which suspend/resume can
actually be made to work) is now unusable due to some PCI
initialization issue with Cardbus.

>>>and
>>>
>>>2) inform user when hcd panics.
>>
>>The OHCI code does that already on the normal panic path
>>(controller delivers an Unrecoverable Error interrupt), but
>>you're right that this would be better as a generic KERN_CRIT
>>diagnostic. (But one saying which HCD panicked, rather than
>>leaving folk to guess which of the N it applied to...) And
>>I'd print that message sooner, not waiting for that task to
>>be scheduled.
>
>
> That would be good. I definitely had another failure path, where it
> did not tell me that hcd is now K.O...

I'll likely submit that to Greg in the next few days, cc you.
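
Roughly the kind of message I have in mind, as a user-space sketch
only. The helper name, the wording, and the example bus id are all
made up for illustration, and the real patch may look quite different.
The one fixed point is that "<2>" is the printk prefix that KERN_CRIT
expands to:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build a critical-level diagnostic that names
 * the driver and the specific controller, so nobody has to guess
 * which of the N HCDs died.
 */
static int fake_hcd_died_msg(char *buf, size_t len,
                             const char *driver, const char *bus_id)
{
    assert(buf != NULL && len > 0);
    /* "<2>" stands in for KERN_CRIT */
    return snprintf(buf, len, "<2>%s %s: host controller died\n",
                    driver, bus_id);
}
```

Called with, say, "ohci-hcd" and an invented bus id "0000:00:02.0",
this produces one line identifying that controller. It would be
emitted right at the point of failure rather than from the deferred
cleanup task.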

- Dave

> Pavel
