Re: [PATCH] USB:Fix ehci infinite suspend-resume loop issue in zhaoxin

From: WeitaoWang-oc@xxxxxxxxxxx
Date: Thu Apr 07 2022 - 02:15:41 EST


On 2022/4/7 00:20, Alan Stern wrote:
On Wed, Apr 06, 2022 at 10:38:28AM +0800, WeitaoWang-oc@xxxxxxxxxxx wrote:
On 2022/4/6 00:02, Alan Stern wrote:
In fact, the resume kernel doesn't call ehci_resume at all. Here's what
it does:

The resume kernel boots;

If your patch causes STS_PCD to be set at this point, the flag
should get cleared shortly afterward by ehci_irq;

ehci-hcd goes into runtime suspend;

The kernel reads the system image that was stored earlier when
hibernation began;

After the image is loaded, the system goes into the freeze
state (this does not call any routines in ehci-hcd);

In this phase, pci_pm_freeze will be called for each PCI device. That
function calls pm_runtime_resume to resume devices that are already
runtime-suspended, which causes ehci_resume to be called.
Thus the STS_PCD flag will be set in the ehci_resume function.

Aha! I was missing that piece of information, thanks.

But this still doesn't explain why check_root_hub_suspended is failing.
That routine checks the HCD_RH_RUNNING bit, which gets set in
hcd_bus_resume. hcd_bus_resume gets called as part of resuming the root
hub, and in ehci-hcd this happens when ehci_irq sees that STS_PCD is set
and calls usb_hcd_resume_root_hub. That routine queues a wakeup request
on the pm_wq work queue, which is then supposed to run hcd_resume_work
to actually restart the root hub.

But pm_wq is a freezable work queue! While the system is in the freeze
state, the work queue isn't running. This means that the root hub
should remain suspended until the end of the freeze phase, and so the
call to check_root_hub_suspended should succeed.
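
The reasoning above can be sketched as a small user-space model. This is
only an illustration under stated assumptions: every toy_* name is
hypothetical and none of it is kernel code. It models a freezable work
queue that holds a queued root-hub resume request until it is thawed, so
a check made during the freeze phase still sees the hub as suspended.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: the toy_* names are hypothetical, not kernel APIs. */
struct toy_root_hub {
    bool running;                 /* stands in for the HCD_RH_RUNNING bit */
};

struct toy_wq {
    bool freezable;               /* stands in for a freezable work queue */
    bool frozen;                  /* true while the queue is frozen */
    void (*pending)(void *);      /* at most one queued work item */
    void *arg;
};

/* Stands in for hcd_resume_work: actually restarts the root hub. */
static void toy_resume_work(void *arg)
{
    ((struct toy_root_hub *)arg)->running = true;
}

/* Run the pending item, unless the queue is frozen or empty. */
static void toy_wq_run(struct toy_wq *wq)
{
    void (*fn)(void *) = wq->pending;

    if (wq->frozen || fn == NULL)
        return;
    wq->pending = NULL;
    fn(wq->arg);
}

/* Queue a work item; it runs at once only if the queue is not frozen. */
static void toy_queue_work(struct toy_wq *wq, void (*fn)(void *), void *arg)
{
    wq->pending = fn;
    wq->arg = arg;
    toy_wq_run(wq);
}

/* Freezing affects only queues marked freezable. */
static void toy_freeze(struct toy_wq *wq)
{
    if (wq->freezable)
        wq->frozen = true;
}

/* Thawing lets any held work item run. */
static void toy_thaw(struct toy_wq *wq)
{
    wq->frozen = false;
    toy_wq_run(wq);
}

/* Stands in for check_root_hub_suspended: 0 if suspended, nonzero if not. */
static int toy_check_root_hub_suspended(const struct toy_root_hub *rh)
{
    return rh->running ? -1 : 0;
}
```

In this model, calling toy_freeze before toy_queue_work leaves the hub
suspended, so the check returns 0. If the queue is never frozen, the
work item runs immediately and the check fails.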

Can you check to see what's really happening on your system? Something
must be wrong with my analysis, but I can't tell what it is. I'm still
puzzled.

Alan Stern

Your analysis is right. My test platform's kernel is not the latest
version, and it does not call freeze_kernel_threads in the
software_resume function.
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel/power/hibernate.c?h=v5.18-rc1&id=2351f8d295ed63393190e39c2f7c1fee1a80578f)
So pm_wq is active and can handle root hub power events.
After I updated my kernel to include the fix at the URL above, the system
hibernation test succeeded with our patch (which does not clear the
STS_PCD bit).
Thanks for your clarification.

Weitao Wang