Re: [PATCH] USB:Fix ehci infinite suspend-resume loop issue in zhaoxin

From: Alan Stern
Date: Thu Apr 07 2022 - 10:22:50 EST


On Thu, Apr 07, 2022 at 02:15:29PM +0800, WeitaoWang-oc@xxxxxxxxxxx wrote:
> On 2022/4/7 00:20, Alan Stern wrote:
> > On Wed, Apr 06, 2022 at 10:38:28AM +0800, WeitaoWang-oc@xxxxxxxxxxx wrote:
> > > On 2022/4/6 00:02, Alan Stern wrote:
> > > > In fact, the resume kernel doesn't call ehci_resume at all. Here's what
> > > > it does:
> > > >
> > > > The resume kernel boots;
> > > >
> > > > If your patch causes STS_PCD to be set at this point, the flag
> > > > should get cleared shortly afterward by ehci_irq;
> > > >
> > > > ehci-hcd goes into runtime suspend;
> > > >
> > > > The kernel reads the system image that was stored earlier when
> > > > hibernation began;
> > > >
> > > > After the image is loaded, the system goes into the freeze
> > > > state (this does not call any routines in ehci-hcd);
> > > In this phase, pci_pm_freeze is called for each PCI device. That
> > > function calls pm_runtime_resume to resume devices that are already
> > > runtime-suspended, which causes ehci_resume to be called.
> > > Thus the STS_PCD flag gets set in the ehci_resume function.
> >
> > Aha! I was missing that piece of information, thanks.
> >
> > But this still doesn't explain why check_root_hub_suspended is failing.
> > That routine checks the HCD_RH_RUNNING bit, which gets set in
> > hcd_bus_resume. hcd_bus_resume gets called as part of resuming the root
> > hub, and in ehci-hcd this happens when ehci_irq sees that STS_PCD is set
> > and calls usb_hcd_resume_root_hub. That routine queues a wakeup request
> > on the pm_wq work queue, which is then supposed to run hcd_resume_work
> > to actually restart the root hub.
> >
> > But pm_wq is a freezable work queue! While the system is in the freeze
> > state, the work queue isn't running. This means that the root hub
> > should remain suspended until the end of the freeze phase, and so the
> > call to check_root_hub_suspended should succeed.
> >
> > Can you check to see what's really happening on your system? Something
> > must be wrong with my analysis, but I can't tell what it is. I'm still
> > puzzled.
> >
> > Alan Stern
> Your analysis is right. My test platform's kernel is not the latest
> version, and it does not call freeze_kernel_threads in the
> software_resume function.
> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel/power/hibernate.c?h=v5.18-rc1&id=2351f8d295ed63393190e39c2f7c1fee1a80578f)
> So pm_wq is active and can handle root hub power events.
> After updating my kernel to include the fix at the URL above, the system
> hibernation test succeeded with our patch (which does not clear the
> STS_PCD bit).
> Thanks for your clarification.

Great! I'm glad we sorted that out.

So check_root_hub_suspended doesn't need any changes, and the patch you
already submitted takes care of everything.

Alan Stern