Re: [PATCH] PM: sleep: core: Fix the handling of pending runtime resume requests

From: Rafael J. Wysocki
Date: Mon Aug 24 2020 - 13:31:18 EST


On Monday, August 24, 2020 5:04:21 PM CEST Alan Stern wrote:
> On Mon, Aug 24, 2020 at 03:36:36PM +0200, Rafael J. Wysocki wrote:
> > > Furthermore, by the logic used in this patch, the call to
> > > pm_wakeup_event() in the original code is also redundant: Any required
> > > wakeup event should have been generated when the runtime resume inside
> > > pm_runtime_barrier() was carried out.
> >
> > It should be redundant in the real wakeup event cases, but it may cause
> > spurious suspend aborts to occur when there are no real system wakeup
> > events.
> >
> > Actually, the original code is racy with respect to system wakeup events,
> > because it depends on the exact time when the runtime-resume starts. Namely,
> > if it manages to start before the freezing of pm_wq, the wakeup will be lost
> > unless the driver takes care of reporting it, which means that drivers really
> > need to do that anyway. And if they do that (which hopefully is the case), the
> > pm_wakeup_event() call in the core may be dropped.
>
> In other words, wakeup events are supposed to be reported at the time
> the wakeup request is first noticed, right?

That's correct.

> We don't want to wait until
> a resume or runtime_resume callback runs; thanks to this race the
> callback might not run at all if the event isn't reported first.

The callback will run, either through the wq or via pm_runtime_barrier(),
but if it runs through the wq, pm_runtime_barrier() will return 0 and
pm_wakeup_event() will not be called by the core, so it must be called from
elsewhere anyway.
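
To illustrate the race, here is a rough user-space model (not kernel code).
Every model_* name and the struct below are made up for illustration; only
the condition in model_suspend_original() is meant to mirror the
"pm_runtime_barrier(dev) && device_may_wakeup(dev)" check under discussion.

```c
#include <stdbool.h>

/* Hypothetical model of a device; not the kernel's struct device. */
struct model_dev {
	bool resume_pending;	/* a runtime-resume request is queued */
	bool may_wakeup;	/* device is wakeup-enabled */
	bool driver_reports;	/* driver reports wakeup when it notices it */
	int wakeup_events;	/* wakeup events recorded so far */
};

/* Stands in for pm_wakeup_event(): just counts the event. */
static void model_wakeup_event(struct model_dev *dev)
{
	dev->wakeup_events++;
}

/* The driver notices a wakeup request and queues a runtime resume. */
static void model_request_resume(struct model_dev *dev)
{
	dev->resume_pending = true;
	if (dev->driver_reports && dev->may_wakeup)
		model_wakeup_event(dev);
}

/* The workqueue handles the request before it gets frozen. */
static void model_wq_runs(struct model_dev *dev)
{
	dev->resume_pending = false;
}

/*
 * Stands in for pm_runtime_barrier(): returns 1 if a pending resume
 * request was found and carried out here, 0 otherwise.
 */
static int model_runtime_barrier(struct model_dev *dev)
{
	if (!dev->resume_pending)
		return 0;
	dev->resume_pending = false;
	return 1;
}

/* Original core logic: report a wakeup only if the barrier did the resume. */
static void model_suspend_original(struct model_dev *dev)
{
	if (model_runtime_barrier(dev) && dev->may_wakeup)
		model_wakeup_event(dev);
}

/* Simplified core logic: flush the pending request, report nothing. */
static void model_suspend_simplified(struct model_dev *dev)
{
	model_runtime_barrier(dev);
}
```

With driver_reports unset, the count after model_suspend_original() is 0 if
model_wq_runs() came first and 1 otherwise, so the outcome depends on timing.
With driver_reports set, model_suspend_simplified() leaves the count at 1 in
either ordering.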

> Therefore the reasoning behind the original code appears to have been
> highly suspect.

Indeed.

> If there already was a queued runtime-resume request
> for the device and the device was wakeup-enabled, the wakeup event
> should _already_ have been reported at the time the request was queued.
> And we shouldn't rely on it being reported by the runtime-resume
> callback routine.

Right.

> > > This means that the code could be simplified to just:
> > >
> > > pm_runtime_barrier(dev);
> >
> > Yes, it could, so I'm going to re-spin the patch with this code simplification
> > and an updated changelog.
> >
> > > Will this fix the reported bug?
> >
> > I think so.
>
> Okay, we'll see!

Fair enough!