Re: Async suspend-resume patch w/ completions (was: Re: Async suspend-resume patch w/ rwsems)

From: Rafael J. Wysocki
Date: Sat Dec 12 2009 - 20:16:05 EST


On Saturday 12 December 2009, Linus Torvalds wrote:
>
> On Sat, 12 Dec 2009, Rafael J. Wysocki wrote:
> >
> > I'd like to put it into my tree in this form, if you don't mind.
>
> This version still has a major problem, which is not related to
> completions vs rwsems, but simply to the fact that you wanted to do this
> at the generic device layer level rather than do it at the actual
> low-level suspend/resume level.
>
> Namely that there's no apparent sane way to say "don't wait for children".
>
> PCI bridges that don't suspend at all - or any other device that only
> suspends in the 'suspend_late()' thing, for that matter - don't have any
> reason what-so-ever to wait for children, since they aren't actually
> suspending in the first place. But you make them wait regardless, which
> then serializes things unnecessarily (for example, two unrelated USB
> controllers).

This is a problem that needs to be solved.

One solution that we have discussed on linux-pm is to start a bunch of async
threads searching for async devices that can be suspended and suspending
them (assuming suspend is considered) out of order with respect to dpm_list.
For example, leaf async devices can always be suspended at the same time
regardless of their positions in dpm_list. This way we could get almost the
entire gain resulting from suspending or resuming devices in parallel without
bothering drivers with the problem of dependencies that need to be honoured.

That's something we can add on top of this patch, though, not to complicate
things from the start and it surely requires more discussion.

> And no, making _everything_ be async is _not_ the answer.

I'm not sure what you mean, really.

Speaking of PCI bridges, even though they don't "suspend" in the sense of
being put into low power states or something, we still need to save their
registers on suspend and restore them on resume, and that restore has to
be done before we start to access devices below the bridge.

There are devices with totally null suspend and resume routines that even
the bus type doesn't really handle, but those can be marked as "async" from
the start and they won't really get in the way any more (this creates another
issue to solve, namely that we shouldn't really start a new async thread for
each of them; we have considered that too).

Even if we move that all to drivers, the constraints won't go away and someone
will have to take care of them. Now, since _we_ have problems with reaching
an agreement about how to do it, the driver writers will be even less likely to
figure that out.

Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/