Re: Solving suspend-level confusion

From: David Brownell
Date: Sat Jul 31 2004 - 09:29:29 EST


On Friday 30 July 2004 22:49, Benjamin Herrenschmidt wrote:
> On Sat, 2004-07-31 at 14:02, David Brownell wrote:
> > On Friday 30 July 2004 09:44, Pavel Machek wrote:
> >
> > > * system-wide suspend level is always passed down (it is more
> > > detailed, and for example IDE driver cares)
> >
> > This bothers me -- why should a "system" suspend level matter
> > to a device-level suspend call? Seems like if IDE cares, it's
> > probably being given the wrong device-level suspend state,
> > or it needs more information than the target device state.
>
> Well, as I explained to you during OLS ... Drivers will act differently
> for suspend-to-ram, suspend-to-disk, and eventually other states we
> may introduce, like a system "idle" state.

We didn't get into enough detail for that to count as "explain"
though ... it was more like "assert"!


> Disks in general are an example (IDE being the one that is currently
> implemented, but we'll probably have to do the same for SATA and SCSI
> at one point), you want to spin them off (with proper cache flush
> etc...) when suspending to RAM, while you don't when suspending to
> disk, as you really don't want them to be spun up again right away to
> write the suspend image.

So suspend-to-RAM more or less matches PCI D3hot, and
suspend-to-DISK matches PCI D3cold. If those power states
were passed to the device suspend(), the disk driver could act
appropriately. In my observation, D3cold was never passed
down; it was always D3hot.

These look to me like "wrong device-level suspend state" cases.
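
To make that concrete, here is a rough sketch (not kernel code) of a
call site translating the system sleep type into a device state, with
the disk driver deciding from the device state alone. Every name here
(sys_sleep, dev_state, disk_suspend_plan and so on) is made up for
illustration, and flushing the cache in the D3cold case is an assumption.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; not the kernel's types. */
enum sys_sleep { SYS_SUSPEND_TO_RAM, SYS_SUSPEND_TO_DISK };
enum dev_state { DEV_D0, DEV_D1, DEV_D2, DEV_D3HOT, DEV_D3COLD };

struct disk_actions {
    bool flush_cache;
    bool spin_down;
};

/* Call-site translation: suspend-to-RAM ~ D3hot, suspend-to-disk ~ D3cold. */
enum dev_state sys_to_dev_state(enum sys_sleep s)
{
    return s == SYS_SUSPEND_TO_DISK ? DEV_D3COLD : DEV_D3HOT;
}

/* The disk driver decides what to do from the target device state only. */
struct disk_actions disk_suspend_plan(enum dev_state target)
{
    struct disk_actions a = { false, false };

    switch (target) {
    case DEV_D3HOT:
        /* Suspend-to-RAM case: flush, then spin the disk down. */
        a.flush_cache = true;
        a.spin_down = true;
        break;
    case DEV_D3COLD:
        /* Suspend-to-disk case: the image still has to be written,
         * so leave the disk spinning. Flushing here is an assumption. */
        a.flush_cache = true;
        a.spin_down = false;
        break;
    default:
        /* D0/D1/D2: nothing special in this sketch. */
        break;
    }
    return a;
}
```

With that split, disk_suspend_plan(sys_to_dev_state(SYS_SUSPEND_TO_DISK))
leaves spin_down false, and the driver never sees a system-level value.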


> USB is another example. Typically, suspend-to-RAM wants to do a bus
> suspend, eventually enabling remote wakeup on some devices, and expects
> to recover the bus on wakeup, while suspend-to-disk is roughly
> equivalent to a full shutdown & reconnect on wakeup.

Same thing: an HCD could do the right thing if it was told to go
into D1, D2, or D3hot (supports USB suspend) vs D3cold (doesn't).

Though the PM core doesn't cooperate at all there. Neither the
suspend nor the resume codepaths cope well with disconnect
(and hence device removal): the PM core self-deadlocks, since
suspend/resume outcalls are made while holding the semaphore
that device_pm_remove() needs. Ugh.
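
The shape of that self-deadlock can be shown with a toy model. This is
only a sketch: the "semaphore" is a plain flag, so the attempt that
would block forever in the real code just returns false here, and all
toy_* names are hypothetical stand-ins rather than PM core functions.

```c
#include <stdbool.h>

/* Toy stand-in for the PM core's semaphore: a flag, no real blocking. */
struct toy_sem { bool held; };

static struct toy_sem pm_sem;   /* hypothetical name */

/* Returns false where a real down() would block forever. */
static bool toy_try_down(struct toy_sem *s)
{
    if (s->held)
        return false;
    s->held = true;
    return true;
}

static void toy_up(struct toy_sem *s)
{
    s->held = false;
}

/* Models device removal, which needs the PM semaphore. */
bool toy_pm_remove(void)
{
    if (!toy_try_down(&pm_sem))
        return false;           /* the real code would hang here */
    toy_up(&pm_sem);
    return true;
}

/* A driver resume callback that notices a disconnect and removes a device. */
bool toy_resume_with_disconnect(void)
{
    return toy_pm_remove();
}

/* Models the PM core making the outcall while still holding the semaphore. */
bool toy_pm_resume_all(bool (*resume)(void))
{
    bool ok;

    if (!toy_try_down(&pm_sem))
        return false;
    ok = resume();              /* outcall made with pm_sem held */
    toy_up(&pm_sem);
    return ok;
}
```

Calling toy_pm_remove() on its own succeeds; calling it by way of
toy_pm_resume_all(toy_resume_with_disconnect) fails, which is the point
where the real semaphore would never be released.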

And FWIW Greg's now merged the CONFIG_USB_SUSPEND code
into his tree, as experimental ... so now all the relevant integration
issues can start to get sorted out. Eventually, that PM core deadlock
can get fixed. And remote wakeup needs work ... x86 machines need
a way to tell ACPI to pay attention to PME# wakeups (which Len says
is in some recent ACPI patch), and maybe the PPC/Mac platforms
will need something similar.


> > The problem I'm clear on is with PCI suspend, which some
> > earlier driver model PM changes goofed up. It's trying to
> > pass a system state to driver suspend() methods that are
> > expecting a value appropriate for pci_set_power_state().
> > You're proposing to fix that by changing the call semantics,
> > while I'd rather just see the call site fixed.
>
> No, I don't agree. It's a driver policy to decide what PCI state
> to use based on the system suspend level.

You've not persuaded me on that point at all ...

Consider: device PM calls aren't only made as part of changing
a "system suspend level". Minimally, there's the ability to suspend
a single device through sysfs. And in general, we need to be able
to do that directly ... one device suspending **must** be able to
trigger another one suspending, without assuming that every
device on the system is suspending to the same state.

For example, suspending a USB flash storage device must
be able to suspend its subsidiary storage and block devices,
without necessarily suspending the network link on an adjacent
hub port. Or a USB port on a camera (or cell phone) that must
act as either host or peripheral (USB OTG) ... at most one of those
controllers should be active at a time.
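
As a sketch of the flash storage case (illustrative only; struct toy_dev
and the toy_* helpers are invented here and are not driver model APIs),
suspending one device walks just its own subtree, children first, and
leaves siblings on the same hub alone:

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical device node; the real driver model is much richer. */
struct toy_dev {
    const char *name;
    bool suspended;
    struct toy_dev *children[MAX_CHILDREN];
    size_t n_children;
};

/* Attach a child; returns false if the parent has no free slot. */
bool toy_add_child(struct toy_dev *parent, struct toy_dev *child)
{
    if (parent->n_children >= MAX_CHILDREN)
        return false;
    parent->children[parent->n_children++] = child;
    return true;
}

/* Suspend the children first, then the device itself.
 * Returns how many devices were newly suspended. */
int toy_suspend_subtree(struct toy_dev *dev)
{
    int count = 0;
    size_t i;

    for (i = 0; i < dev->n_children; i++)
        count += toy_suspend_subtree(dev->children[i]);

    if (!dev->suspended) {
        dev->suspended = true;
        count++;
    }
    return count;
}
```

With a hub that has a flash device (with storage and block children) on
one port and a network adapter on another, toy_suspend_subtree() on the
flash device touches three nodes and never looks at the adapter or hub.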

Consider a system PM policy including "suspend all idle devices".
With N devices supporting only "on" and "suspend" states, that's
something like 2^N system suspend levels. Or more; most
devices support more than two suspend states (add at least
"off", plus often lightweight suspend states). And N is system-specific,
so there's no way all those system states can be given small
integer values ... much less fit into a "u32 state".
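
The arithmetic is easy to check with a small helper. This is a sketch
under the assumption that every device independently picks one of the
same number of states; count_system_states() is a made-up name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of distinct system-wide combinations when each of n_devices
 * independently sits in one of states_per_device states.
 * Returns false if the count does not fit in a u32. */
bool count_system_states(unsigned n_devices, unsigned states_per_device,
                         uint32_t *out)
{
    uint64_t total = 1;
    unsigned i;

    for (i = 0; i < n_devices; i++) {
        total *= states_per_device;
        if (total > UINT32_MAX)
            return false;       /* too many combinations for a u32 */
    }
    *out = (uint32_t)total;
    return true;
}
```

With two states per device the count stops fitting at 32 devices, and
with three states ("on", "suspend", "off") it stops fitting at 21.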

- Dave