Re: Solving suspend-level confusion

From: Benjamin Herrenschmidt
Date: Sat Jul 31 2004 - 16:13:40 EST


On Sun, 2004-08-01 at 00:23, David Brownell wrote:

> So suspend-to-RAM more or less matches PCI D3hot, and
> suspend-to-DISK matches PCI D3cold. If those power states
> were passed to the device suspend(), the disk driver could act
> appropriately. In my observation, D3cold was never passed
> down, it was always D3hot.

Not really, they really have nothing to do with the PCI state
at all. Besides, the difference between PCI D3 hot and D3 cold
is dependent on what happens on the mobo after the chip gets
set to D3...

> These look to me like "wrong device-level suspend state" cases.

No. The driver interface provide a semantic that indicates in what
state the system is going to, it's driver policy to translate that into
appropriate actions, eventually involving some bus power state (or not,
for some archs, we'll haev no choice but have some driver level arch
hacks in there anyway, like some macs wanting video chips to be in D2
state etc...)

> > USB is another example. Typically, suspend-to-RAM wants to do a bus
> > suspend, eventually enabling remote wakeup on some devices, and expects
> > to recover the bus on wakeup, while suspend-to-disk is roughtly
> > equivalent to a full shutdown & reconnect on wakeup.
>
> Same thing: an HCD could do the right thing if it was told to go
> into D1, D2, or D3hot (supports USB suspend) vs D3cold (doesn't).

And your whole PM code would suddently become PCI specific ...

> Though the PM core doesn't cooperate at all there. Neither the
> suspend nor the resume codepaths cope well with disconnect
> (and hence device removal), the PM core self-deadlocks since
> suspend/resume outcalls are done while holding the semaphore
> that device_pm_remove() needs, ugh.

That's an old problem, I don't know how to fix this one best in USB,
I've always had problem with the whole PM semaphore for that reasons
among others but I'll let Greg & Patrick find a solution there.

> And FWIW Greg's now merged the CONFIG_USB_SUSPEND code
> into his tree, as experimental ... so now all the relevant integration
> issues can start to get sorted out. Eventually, that PM core deadlock
> can get fixed. And remote wakeup needs work ... x86 machines need
> a way to tell ACPI to pay attention to PME# wakeups (which Len says
> is in some recent ACPI patch), and maybe the PPC/Mac platforms
> will need something similar.
>
>
> > > The problem I'm clear on is with PCI suspend, which some
> > > earlier driver model PM changes goofed up. It's trying to
> > > pass a system state to driver suspend() methods that are
> > > expecting a value appropriate for pci_set_power_state().
> > > You're proposing to fix that by changing the call semantics,
> > > while I'd rather just see the call site fixed.
> >
> > No, I don't agree. It's a driver policy to decide what PCI state
> > to use based on the system suspend level.
>
> You've not persuaded me on that point at all ...
>
> Consider: device PM calls aren't only made as part of changing
> a "system suspend level". Minimally, there's the ability to suspend
> a single device through sysfs. And in general, we need to be able
> to do that directly ... one device suspending **must** be able to
> trigger another one suspending, without assuming that every
> device on the system is suspending to the same state.
>
> As for example, suspending a USB flash storage device must
> be able to suspend its subsidiary storage and block devices,
> without necessarily suspending the network link on an adjacent
> hub port. Or a USB port on a camera (or cell phone) that must
> act as either host or peripheral (USB OTG) ... at most one of those
> controllers should be active at a time.

And who resumes it ? Or you just lockup as soon as your VM try to
swap in from the flash ? If it self-wakes up then it's driver internal
policy. Again, the PM interface exposed to the kernel and userland has,
imho, nothing to do with the actualy low level bus states.

> Consider a system PM policy including "suspend all idle devices".
> With N devices supporting only "on" and "suspend" states, that's
> something like 2^N system suspend levels. Or more; most
> devices support more than two suspend states (add at least
> "off", plus often light weight suspend states). And N is system-specific,
> so there's no way all those system states can be given small
> integer values ... much less fit into a "u32 state".

And ? that's not the point. "standby" is defined as a kernel PM state
that could be used to implement that. It's per-driver policy to decide
how to react to "standby" of the HW can only do on/off (which usually
turns into go to "off" if the transition back to "on" is fast, or do
nothing if not).

There's also one thing we haven't dealt with is since queues are blocked
etc..., there need to be some explicit action waking things up. We don't
yet have provisions for devices themselves to trigger wakeup, either of
the whole tree, or themselves individually, based on activity. things like
USB remote wakeup usually hook through the firmware/motherboard to trigger
the system wakeup, but all of this is useless for such a standby state
where we actually want a pending request to the disk, for example, to
wakeup the whole subsystem.

Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/