Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces

From: NeilBrown
Date: Sun Oct 16 2011 - 19:48:49 EST

Next message: Andrea Arcangeli: "Re: kernel 3.0: BUG: soft lockup: find_get_pages+0x51/0x110"
Previous message: Yinghai Lu: "Re: [PATCH 8/8] PCI, sys: only create rescan under /sys/.../pci/devices/...for pci bridges"
Next in thread: Alan Stern: "Re: [RFC][PATCH 0/2] PM / Sleep: Extended control of suspend/hibernate interfaces"

On Sun, 16 Oct 2011 00:10:40 +0200 "Rafael J. Wysocki" <rjw@xxxxxxx> wrote:

> Hi,
>
> On Friday, October 14, 2011, NeilBrown wrote:
> > On Thu, 13 Oct 2011 21:45:42 +0200 "Rafael J. Wysocki" <rjw@xxxxxxx> wrote:
> ...
> >
> > Hi Rafael,
> >
> > What do you mean by "too complicated to use in practice"? What is your
> > measure for complexity?
>
> I, personally, don't really know what the difficulty is, as I have already
> described this approach a few times (for example, in Section 5 of the
> article at http://lwn.net/images/pdf/suspend_blockers.pdf). However, I've
> recently talked to a few people whom I regard as smart and who had tried
> to implement it and it didn't work for them, and they aren't really able
> to say why exactly. Thus I have concluded it has to be complicated, but
> obviously you're free to draw your own conclusions. :-)
>
> [BTW, attempts to defend the approach I have invented against myself are
> extremely likely to fail, pretty much by definition. ;-)]

:-) Maybe we can defend it together then.

>
> > Using suspend in a race-free way is certainly less complex than - for
> > example - configuring bluetooth.
> > And in what way is it "inadequate for other reasons"? What reasons?
>
> Consider the scenario described by John (the wakeup problem). A process
> has to do something at certain time and the system shouldn't be suspended
> while the process is doing that, although it very well may be suspended
> earlier or later. The process puts itself to sleep with the assumption
> that a wake alarm is set (presumably by another process) to wake the system
> up from suspend at the right time (if the suspend happens). However, the
> process itself doesn't know _exactly_ what time the wake alarm is set to.
>
> In the situation in which we only have the existing mechanism and a user space
> power manager daemon, this scenario appears to be inherently racy, such that it
> cannot be handled correctly.

I would suggest that we need a time-management daemon - whether it ends up
being cron or systemd or scripts in pm-utils or some new daemon is just an
implementation detail.

i.e. we need a way for a process to say "I have something to do at X o'clock".
Whatever provides this service needs to hook in to the suspend logic and when
a suspend is about to happen, it programs the RTC alarm to wake up at (or
just before) the earliest requested time.
When the time arrives, the service blocks suspend and replies to the original
process "OK, do your thing". The process takes over blocking of suspend,
acknowledges the wakeup, and does the backup or whatever.

So yes: we do need something new, but it is easy enough to do all in
user-space.
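
A minimal sketch of the bookkeeping such a time-management service would need,
assuming times are plain seconds on a common clock. The class name, the
method names and the `margin` parameter are hypothetical; actually programming
the RTC alarm and hooking in to the suspend logic are left out.

```python
import heapq


class WakeupScheduler:
    """Hypothetical service tracking "I have something to do at X" requests."""

    def __init__(self, margin=5):
        self.margin = margin   # seconds to wake up before the requested time
        self._requests = []    # min-heap of (wake_time, client)

    def request(self, client, when):
        """Record that `client` has something to do at time `when`."""
        heapq.heappush(self._requests, (when, client))

    def alarm_before_suspend(self, now):
        """Time to program the RTC alarm to, or None if nothing is pending."""
        if not self._requests:
            return None
        earliest, _client = self._requests[0]
        # Wake at (or just before) the earliest request, never in the past.
        return max(now, earliest - self.margin)

    def due(self, now):
        """Pop and return the clients whose requested time has arrived.

        The caller is expected to block suspend before telling each of
        them "OK, do your thing".
        """
        ready = []
        while self._requests and self._requests[0][0] <= now:
            ready.append(heapq.heappop(self._requests)[1])
        return ready
```

The suspend path would call alarm_before_suspend() just before suspending, and
the resume path would call due() to find out who to notify.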

One important part of this is that an RTC alarm needs to be treated as a
wakeup_event and have a wakeup_source activated for it. I suspect you could
avoid the need for that by having the suspend daemon know about programming
the RTC alarm and to simply not suspend at a bad time.
John Stultz posted a patch to add a wakeup_source for the RTC. What do you
think of that? Is a wakeup_source sensible here, or should user-space just
be careful about not suspending when an RTC alarm is likely soon?

.... Actually, the more I think about it, the more sense it makes to include
the wake-up-at-time service with the suspend-daemon. Then the RTC alarm
doesn't need a wakeup_source.
So my hypothetical suspend-daemon provides 2 services:
 1/ Client can say "Don't suspend after X". If X is in the past it means
    don't suspend at all. In the future it means "If you suspend before
    this, be sure to wake up by X". This request must be explicitly
    cancelled (though some mechanism is needed so that if the process dies
    it is automatically cancelled).
 2/ Client can say "check with me before entering suspend". Client needs to
    respond to any callback promptly, but can register a "don't suspend after
    now" request first.
    (Client probably gets a callback both on suspend and resume)
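
The first of these services can be sketched as a small piece of state plus one
decision function. This is only an illustration: the class and method names
are hypothetical, times are plain seconds, and the "check with me" callback
of the second service is not modelled.

```python
class SuspendPolicy:
    """Hypothetical bookkeeping for "Don't suspend after X" requests."""

    def __init__(self):
        self._holds = {}   # client -> X

    def dont_suspend_after(self, client, x):
        """Register (or update) a client's "Don't suspend after X" request."""
        self._holds[client] = x

    def cancel(self, client):
        """Explicit cancel; would also be called when a client dies."""
        self._holds.pop(client, None)

    def decide(self, now):
        """Return (may_suspend, wake_by).

        Any X at or before `now` means "don't suspend at all". Otherwise
        suspend is allowed, provided the system wakes up by the earliest X.
        """
        if any(x <= now for x in self._holds.values()):
            return (False, None)
        wake_by = min(self._holds.values(), default=None)
        return (True, wake_by)
```

With no requests registered, decide() returns (True, None): suspend freely,
and no wake-up alarm is needed.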

>
> > The only sane way to handle suspend is for any (suitably privileged) process
> > to be able to request that suspend doesn't happen, and then for one process
> > to initiate suspend when no-one is blocking it.
>
> As long as you don't specify the exact way by which the request is made and
> how the suspend is blocked, the above statement is almost meaningless.

The meaning is in the style of request. Requests should be "don't suspend at
the moment", not "do suspend now". I didn't intend it to carry more meaning
than that.

>
> > This is very different from the way it is currently handled where the GUI
> > says "Hmm.. I'm not doing anything just now, I think I'll suspend".
> >
> > The latter simply doesn't scale. It is broken. It has to be replaced.
> > And it is being replaced.
>
> Cool, good to hear that! :-)

I might have spoken too soon there :-(
I looked more deeply at how gnome power management works and it is deeply
structured around requests to go to sleep, not requests to stay awake.

>
> > gnome-power-manager has a dbus interface on which you can request
> > "InhibitInactiveSleep". Calling that will stop gnome-power-manager from
> > sleeping (I assume - haven't looked at the code).
> > It might not inhibit an explicit request for sleep - in that case it is
> > probably broken and needs to be fixed. But it can be fixed. Or replaced.
>
> Perhaps.
>
> Is KDE going to use the same mechanism, for one example? And what about other
> user space variants? MeeGo anyone? Tizen? Android??
>
> > So if someone is running gnome-power-manager and wants to perform a firmware
> > update, the correct thing to do is to use dbus to disable the inactive sleep.
> > If someone is using some other power manager they might need to use some
> > other mechanism. Presumably these things will be standardised at some stage.
>
> Unless you have a specific idea about how to make this standardization happen,
> I call it wishful thinking to put it lightly. Sorry about the harsh words, but
> that's how it goes IMNSHO.

Standardisation will happen when enough people see a problem. As yet it
seems that they don't.
Once there are enough Linux devices running open desktops and needing good
power management (i.e. suspend often) that people start seeing problems,
there will be more motivation to create solutions.

Currently, we just need to be sure that the kernel *can* provide the needed
functionality and, if we like, experiment with user-space code to make use of
that functionality in an effective way.

If enough people experiment, learn, and publish their results - then the more
successful implementations will eventually spread...

>
> > But I think it is very wrong to put some hack in the kernel like your
> > suspend_mode = disabled
>
> Why is it wrong and why do you think it is a "hack"?

I think it is a "hack" because it is addressing a specific complaint rather
than fixing a real problem.

Contrast that with your wakeup_events which are a carefully designed approach
addressing a real problem and taking into account the big picture.

i.e. it seems to be addressing a symptom rather than addressing the cause.

(and it is wrong because "hacks" are almost always wrong - short-term gain,
long term cost).

>
> > just because the user-space community hasn't got its act together yet.
>
> Is there any guarantee that it will get its act together in any foreseeable
> time frame?
>
> > And if you really need a hammer to stop processes from suspending the system:
> >
> > cat /sys/power/state > /tmp/state
> > mount --bind /tmp/state /sys/power/state
> >
> > should do it.
>
> Except that (1) it appears to be racy (what if system suspend happens between
> the first and second line in your example - can you safely start to upgrade
> your firmware in that case?) and (2) it won't prevent the hibernate interface
> based on /dev/snapshot from being used.
>
> Do you honestly think I'd propose something like patch [1/2] if I didn't
> see any other _working_ approach?

I think there are other workable approaches (maybe not actually _working_,
but only because no-one has written the code).

I'm not saying we should definitely not add more functionality to the kernel,
but I am saying we should not do it at all hastily.

If someone has tried to use the current functionality, has really understood
it, has made an appropriate attempt to make use of it, and has found that
something cannot be made to work reliably, or efficiently, or securely or
whatever, then certainly consider ways to address the problems.

But I don't think we are there yet. We are only just getting to the
"understanding" stage (and I have found these conversations very helpful in
refining my understanding).

When I get my GTA04 (phone motherboard) I hope to write some code that
actually realises these ideas properly (I have code on my GTA02, but it is
broken in various ways, and the kernel is too old to
have /sys/power/wakeup_count anyway).
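
For reference, the race-free sequence built on /sys/power/wakeup_count can be
sketched roughly as below. The function name and the `ready` callback (a
stand-in for "no user-space client is currently blocking suspend") are
hypothetical; the two sysfs paths are the real kernel interfaces, passed as
parameters so the logic can be exercised against ordinary files.

```python
def try_suspend(count_path="/sys/power/wakeup_count",
                state_path="/sys/power/state",
                ready=lambda: True):
    """Attempt one suspend cycle; return True if the request was written."""
    # 1. Read the current count of wakeup events.
    with open(count_path) as f:
        count = f.read().strip()

    # 2. Check with user-space clients *after* reading the count, so any
    #    event they have not yet seen will have changed the count.
    if not ready():
        return False

    try:
        # 3. Write the count back. The kernel rejects the write if more
        #    wakeup events have arrived since step 1.
        with open(count_path, "w") as f:
            f.write(count)
        # 4. Request suspend. This write fails if a wakeup event arrives
        #    after step 3.
        with open(state_path, "w") as f:
            f.write("mem")
    except OSError:
        return False   # an event raced with us; the caller should retry later
    return True
```

A daemon would call this in a loop, going back to waiting on its clients
whenever it returns False.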

>
> > Your second patch has little to recommend it either.
> > In the first place it seems to be entrenching the notion that timeouts are a
> > good and valid way to think about suspend.
>
> That's because I think they are unavoidable. Even if we are able to eliminate
> all timeouts in the handling of wakeup events by the kernel and passing them
> to user space, which I don't think is a realistic expectation, the user will
> still have only so much time to wait for things to happen. For example, if
> a phone user doesn't see the screen turn on 0.5 sec after the button was
> pressed, the button is pretty much guaranteed to be pressed again. This
> observation applies to other wakeup events, more or less. They are very much
> like items with "suitability for consumption" timestamps: if they are not
> consumed quickly enough, we can simply forget about them.

I hadn't thought of it like that - I do see your point, I think.
However things are usually consumed long before they expire - expiry times
are longer than expected shelf life.
I think it is important to think carefully about the correct expiry time for
each event type as they aren't all the same.
So I would probably go for a larger default which is always safe, but
possibly wasteful. But that is a small point.

>
> > I certainly agree that there are plenty of cases where timeouts are
> > important and necessary. But there are also plenty of cases where you will
> > know exactly when you can allow suspend again, and having a timeout there is
> > just confusing.
>
> Please note that with patch [2/2] the timeout can always be overridden.
>
> > But worse - the mechanism you provide can be trivially implemented using
> > unix-domain sockets talking to a suspend-daemon.
> >
> > Instead of opening /dev/sleepctl, you connect to /var/run/suspend-daemon/sock
> > Instead of ioctl(SLEEPCTL_STAY_AWAKE), you write a number to the socket.
> > Instead of ioctl(SLEEPCTL_RELAX), you write zero to the socket.
> >
> > All the extra handling you do in the kernel, can easily be done by
> > user-space suspend-daemon.
>
> I'm not exactly sure why it is "worse". Doing it through sockets may require
> the kernel to do more work and it won't be possible to implement the
> SLEEPCTL_WAIT_EVENT ioctl I've just described to John this way.

"worse" because it appears to me that you are adding functionality to the
kernel which is effectively already present. When people do that to meet a
specific need it is usually not as usable as the original. i.e. "You have
re-invented XXX - badly". In this case XXX is IPC.

Yes - more CPU cycles may be expended in the user-space solution than a
kernel space solution, but that is a trade-off we often make. I don't think
that suspend is a time-critical operation - is it?

And I think SLEEPCTL_WAIT_EVENT would work fine over sockets, particularly
if, instead of a signal being sent, a simple short message were sent back over
the socket.
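
A rough sketch of the socket protocol described in the quoted text above:
write a number to stay awake, write zero to relax. Treating the number as a
timeout in seconds, the newline-terminated text framing, and all function
names are assumptions made for illustration. A real client would connect an
AF_UNIX socket to the daemon's path rather than being handed one.

```python
import socket


def stay_awake(sock, seconds):
    """Client side: replaces ioctl(SLEEPCTL_STAY_AWAKE)."""
    sock.sendall(f"{seconds}\n".encode())


def relax(sock):
    """Client side: replaces ioctl(SLEEPCTL_RELAX)."""
    sock.sendall(b"0\n")


def handle_line(holds, client_id, line):
    """Daemon side: apply one message from a client.

    `holds` maps client ids to their requested number. Returns True
    when no client is holding off suspend any more.
    """
    value = int(line.strip())
    if value > 0:
        holds[client_id] = value
    else:
        holds.pop(client_id, None)
    return not holds
```

The daemon would also drop a client's entry from `holds` when its connection
closes, which gives the "cancel automatically if the process dies" behaviour
for free.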

>
> > I really wish I could work out why people find the current mechanism
> > "difficult to use". What exactly is it that is difficult?
> > I have described previously how to build a race-free suspend system. Which
> > bit of that is complicated or hard to achieve? Or which bit of that cannot
> > work the way I claim? Or which need is not met by my proposals?
> >
> > Isn't it much preferable to do this in userspace where people can
> > experiment and refine and improve without having to upgrade the kernel?
>
> Well, I used to think that it's better to do things in user space. Hence,
> the hibernate user space interface that's used by many people. And my
> experience with that particular thing made me think that doing things in
> the kernel may actually work better, even if they _can_ be done in user space.
>
> Obviously, that doesn't apply to everything, but sometimes it simply is worth
> discussing (if not trying). If it doesn't work out, then fine, let's do it
> differently, but I'm really not taking the "this should be done in user space"
> argument at face value any more. Sorry about that.

:-) I have had similar mixed experiences. Sometimes it can be a lot easier
to get things working if it is all in the kernel.
But I think that doing things in user-space leads to a lot more flexibility.
Once you have the interfaces and designs worked out you can then start doing
more interesting things and experimenting with ideas more easily.

In this case, I think the *only* barrier to a simple solution in user-space
is the pre-existing software that uses the 'old' kernel interface. It seems
that interfacing with that is as easy as adding a script or two to pm-utils.

With that problem solved, experimenting is much easier in user-space than in
the kernel.

Thanks,
NeilBrown

>
> Thanks,
> Rafael

