Re: lsusd - The Linux SUSpend Daemon

From: NeilBrown
Date: Sun Oct 23 2011 - 04:22:15 EST


On Fri, 21 Oct 2011 22:00:13 -0400 (EDT) Alan Stern
<stern@xxxxxxxxxxxxxxxxxxx> wrote:

> On Sat, 22 Oct 2011, NeilBrown wrote:
>
> > > > It uses files in /var/run/suspend for all communication.
> > >
> > > I'm not so keen on using files for communication. At best, they are
> > > rather awkward for two-way messaging. If you really want to use them,
> > > then at least put them on a non-backed filesystem, like something under
> > > /dev.
> >
> > Isn't /var/run a tmpfs filesystem? It should be.
> > Surely /run is, so in the new world order the files should probably go
> > there. But that is just a detail.
>
> On my Fedora-14 systems there is no /run, and /var/run is a regular
> directory in a regular filesystem.
>
> > I like files... I particularly like 'flock' to block suspend. The
> > rest.... whatever..
> > With files, you only need a context switch when there is real communication.
> > With sockets, every message sent must be read so there will be a context
> > switch.
> >
> > Maybe we could do something with futexes...
>
> Not easily -- as far as I can tell, futexes enjoy relatively little
> support. In any case, they provide the same service as a mutex, which
> means you'd have to build a shared lock on top of them.
>
> > > > lsusd does not try to be event-loop based because:
> > > > - /sys/power/wakeup_count is not pollable. This could probably be
> > > > 'fixed' but I want code to work with today's kernel. It will probably
> > >
> > > Why does this matter?
> >
> > In my mind an event based program should never block. Every action should be
> > non-blocking and only taken when 'poll' says it can.
> > Reading /sys/power/wakeup_count can be read non-blocking, but you cannot find
> > out when it is sensible to try to read it again. So it doesn't fit.
>
> There shouldn't be any trouble about making wakeup_count pollable. It
> also would need to respect nonblocking reads, which it currently does
> not do.

Hmm.. you are correct. I wonder why I thought it did support non-blocking
reads...
I guess it was the code for handling an interrupted system call.

I feel a bit uncomfortable with the idea of sysfs files that block but I
don't think I can convincingly argue against it.
A non-blocking flag could be passed in, but it would be a very messy change -
lots of function call signatures changing needlessly: we would need a flag
to the 'show' method ... or add a 'show_nonblock' method which would also be
ugly.


But I think there is a need to block - if there is an in-progress event then
it must be possible to wait for it to complete as it may not be visible to
userspace until then.
We could easily enable 'poll' for wakeup_count and then make it always
non-blocking, but I'm not really sure I want to require programs to use poll,
only to allow them. And without using poll there is no way to wait.

As wakeup_count really has to be single-access we could possibly fudge
something by remembering the last value read (like we remember the last value
written).

- if the current count is different from the last value read, then return
it even if there are in-progress events.
- if the current count is the same as the last value read, then block until
there are no in-progress events and return the new value.
- enable sysfs_poll on wakeup_count by calling sysfs_notify_dirent at the
end of wakeup_source_deactivated .... or calling something in
kernel/power/main.c which calls that. However we would need to make
sysfs_notify_dirent a lot lighter weight first. Maybe I should do that.

Then a process that uses 'poll' could avoid reading wakeup_count except when
it has changed, and then it won't block. And a process that doesn't use poll
can block by simply reading twice - either explicitly or by going around a
read then write and it fails
loop a second time.

I'm not sure I'm completely comfortable with that, but it is the best I could
come up with.


>
> At the worst, you could always have a separate thread to read
> wakeup_count.

Maybe. I'm tempted simply not to worry about the short delay after all, but
the moment I do that, someone will use a wakeup_source to disable suspend
for a longer period of time (just like I suggested) and suddenly it won't be
a short delay after all.

>
> > > > - I cannot get an event notification when a lock is removed from a
> > > > file. :-( And I think locks are an excellent light-weight
> > > > mechanism for blocking suspend.
> > >
> > > Except for this one drawback. Socket connections are superior in that
> > > regard.
> >
> > I'm very happy for someone else write an all-socket based daemon.
>
> Hmmm... Maybe I'll take you up on that.
>
>
> > > > lsused will send a 'S' message to the client and await an 'R' reply
> > > > (S == suspend, R == ready). When all replies are in, lsused will
> > > > allow the suspend to complete. When it does (or aborts), it will send
> > > > 'A' (awake) to those clients to which it sent 'S'.
> > >
> > > But not to the client which failed to send an 'R'?
> >
> > Every client must send an R before suspend can continue.
>
> I was referring to the case where you abort before receiving an 'R'.
> The current suspend attempt will fail, but then what happens during the
> next attempt?

There is no situation in which I abort before receiving an 'R'.
I wait until all 'R's are in until I even check if an abort was requested.


NeilBrown



Sample untested patch to allow non-blocking reads and poll on wakeup_count.

diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
index 84f7c7d..54543ba 100644
--- a/drivers/base/power/wakeup.c
+++ b/drivers/base/power/wakeup.c
@@ -44,6 +44,8 @@ static void split_counters(unsigned int *cnt, unsigned int *inpr)

/* A preserved old value of the events counter. */
static unsigned int saved_count;
+/* Record last value that was read from events counter. */
+static unsigned int last_read_count;

static DEFINE_SPINLOCK(events_lock);

@@ -410,6 +412,7 @@ static void wakeup_source_deactivate(struct wakeup_source *ws)
{
ktime_t duration;
ktime_t now;
+ unsigned int comb;

ws->relax_count++;
/*
@@ -440,7 +443,10 @@ static void wakeup_source_deactivate(struct wakeup_source *ws)
* Increment the counter of registered wakeup events and decrement the
* couter of wakeup events in progress simultaneously.
*/
- atomic_add(MAX_IN_PROGRESS, &combined_event_count);
+ comb = atomic_add_return(MAX_IN_PROGRESS, &combined_event_count);
+
+ if ((comb >> IN_PROGRESS_BITS) == last_read_count + 1)
+ wakeup_count_changed();
}

/**
@@ -624,15 +630,24 @@ bool pm_get_wakeup_count(unsigned int *count)

for (;;) {
split_counters(&cnt, &inpr);
- if (inpr == 0 || signal_pending(current))
+ if (inpr == 0 || cnt != last_read_count)
break;
+ if (signal_pending(current))
+ return false;
pm_wakeup_update_hit_counts();
schedule_timeout_interruptible(msecs_to_jiffies(TIMEOUT));
}

+ last_read_count = cnt;
+ /* If cnt has just changed, then last_read_count will be a bit
+ * old, so we won't block on the next read, only the one after.
+ * However this ensures wakeup_source_deactivate doesn't
+ * miss out on calling wakeup_count_changed() on a change.
+ */
+ smp_wmb();
split_counters(&cnt, &inpr);
*count = cnt;
- return !inpr;
+ return true;
}

/**
diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 1ad8c93..41361ba 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -444,18 +444,19 @@ static unsigned int sysfs_poll(struct file *filp, poll_table *wait)

void sysfs_notify_dirent(struct sysfs_dirent *sd)
{
- struct sysfs_open_dirent *od;
- unsigned long flags;
+ if (sd->s_attr.open) {
+ struct sysfs_open_dirent *od;
+ unsigned long flags;
+ spin_lock_irqsave(&sysfs_open_dirent_lock, flags);

- spin_lock_irqsave(&sysfs_open_dirent_lock, flags);
+ od = sd->s_attr.open;
+ if (od) {
+ atomic_inc(&od->event);
+ wake_up_interruptible(&od->poll);
+ }

- od = sd->s_attr.open;
- if (od) {
- atomic_inc(&od->event);
- wake_up_interruptible(&od->poll);
+ spin_unlock_irqrestore(&sysfs_open_dirent_lock, flags);
}
-
- spin_unlock_irqrestore(&sysfs_open_dirent_lock, flags);
}
EXPORT_SYMBOL_GPL(sysfs_notify_dirent);

diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index 6bbcef2..e42b7f9 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -296,6 +296,7 @@ extern bool events_check_enabled;
extern bool pm_wakeup_pending(void);
extern bool pm_get_wakeup_count(unsigned int *count);
extern bool pm_save_wakeup_count(unsigned int count);
+extern void wakeup_count_changed(void);
#else /* !CONFIG_PM_SLEEP */

static inline int register_pm_notifier(struct notifier_block *nb)
diff --git a/kernel/power/main.c b/kernel/power/main.c
index 6c601f8..6b3cd80 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -231,6 +231,8 @@ power_attr(state);
* are any wakeup events detected after 'wakeup_count' was written to.
*/

+struct sysfs_dirent *wakeup_count_dirent;
+
static ssize_t wakeup_count_show(struct kobject *kobj,
struct kobj_attribute *attr,
char *buf)
@@ -253,6 +255,12 @@ static ssize_t wakeup_count_store(struct kobject *kobj,
return -EINVAL;
}

+void wakeup_count_change(void)
+{
+ if (wakeup_count_dirent)
+ sysfs_notify_dirent(wakeup_count_dirent);
+}
+
power_attr(wakeup_count);
#endif /* CONFIG_PM_SLEEP */

@@ -342,7 +350,11 @@ static int __init pm_init(void)
power_kobj = kobject_create_and_add("power", NULL);
if (!power_kobj)
return -ENOMEM;
- return sysfs_create_group(power_kobj, &attr_group);
+ error = sysfs_create_group(power_kobj, &attr_group);
+#ifdef CONFIG_PM_SLEEP
+ wakeup_count_dirent = sysfs_get_dirent(power_kobj->sd, NULL, "wakeup_count");
+#endif
+ return error;
}

core_initcall(pm_init);



Attachment: signature.asc
Description: PGP signature