Re: [PATCH v5 01/48] kernel: Add support for power-off handler call chain

From: Rafael J. Wysocki
Date: Thu Nov 06 2014 - 18:56:15 EST


On Thursday, November 06, 2014 02:27:03 PM Guenter Roeck wrote:
> On Thu, Nov 06, 2014 at 11:30:59PM +0100, Rafael J. Wysocki wrote:
> > On Thursday, November 06, 2014 08:42:45 AM Guenter Roeck wrote:
> > > Various drivers implement architecture and/or device specific means to
> > > power off the system. For the most part, those drivers set the global
> > > variable pm_power_off to point to a function within the driver.
> > >
> > > This mechanism has a number of drawbacks. Typically only one scheme
> > > to remove power is supported (at least if pm_power_off is used).
> > > At least in theory there can be multiple means remove power, some of
> > > which may be less desirable. For example, some mechanisms may only
> > > power off the CPU or the CPU card, while another may power off the
> > > entire system. Others may really just execute a restart sequence
> > > or drop into the ROM monitor. Using pm_power_off can also be racy
> > > if the function pointer is set from a driver built as module, as the
> > > driver may be in the process of being unloaded when pm_power_off is
> > > called. If there are multiple power-off handlers in the system, removing
> > > a module with such a handler may inadvertently reset the pointer to
> > > pm_power_off to NULL, leaving the system with no means to remove power.
> > >
> > > Introduce a system power-off handler call chain to solve the described
> > > problems. This call chain is expected to be executed from the architecture
> > > specific machine_power_off() function. Drivers and architeceture code
> > > providing system power-off functionality are expected to register with
> > > this call chain. When registering a power-off handler, callers can
> > > provide a priority to control power-off handler execution sequence
> > > and thus ensure that the power-off handler with the optimal capabilities
> > > to remove power for a given system is called first.
> > >
> > > Cc: Alan Cox <gnomes@xxxxxxxxxxxxxxxxxxx>
> > > Cc: Alexander Graf <agraf@xxxxxxx>
> > > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > > Cc: Geert Uytterhoeven <geert@xxxxxxxxxxxxxx>
> > > Cc: Heiko Stuebner <heiko@xxxxxxxxx>
> > > Cc: Lee Jones <lee.jones@xxxxxxxxxx>
> > > Cc: Len Brown <len.brown@xxxxxxxxx>
> > > Cc: Pavel Machek <pavel@xxxxxx>
> > > Cc: Philippe RÃtornaz <philippe.retornaz@xxxxxxxxx>
> > > Cc: Rafael J. Wysocki <rjw@xxxxxxxxxxxxx>
> > > Cc: Romain Perier <romain.perier@xxxxxxxxx>
> > > Acked-by: Pavel Machek <pavel@xxxxxx>
> > > Acked-by: Heiko Stuebner <heiko@xxxxxxxxx>
> > > Signed-off-by: Guenter Roeck <linux@xxxxxxxxxxxx>
> > > ---
> > > v5:
> > > - Rebase to v3.18-rc3
> > > v4:
> > > - Do not use notifiers but internal functions and data structures to manage
> > > the list of power-off handlers. Drop unused parameters from callbacks, and
> > > make the power-off function type void.
> > > Code to manage and walk the list of callbacks is derived from notifier.c.
> > > v3:
> > > - Rename new file to power_off_handler.c
> > > - Replace poweroff in all newly introduced variables and in text
> > > with power_off or power-off as appropriate
> > > - Replace POWEROFF_PRIORITY_xxx with POWER_OFF_PRIORITY_xxx
> > > - Execute power-off handlers without any locks held
> > > v2:
> > > - poweroff -> power_off
> > > - Add defines for default priorities
> > > - Use raw notifiers protected by spinlocks instead of atomic notifiers
> > > - Add register_poweroff_handler_simple
> > > - Add devm_register_power_off_handler
> > >
> > > include/linux/pm.h | 28 ++++
> > > kernel/power/Makefile | 1 +
> > > kernel/power/power_off_handler.c | 293 +++++++++++++++++++++++++++++++++++++++
> > > 3 files changed, 322 insertions(+)
> > > create mode 100644 kernel/power/power_off_handler.c
> > >
> > > diff --git a/include/linux/pm.h b/include/linux/pm.h
> > > index 383fd68..a4d6bf8 100644
> > > --- a/include/linux/pm.h
> > > +++ b/include/linux/pm.h
> > > @@ -35,6 +35,34 @@ extern void (*pm_power_off)(void);
> > > extern void (*pm_power_off_prepare)(void);
> > >
> > > struct device; /* we have a circular dep with device.h */
> > > +
> > > +/*
> > > + * Data structures and callbacks to manage power-off handlers
> > > + */
> > > +
> > > +struct power_off_handler_block {
> > > + void (*handler)(struct power_off_handler_block *);
> > > + struct power_off_handler_block __rcu *next;
> > > + int priority;
> > > +};
> > > +
> > > +int register_power_off_handler(struct power_off_handler_block *);
> > > +int devm_register_power_off_handler(struct device *,
> > > + struct power_off_handler_block *);
> > > +int register_power_off_handler_simple(void (*function)(void), int priority);
> > > +int unregister_power_off_handler(struct power_off_handler_block *);
> > > +void do_kernel_power_off(void);
> > > +bool have_kernel_power_off(void);
> > > +
> > > +/*
> > > + * Pre-defined power-off handler priorities
> > > + */
> > > +#define POWER_OFF_PRIORITY_FALLBACK 0
> > > +#define POWER_OFF_PRIORITY_LOW 64
> > > +#define POWER_OFF_PRIORITY_DEFAULT 128
> > > +#define POWER_OFF_PRIORITY_HIGH 192
> > > +#define POWER_OFF_PRIORITY_HIGHEST 255
> >
> > I'm not sure why we need these gaps in the priority space.
> >
> > I guess it might be possible to use
> >
> > enum power_off_priority {
> > POWER_OFF_PRIORITY_FALLBACK = 0,
> > POWER_OFF_PRIORITY_LOW,
> > POWER_OFF_PRIORITY_DEFAULT,
> > POWER_OFF_PRIORITY_HIGH,
> > POWER_OFF_PRIORITY_HIGHEST,
> > POWER_OFF_PRIORITY_LIMIT,
> > };
>
> I retained the large number space on purpose, specifically to permit in-between
> priorities. In other words, I want people to be able to say "priority for this
> handler is higher than low but lower than default". After all, the defines were
> intended as hints, not as a "Thou shall use those and only those priorities".

Problem with that is how they are supposed to know what priority to use then.

How do I know if my priority is between DEFAULT and HIGH and whether it is
closer to HIGH or closer to DEFAULT? What are the rules?

The only rule that seems to be there is "this handler should be tried before
that one, so it needs to have a higher priority". But now the question is
how people are going to know which handlers they are competing with and whether
or not they are more "important".

> Having said that, the important part is to get the series accepted, so I won't
> argue if that is what it takes to get an Ack. Let me know.

This isn't worth fighting over in my view, so I won't if everyone else is fine
with it.

Just feel free to ignore this concern if you don't think it is valid.

Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/