Re: [RFC V1] cpuidle: add idle routine registration and cleanup pm_idlepointer

From: Trinabh Gupta
Date: Wed Oct 20 2010 - 11:57:46 EST




On 10/20/2010 08:48 PM, Arjan van de Ven wrote:
On 10/20/2010 8:12 AM, Trinabh Gupta wrote:


On 10/20/2010 12:31 AM, Trinabh Gupta wrote:


On 10/20/2010 12:19 AM, Venkatesh Pallipadi wrote:
On Tue, Oct 19, 2010 at 11:38 AM, Arjan van de Ven
<arjan@xxxxxxxxxxxxxxx> wrote:
On 10/19/2010 11:36 AM, Trinabh Gupta wrote:

The core of the kernel's idle routine on x86 presently depends on an
exported pm_idle function pointer that is unmanaged and causing
hazard to various subsystems when they save and restore it.
The first problem is that this exported pointer can be
modified/flipped
by any subsystem. There is no tracking or notification mechanism.
Secondly and more importantly, various subsystems save the value of
this pointer, flip it and later restore to the saved value. There is
no guarantee that the saved value is still valid. The problem has
been discussed in [2,3] and Peter Zijlstra suggested removing pm_idle
and implementing a list based registration [1].

This patch is an initial RFC implementation for x86 architecture
only. This framework can be generalised for other archs and also
include the current cpuidle framework for managing multiple idle
routines.

Tests done with the patch:
------------------------
1. Build (on 2.6.36-rc7) and booted on x86 with C1E as deepest idle
state and current_idle was selected to be mwait_idle.

2. Build (on 2.5.36-rc8) and booted on x86 (Nehalem) with ACPI C3 as
deepest sleep state. The current_idle was selected to be
cpuidle_idle_call which is the cpuidle subsystem that will further
select idle routines from {C1,C2,C3}.

Future implementation will try to eliminate this hirearchy and have
a single registration and menu/idle cpuidle governor for selection
of idle routine.


looks like you're duplicating the cpuidle subsystem

how about biting the bullet and just always and only use the cpuidle
subsystem for all idle on x86 ?


I agree with Arjan.
If we have a default_cpuidle driver which parses idle= params, handles
various mwait quirks in x86 process*.c and registers with cpuidle, we
can then always call cpuidle idle routine on x86.

This wouldn't duplicate code. It would move parts/functionality of
cpuidle into the kernel, keeping governors alone as modules.

If we directly call cpuidle_idle_call() then this may be too much
overhead for architectures that have single idle routine i.e cases where
cpuidle is not used and will be seen as a bloat. (c.f goals 4a,b).

Hi Venki, Arjan

Building cpuidle into the kernel would add ~7KB for everyone, even
x86 architectures having only single idle state/routine.

but now you're duplicating this functionality adding code for everyone.

I do not intend to duplicate code. I am trying to do what Andi
suggested i.e make the registration part lightweight and *move*
it into the kernel so that we can do away with problems such as
exported pm_idle pointer. This implementation is an *intermediate*
step.


99.999% of the people today run cpuidle... (especially embedded x86
where they really care about power)
all x86 going forward also has > 1 idle option anyway.

and you're adding and extra layer in the middle that just duplicates the
layer that's in use in practice above it.

As I said, this would not duplicate rather split things - minimal
registration part within the kernel (so that problems like
pm_idle are resolved) and governors as module.

-Trinabh


seriously, this sounds like the wrong tradeoff to make.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/