Re: [PATCH] cpuidle: Do not use CPUIDLE_DRIVER_STATE_START in cpuidle.c

From: Preeti U Murthy
Date: Wed May 27 2015 - 12:20:14 EST


On 05/27/2015 07:27 PM, Rafael J. Wysocki wrote:
> On Wed, May 27, 2015 at 2:25 PM, Daniel Lezcano
> <daniel.lezcano@xxxxxxxxxx> wrote:
>> On 05/27/2015 01:31 PM, Preeti U Murthy wrote:
>>>
>>> On 05/27/2015 07:06 AM, Rafael J. Wysocki wrote:
>>>>
>>>> From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>>>>
>>>> The CPUIDLE_DRIVER_STATE_START symbol is defined as 1 only if
>>>> CONFIG_ARCH_HAS_CPU_RELAX is set, otherwise it is defined as 0.
>>>> However, if CONFIG_ARCH_HAS_CPU_RELAX is set, the first (index 0)
>>>> entry in the cpuidle driver's table of states is overwritten with
>>>> the default "poll" entry by the core. The "state" defined by the
>>>> "poll" entry doesn't provide ->enter_dead and ->enter_freeze
>>>> callbacks and its exit_latency is 0.
>>>>
>>>> For this reason, it is not necessary to use CPUIDLE_DRIVER_STATE_START
>>>> in cpuidle_play_dead() (->enter_dead is NULL, so the "poll state"
>>>> will be skipped by the loop) and in find_deepest_state() (since
>>>> exit_latency is 0, the "poll state" will become the default if the
>>>> "s->exit_latency <= latency_req" check is replaced with
>>>> "s->exit_latency < latency_req" which may only matter for drivers
>>>> providing different states with the same exit_latency).
>>>>
>>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>>>> ---
>>>> drivers/cpuidle/cpuidle.c | 8 ++++----
>>>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> <snip>
>>>
>>>>
>>>> @@ -79,13 +79,13 @@ static int find_deepest_state(struct cpu
>>>> bool freeze)
>>>> {
>>>> unsigned int latency_req = 0;
>>>> - int i, ret = freeze ? -1 : CPUIDLE_DRIVER_STATE_START - 1;
>>>> + int i, ret = -ENXIO;
>>>>
>>>> - for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++) {
>>>> + for (i = 0; i < drv->state_count; i++) {
>>>> struct cpuidle_state *s = &drv->states[i];
>>>> struct cpuidle_state_usage *su = &dev->states_usage[i];
>>>>
>>>> - if (s->disabled || su->disable || s->exit_latency <=
>>>> latency_req
>>>> + if (s->disabled || su->disable || s->exit_latency <
>>>> latency_req
>>>
>>>
>>> Prior to this patch,
>>>
>>> For drivers on which CPUIDLE_DRIVER_STATE_START takes a value 0 and
>>> whose first idle state has an exit_latency of 0, find_deepest_state()
>>> would return -1 if it failed to find a deeper idle state.
>>> But as an effect of this patch, find_deepest_state() returns 0 in the
>>> above circumstance.
>>
>>
>> Except I am missing something, with an exit_latency = 0, the state will be
>> never selected, because of the "s->exit_latency < latency_req" condition
>> (strictly greater than).
>
> No, this is the condition to skip the state, so previously it wouldn't
> be selected, but after the patch it will.
>
> So yes, the patch changes behavior for systems with
> CONFIG_ARCH_HAS_CPU_RELAX unset.
>
>>> My concern is if these drivers do not intend to enter a polling state
>>> during suspend, this will cause an issue, won't it?
>
> The change in behavior happens for architectures where
> CONFIG_ARCH_HAS_CPU_RELAX is not set. In those cases the 0-index
> state is supposed to be provided by the driver. Is there a reason to
> expect that this may not be a genuine idle state?

On PowerPC, we have the 0-index idle state, whose exit_latency is 0 and
all that the CPU does in this state, is poll on need_resched(), except
at a lower priority from the hardware's standpoint. Nevertheless, the
CPU is busy polling. So, I would not consider it a genuine idle state.

On a side note, we do not yet support suspend on Power servers, but we
may in the future. Hence the concern.

>> Definitively poll can cause thermal issues, especially when suspending. It
>> is a dangerous state (let's imagine you close your laptop => suspend/poll
>> and then put it in your bag for a travel).
>
> With ARCH_HAS_CPU_RELAX set the "poll state" is supposed to be thermally safe.
>
>> I don't think with the code above we can reach this situation but I agree
>> this is something we have to take care carefully.
>>
>> Actually, I am in favour of removing poll at all from the cpuidle driver and
>> poll only when a cpuidle state selection fails under certain condition.
>>
>> So I fully agree with your statement below.
>>
>>> I would expect the cpus to be in a hardware
>>> defined idle state.
>
> Well, except for the degenerate case in which all of them are disabled
> and for the "broadcast timer stopping aviodance" use case for
> find_deepest_state().

During suspend, the CPUs can very well enter states where the timer
stops since we stop timer interrupts anyway. So unless the idle states
are explicitly disabled by the user/hardware for some reason, deeper
idle states will still be available during suspend as far as I can see.

>
> So there are two questions in my view:
> (1) Should find_deepest_state() ever return states with exit_latency equal to 0?

I would say no, since such an idle state would mostly be polling on a
wakeup event. Atleast, there is one such case in PowerPC.

> (2) If the answer to (1) is "yes", should the "poll state" be ever
> returned by find_deepest_state()?
>
> In any case, find_deepest_state() will only return a state with
> exit_latency equal to 0 if there's no other choice and if it returns
> nothing, we'll fall back to the architecture idle method. So the
> question really is whether or not falling back to arch idle is any
> better than using any state we have in the table.

My suggestion is to *not* fall back to arch idle code, since that is a
black box from the core cpuidle's perspective.

>
> The patch is based on the assumption that any state from the table
> will be better than arch idle, including the "polling state". If that
> does not hold, we'll need to rethink a couple of other things in my
> view.

We could fail suspend if a non-polling idle state is not available perhaps ?

Regards
Preeti U Murthy
>
> Thanks,
> Rafael
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/