Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

From: Srivatsa S. Bhat
Date: Tue Jul 16 2013 - 05:33:16 EST


On 07/16/2013 02:40 PM, Viresh Kumar wrote:
> On 16 July 2013 14:26, Srivatsa S. Bhat
> <srivatsa.bhat@xxxxxxxxxxxxxxxxxx> wrote:
>> On 07/16/2013 11:45 AM, Viresh Kumar wrote:
>
>>> To understand it I actually applied your patches to get better view of the code.
>>> (Haven't tested it though).. And found that your code is doing the right thing
>>> and we shouldn't get a mismatch.. This is the sequence of events I can draw:
>>>
>>> - __cpu_add_dev() for first cpu. sets the refcount to 'x', where x are
>>> the no. of
>>> cpus in its clock domain.
>>> - _cpu_add_dev() for other cpus: doesn't change anything in refcount
>>>
>>> - Suspend:
>>> - cpu_remove_dev() for all cpus, due to frozen flag we don't touch the value
>>> of count
>>> - Resume:
>>> - cpu_add_dev() for all cpus, due to frozen flag we don't touch the
>>> value of count.
>>>
>>
>> Actually this one is tricky (I took a look again). So we have this code in the
>> beginning of _cpufreq_add_dev():
>>
>>
>> 1008 #ifdef CONFIG_SMP
>> 1009 /* check whether a different CPU already registered this
>> 1010 * CPU because it is in the same boat. */
>> 1011 policy = cpufreq_cpu_get(cpu);
>> 1012 if (unlikely(policy)) {
>> 1013 cpufreq_cpu_put(policy);
>> 1014 return 0;
>> 1015 }
>>
>> The _get() is not controlled by the frozen flag, but it still doesn't take a
>> refcount because of a subtle reason: per_cpu(cpufreq_cpu_data, cpu) was set to
>> NULL in __cpufreq_remove_dev() and the memory was saved away in fallback storage.
>> So, when __cpufreq_cpu_get() executes, it sees:
>>
>> 204 /* get the CPU */
>> 205 data = per_cpu(cpufreq_cpu_data, cpu);
>> 206
>> 207 if (!data)
>> 208 goto err_out_put_module;
>>
>> Thus, since data is NULL, cpufreq_cpu_get() won't take a refcount and will return
>> silently.
>
> Even if this wouldn't have happened, refcount wouldn't have been
> touched due to this code:
>
>> 1012 if (unlikely(policy)) {
>> 1013 cpufreq_cpu_put(policy);
>> 1014 return 0;
>> 1015 }
>
> i.e. If we get a valid policy structure, we siimply put the policy again
> and so decrement the incremented refcount.

Ah, yes!

>
> So, even if you don't keep the fallback storage, things should work
> without any issue (probably worth trying as this will get rid of a per
> cpu variable :))
>

No, I already tried that and it didn't work ;-( The thing is, we need the
__cpufreq_add_dev() code to call the ->init() routines of drivers etc. But if
it finds the policy structure, it will skip all of that initialization and happily
proceed. Which is precisely the cause of all the erratic behaviour we are seeing
(ie., lack of proper initialization post-resume).

So this approach keeps the memory preserved in a fallback storage and lets the
init code run to full completion without any issues.

Perhaps we could do some _more_ code reorganization in the future to take this
issue into account etc., but IMHO that might be non-trivial. I'm trying to keep
this as simple and straight-forward as possible as a first step, to atleast get
it properly working. (Changing the order in which init is done is kinda scary
since its hard to comprehend what assumptions we might be breaking!).

We can perhaps revisit your idea later and optimize out the extra per-cpu data.

>> Further down in __cpufreq_add_dev(), we restore the original memory, using
>> the frozen flag:
>>
>> 1037 if (frozen)
>> 1038 /* Restore the saved policy when doing light-weight init */
>> 1039 policy = cpufreq_policy_restore(cpu);
>> 1040 else
>> 1041 policy = cpufreq_policy_alloc();
>>
>>
>> So that is how we manage to fool cpufreq_cpu_get() into not taking a fresh
>> refcount while resuming :)

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/