Re: [PATCH] cpufreq: Fix cpufreq regression after suspend/resume

From: Srivatsa S. Bhat
Date: Thu Jul 11 2013 - 02:27:00 EST


On 07/11/2013 11:10 AM, Lan Tianyu wrote:
> 2013/7/11 Srivatsa S. Bhat <srivatsa.bhat@xxxxxxxxxxxxxxxxxx>:
>> Hi Toralf,
>>
>> On 07/11/2013 02:20 AM, Toralf Förster wrote:
>>> I tested the patch several times on top of a66b2e5 - the origin issue is
>>> fixed but - erratically another issue now appears : all 4 cores are runs
>>> after wakeup at 2.6 GHz.
>>> The temporary hot fix is to switch between governor performance and
>>> ondemand for all 4 cores.
>>>
>>>
>>
>> Thanks for all your testing efforts. The commit a66b2e5 took a shortcut to
>> achieve its goals but failed subtly in many aspects. Below is a patch (only
>> compile-tested) which tries to achieve the goal (preserve sysfs files) in
>> the proper way, by separating out the cpufreq-core bits from the sysfs bits.
>> You might want to give it a try on current mainline and see if it improves
>> anything.
>>
>> Regards,
>> Srivatsa S. Bhat
>>
>>
>> (Applies on current mainline)
>> ---
>>
>> drivers/cpufreq/cpufreq.c | 144 ++++++++++++++++++++++++++-------------------
>> 1 file changed, 82 insertions(+), 62 deletions(-)
>>
[...]
>> +/**
>> + * cpufreq_add_dev - add a CPU device
>> + *
>> + * Adds the cpufreq interface for a CPU device.
>> + *
>> + * The Oracle says: try running cpufreq registration/unregistration concurrently
>> + * with with cpu hotplugging and all hell will break loose. Tried to clean this
>> + * mess up, but more thorough testing is needed. - Mathieu
>> + */
>> +static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
>> +{
>> + return __cpufreq_add_dev(dev, sif, true);
>> +}
>> +
>> static void update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu)
>> {
>> int j;
>> @@ -1106,7 +1116,7 @@ static void update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu)
>> * This routine frees the rwsem before returning.
>> */
>> static int __cpufreq_remove_dev(struct device *dev,
>> - struct subsys_interface *sif)
>> + struct subsys_interface *sif, bool remove_sysfs)
>> {
>> unsigned int cpu = dev->id, ret, cpus;
>> unsigned long flags;
>> @@ -1145,9 +1155,9 @@ static int __cpufreq_remove_dev(struct device *dev,
>> cpumask_clear_cpu(cpu, data->cpus);
>> unlock_policy_rwsem_write(cpu);
>>
>> - if (cpu != data->cpu) {
>> + if (cpu != data->cpu && remove_sysfs) {
>> sysfs_remove_link(&dev->kobj, "cpufreq");
>> - } else if (cpus > 1) {
>> + } else if (cpus > 1 && remove_sysfs) {
>> /* first sibling now owns the new sysfs dir */
>> cpu_dev = get_cpu_device(cpumask_first(data->cpus));
>> sysfs_remove_link(&cpu_dev->kobj, "cpufreq");
>> @@ -1184,26 +1194,25 @@ static int __cpufreq_remove_dev(struct device *dev,
>>
>> /* If cpu is last user of policy, free policy */
>> if (cpus == 1) {
>> - lock_policy_rwsem_read(cpu);
>> - kobj = &data->kobj;
>> - cmp = &data->kobj_unregister;
>> - unlock_policy_rwsem_read(cpu);
>> - kobject_put(kobj);
>> -
>> - /* we need to make sure that the underlying kobj is actually
>> - * not referenced anymore by anybody before we proceed with
>> - * unloading.
>> - */
>> - pr_debug("waiting for dropping of refcount\n");
>> - wait_for_completion(cmp);
>> - pr_debug("wait complete\n");
>> -
>> if (cpufreq_driver->exit)
>> cpufreq_driver->exit(data);
>>
>> free_cpumask_var(data->related_cpus);
>> free_cpumask_var(data->cpus);
>> kfree(data);
>> +
>> + if (remove_sysfs) {
>> + lock_policy_rwsem_read(cpu);
>> + kobj = &data->kobj;
>> + cmp = &data->kobj_unregister;
>
> This looks no right. "data" has already been freed but data-> kobj still is
> referenced.
>

Oops! You are right. Hmm, this looks quite difficult to get right :(
There are multiple challenges here:

1. The sysfs files must not be removed during cpu_down, and not initialized

during cpu_up. That would help us preserve the file permissions.
2. But we should ensure that we really do the cpufreq-core parts of the cpu
initialization during cpu_up. If we fail to free some of the data-structures
during cpu_down, the cpu_up callback will think that a full-init is not
required and not do its job. That will make cpufreq behave erratically after
suspend/resume and take us back to square one.

3. A full re-init in the cpu_up callback also involves memory allocations.
So if we don't release the memory in the cpu_down callback, we'll end up
in a memory leak.

I tried to address all these in this patch, but you found yet another serious
loop-hole. I guess I'm out of ideas now... if anybody has any thoughts on how
to get this right, then I'm all ears. Else, we'll just revert the original
commit like Rafael suggested and leave it upto userspace to save and restore
the permissions across suspend/resume if it wants ;-(

>> + unlock_policy_rwsem_read(cpu);
>> + kobject_put(kobj);
>> +
>> + pr_debug("waiting for dropping of refcount\n");
>> + wait_for_completion(cmp);
>> + pr_debug("wait complete\n");
>> + }
>> +
>> } else if (cpufreq_driver->target) {
>> __cpufreq_governor(data, CPUFREQ_GOV_START);
>> __cpufreq_governor(data, CPUFREQ_GOV_LIMITS);
>> @@ -1221,7 +1230,7 @@ static int cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif)
>> if (cpu_is_offline(cpu))
>> return 0;
>>
>> - retval = __cpufreq_remove_dev(dev, sif);
>> + retval = __cpufreq_remove_dev(dev, sif, true);
>> return retval;
>> }

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/