Re: [PATCH 2/2] cpufreq: serialize calls to __cpufreq_governor()

From: Rafael J. Wysocki
Date: Fri Sep 06 2013 - 08:20:37 EST


On Friday, September 06, 2013 11:33:55 AM Srivatsa S. Bhat wrote:
> On 09/05/2013 06:24 AM, Stephen Boyd wrote:
> > On 09/04/13 17:26, Rafael J. Wysocki wrote:
> >> On Wednesday, September 04, 2013 04:50:01 PM Stephen Boyd wrote:
> >>> On 09/04/13 16:55, Rafael J. Wysocki wrote:
> >>>> Well, I'm not sure when Viresh is going to be back.
> >>>>
> >>>> Srivatsa, can you please resend this patch with a proper changelog?
> >>>>
> >>> I haven't had a chance to try this out yet, but I was just thinking
> >>> about this patch. How is it going to work? If one task opens the file
> >>> and another task is taking down the CPU wouldn't we deadlock in the
> >>> CPU_DOWN notifier waiting for the kobject to be released? Task 1 will
> >>> grab the kobject reference and sleep on the hotplug mutex and task 2
> >>> will put the kobject and wait for the completion, but it won't happen.
> >>> At least I think that's what would happen.
> >> Do you mean the completion in sysfs_deactivate()? Yes, we can deadlock
> >> there.
> >
> > I mean the complete in cpufreq_sysfs_release(). I don't think that will
> > ever be called because the kobject is held by the task calling store()
> > which is waiting on the hotplug lock to be released.
> >
> >>
> >> Well, I guess the Srivatsa's patch may be salvaged by making it do a "trylock"
> >> version of get_online_cpus(), but then I wonder if there's no better way.
> >
> > I think the real solution is to remove the kobject first if the CPU
> > going down is the last user of that policy. Once the completion is done
> > we can stop the governor and clean up state. For the case where there
> > are still CPUs using the policy why can't we cancel that CPU's work and
> > do nothing else? Even in the case where we have to move the cpufreq
> > directory do we need to do a STOP/START/LIMITS sequence? I hope we can
> > get away with just moving the directory and canceling that CPUs work then.
> >
>
> Conceptually, I agree that your idea of not allowing any process to take
> a new reference to the kobject while we are taking the CPU offline, is a
> sound solution.
>
> However, I am reluctant to go down that path because, handling the CPU hotplug
> sequence in the suspend/resume path might get even more tricky, if we want
> to implement the changes that you propose. Just recently we managed to
> rework the cpufreq CPU hotplug handling to retain the sysfs file permissions
> around suspend/resume, and doing that was not at all simple. Adding more
> quirks and complexity to the kobject handling in that path will only make
> things even more challenging, IMHO. That's the reason I'm trying to think
> of ways to avoid touching that fragile code, and instead solve this problem
> in some other way, without compromising on the robustness of the solution.
>
> So here is my new proposal, as a replacement to this patch[2/2]:

Actually, I've already applied patch [2/2], because while it doesn't solve the
whole problem, it narrows it down somewhat. So, please work on top of that
patch going forward.

> We note that, at CPU_DOWN_PREPARE stage, the CPU is not yet marked offline,
> whereas by the time we handle CPU_POST_DEAD, the CPU is removed from the
> cpu_online_mask, and also the cpu_hotplug lock is dropped.
>
> So, let us split up __cpu_remove_dev() into 2 parts, say:
> __cpu_remove_prepare() - invoked during CPU_DOWN_PREPARE
> __cpu_remove_finish() - invoked during CPU_POST_DEAD
>
>
> In the former function, we stop the governors, so that policy->governor_enabled
> gets set to false, so that patch [1/2] will return -EBUSY to any subsequent
> ->store() requests. Also, we do everything except the kobject cleanup.
>
> In the latter function, we do the remaining work, particularly the part
> where we wait for the kobject refcount to drop to zero and the subsequent
> cleanup.
>
>
> And the ->store() functions will be modified to look like this:
>
> store()
> {
> get_online_cpus();
>
> if (!cpu_online(cpu))
> goto out;
>
> /* Body of the function*/
> out:
> put_online_cpus();
> }
>
> That way, if a task tries to write to a cpufreq file during CPU offline,
> it will get blocked on get_online_cpus(), and will continue after
> CPU_DEAD (since we release the lock here). Then it will notice that the cpu
> is offline, and hence will return silently, thus dropping the kobject refcnt.
> So, when the cpufreq core comes back at the CPU_POST_DEAD stage to cleanup
> the kobject, it won't encounter any problems.
>
> Any thoughts on this approach? I'll try to code it up and post the patch
> later today.

It sounds good to me (modulo the remark above).

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/