Re: [PATCH v3] cpufreq: fix race on cpufreq online

From: Rafael J. Wysocki
Date: Tue May 24 2022 - 07:48:23 EST


On Tue, May 24, 2022 at 1:29 PM Viresh Kumar <viresh.kumar@xxxxxxxxxx> wrote:
>
> On 24-05-22, 13:22, Rafael J. Wysocki wrote:
> > On Tue, May 24, 2022 at 1:15 PM Viresh Kumar <viresh.kumar@xxxxxxxxxx> wrote:
> > >
> > > On 13-05-22, 09:57, Viresh Kumar wrote:
> > > > On 12-05-22, 12:49, Rafael J. Wysocki wrote:
> > > > > > > Moreover, I'm not sure why the locking dance in store() is necessary.
> > > > > >
> > > > > > commit fdd320da84c6 ("cpufreq: Lock CPU online/offline in cpufreq_register_driver()")
> > > > >
> > > > > I get that, but I'm wondering if locking CPU hotplug from store() is
> > > > > needed at all. I mean, if we are in store(), we are holding an active
> > > > > reference to the policy kobject, so the policy cannot go away until we
> > > > > are done anyway. Thus it should be sufficient to use the policy rwsem
> > > > > for synchronization.
> > > >
> > > > I think after the current patchset is applied and we have the inactive
> > > > policy check in store(), we won't required the dance after all.
> > >
> > > I was writing a patch for this and then thought maybe look at mails
> > > around this time, when you sent the patch, and found the reason why we
> > > need the locking dance :)
> > >
> > > https://lore.kernel.org/lkml/20150729091136.GN7557@xxxxxxxxxxxxxxxxxxxxxx/
>
> Actually no, this is for the lock in cpufreq_driver_register().
>
> > Well, again, if we are in store(), we are holding a reference to the
> > policy kobject, so this is not initialization time.
>
> This is the commit which made the change.
>
> commit 4f750c930822 ("cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplug")

So this was done before the entire CPU hotplug rework and it was
useful at that time.

The current code always runs cpufreq_set_policy() under policy->rwsem
and governors are stopped under policy->rwsem, so this particular race
cannot happen AFAICS.

Locking CPU hotplug prevents CPUs from going away while store() is
running, but in order to run store(), the caller must hold an active
reference to the policy kobject. That prevents the policy from being
freed and so policy->rwsem can be acquired. After policy->rwsem has
been acquired, policy->cpus can be checked to determine whether or not
there are any online CPUs for the given policy (there may be none),
because policy->cpus is only manipulated under policy->rwsem.

If a CPU that belongs to the given policy is going away,
cpufreq_offline() has to remove it from policy->cpus under
policy->rwsem, so either it has to wait for store() to release
policy->rwsem, or store() will acquire policy->rwsem after it and will
find that policy->cpus is empty.