Re: [PATCH] cpufreq: schedutil: Don't skip freq update when limits change

From: Rafael J. Wysocki
Date: Tue Jul 23 2019 - 06:27:47 EST


On Tue, Jul 23, 2019 at 11:15 AM Viresh Kumar <viresh.kumar@xxxxxxxxxx> wrote:
>
> On 23-07-19, 00:10, Doug Smythies wrote:
> > On 2019.07.21 23:52 Viresh Kumar wrote:
> >
> > > To avoid reducing the frequency of a CPU prematurely, we skip reducing
> > > the frequency if the CPU had been busy recently.
> > >
> > > This should not be done when the limits of the policy are changed, for
> > > example due to thermal throttling. We should always get the frequency
> > > within limits as soon as possible.
> > >
> > > Fixes: ecd288429126 ("cpufreq: schedutil: Don't set next_freq to UINT_MAX")
> > > Cc: v4.18+ <stable@xxxxxxxxxxxxxxx> # v4.18+
> > > Reported-by: Doug Smythies <doug.smythies@xxxxxxxxx>
> > > Signed-off-by: Viresh Kumar <viresh.kumar@xxxxxxxxxx>
> > > ---
> > > @Doug: Please try this patch, it must fix the issue you reported.
> >
> > It fixes the driver = acpi-cpufreq ; governor = schedutil test case
> > It does not fix the driver = intel_cpufreq ; governor = schedutil test case
> >
> > I have checked my results twice, but will check again in the day or two.
>
> The patch you tried to revert wasn't doing any driver specific stuff
> but only schedutil. If that revert fixes your issue with both the
> drivers, then this patch should do it as well.
>
> I am clueless now on what can go wrong with intel_cpufreq driver with
> schedutil now.
>
> Though there is one difference between intel_cpufreq and acpi_cpufreq,
> intel_cpufreq has fast_switch_possible=true and so it uses slightly
> different path in schedutil. I tried to look from that perspective as
> well but couldn't find anything wrong.

acpi-cpufreq should use fast switching on the Doug's system too.

> If you still find intel_cpufreq to be broken, even with this patch,
> please set fast_switch_possible=false instead of true in
> __intel_pstate_cpu_init() and try tests again. That shall make it very
> much similar to acpi-cpufreq driver.

I wonder if this helps. Even so, we want fast switching to be used by
intel_cpufreq.

Anyway, it looks like the change reverted by the Doug's patch
introduced a race condition that had not been present before. Namely,
need_freq_update is cleared in get_next_freq() when it is set _or_
when the new freq is different from the cached one, so in the latter
case if it happens to be set by sugov_limits() after evaluating
sugov_should_update_freq() (which returned 'true' for timing reasons),
that update will be lost now. [Previously the update would not be
lost, because the clearing of need_freq_update depended only on its
current value.] Where it matters is that in the "need_freq_update set"
case, the "premature frequency reduction avoidance" should not be
applied (as you noticed and hence the $subject patch).

However, even with the $subject patch, need_freq_update may still be
set by sugov_limits() after the check added by it and then cleared by
get_next_freq(), so it doesn't really eliminate the problem.

IMO eliminating would require invalidating next_freq this way or
another when need_freq_update is set in sugov_should_update_freq(),
which was done before commit ecd2884291261e3fddbc7651ee11a20d596bb514.