Re: [RFC PATCH] cpufreq: qcom-cpufreq-hw: allow work to be done on other CPU for PREEMPT_RT

From: Sebastian Andrzej Siewior
Date: Tue Mar 21 2023 - 06:57:44 EST


On 2023-03-21 11:24:46 [+0100], Krzysztof Kozlowski wrote:
> >> --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> >> +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> >> @@ -390,7 +390,16 @@ static irqreturn_t qcom_lmh_dcvs_handle_irq(int irq, void *data)
> >>
> >> /* Disable interrupt and enable polling */
> >> disable_irq_nosync(c_data->throttle_irq);
> >> - schedule_delayed_work(&c_data->throttle_work, 0);
> >> +
> >> + /*
> >> + * Workqueue prefers local CPUs and since interrupts have set affinity,
> >> + * the work might execute on a CPU dedicated to realtime tasks.
> >> + */
> >> + if (IS_ENABLED(CONFIG_PREEMPT_RT))
> >> + queue_delayed_work_on(WORK_CPU_UNBOUND, system_unbound_wq,
> >> + &c_data->throttle_work, 0);
> >> + else
> >> + schedule_delayed_work(&c_data->throttle_work, 0);
> >
> > You isolated CPUs and use this on PREEMPT_RT. And this special use-case
> > is your reasoning to make this change and let it depend on PREEMPT_RT?
> >
> > If you do PREEMPT_RT and you care about latency I would argue that you
> > either disable cpufreq and set it to PERFORMANCE so that the highest
> > available frequency is set once and not changed afterwards.
>
> The cpufreq is set to performance. It will be changed anyway because
> underlying FW notifies through such interrupts about thermal mitigation
> happening.

I still fail to understand why this is PREEMPT_RT specific and not a
problem in general when it comes not NO_HZ_FULL and/ or CPU isolation.
However the thermal notifications have nothing to do with cpufreq.

> The only other solution is to disable the cpufreq device, e.g. by not
> compiling it.

People often disable cpufreq because _usually_ the system boots at
maximum performance. There are however exceptions and even x86 system
are configured sometimes to a lower clock speed by the firmware/ BIOS.
In this case it is nice to have a cpufreq so it is possible to set the
system during boot to a higher clock speed. And then remain idle unless
the cpufreq governor changed.

> Best regards,
> Krzysztof

Sebastian