Re: [RFC PATCH] cpufreq: qcom-cpufreq-hw: allow work to be done on other CPU for PREEMPT_RT

From: Krzysztof Kozlowski
Date: Tue Mar 21 2023 - 07:27:53 EST


On 21/03/2023 11:57, Sebastian Andrzej Siewior wrote:
> On 2023-03-21 11:24:46 [+0100], Krzysztof Kozlowski wrote:
>>>> --- a/drivers/cpufreq/qcom-cpufreq-hw.c
>>>> +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
>>>> @@ -390,7 +390,16 @@ static irqreturn_t qcom_lmh_dcvs_handle_irq(int irq, void *data)
>>>>
>>>> /* Disable interrupt and enable polling */
>>>> disable_irq_nosync(c_data->throttle_irq);
>>>> - schedule_delayed_work(&c_data->throttle_work, 0);
>>>> +
>>>> + /*
>>>> + * Workqueue prefers local CPUs and since interrupts have set affinity,
>>>> + * the work might execute on a CPU dedicated to realtime tasks.
>>>> + */
>>>> + if (IS_ENABLED(CONFIG_PREEMPT_RT))
>>>> + queue_delayed_work_on(WORK_CPU_UNBOUND, system_unbound_wq,
>>>> + &c_data->throttle_work, 0);
>>>> + else
>>>> + schedule_delayed_work(&c_data->throttle_work, 0);
>>>
>>> You isolated CPUs and use this on PREEMPT_RT. And this special use-case
>>> is your reasoning to make this change and let it depend on PREEMPT_RT?
>>>
>>> If you do PREEMPT_RT and you care about latency I would argue that you
>>> either disable cpufreq and set it to PERFORMANCE so that the highest
>>> available frequency is set once and not changed afterwards.
>>
>> The cpufreq is set to performance. It will be changed anyway because
>> underlying FW notifies through such interrupts about thermal mitigation
>> happening.
>
> I still fail to understand why this is PREEMPT_RT specific and not a
> problem in general when it comes not NO_HZ_FULL and/ or CPU isolation.

Hm, good point, I actually don't know what is the workqueue
recommendation for NO_HZ_FULL CPUs - is still locality of the workqueue
preferred?

And how such code would look like?
if (tick_nohz_tick_stopped())?

> However the thermal notifications have nothing to do with cpufreq.

They have. The FW notifies that thermal mitigation is happening and
maximum allowed frequency is now XYZ. The cpufreq receives this and sets
maximum allowed scaling frequency for governor.

>
>> The only other solution is to disable the cpufreq device, e.g. by not
>> compiling it.
>
> People often disable cpufreq because _usually_ the system boots at
> maximum performance. There are however exceptions and even x86 system
> are configured sometimes to a lower clock speed by the firmware/ BIOS.
> In this case it is nice to have a cpufreq so it is possible to set the
> system during boot to a higher clock speed. And then remain idle unless
> the cpufreq governor changed.

Which we do not want here, thus disabling cpufreq is not the interesting
solution...

Best regards,
Krzysztof