Re: [RFC PATCH] cpufreq: qcom-cpufreq-hw: allow work to be done on other CPU for PREEMPT_RT

From: Sebastian Andrzej Siewior
Date: Thu Mar 23 2023 - 07:37:38 EST


On 2023-03-23 09:16:27 [+0100], Krzysztof Kozlowski wrote:
> > Yeah closer :) The CPU-mask for workqueues can still be different on
> > non-NOHZ-full CPUs. Still you interrupt the CPU doing in-userland work
> > and this is not desired.
>
> Probably this should be done by workqueue core code. Individual drivers
> should not need to investigate which CPUs are isolated.

_Either_ this is part of the interrupt service routine or it is not.
Sometimes work can be offloaded.
However this interrupt is short and offloads work to a workqueue.
Can the interrupt be moved to another CPU without breaking something?
The use can only change the CPUs on which the workqueue can run but also
the affinity of the IRQ itself. If the user wishes to isolate CPU X and
move workqueues and interrupts away from the CPU the question is why is
this a problem for you.

> > You have a threaded-IRQ which does nothing but schedules a worker. Why?
> > Why not sleep and remain in that threaded IRQ until the work is done?
> > You _can_ sleep in the threaded IRQ if you have to. Force-threaded is
> > different but this is one is explicit threaded so you could do it.
>
> If I get your point correctly, you want the IRQ handler thread to do the
> actual work instead of scheduling work? The answer to this is probably here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e0e27c3d4e20dab861566f1c348ae44e4b498630

Let me look.

| Re-enabling an interrupt from its own interrupt handler may cause
| an interrupt storm, if there is a pending interrupt and because its
| handling is disabled due to already done entrance into the handler
| above in the stack.

I have hard time parsing this.
disable_irq_nosync() and enable enable_irq() shouldn't be invoked from
within the interrupt handler itself. This interrupt is already requested
as a threaded handler with IRQF_ONESHOT. This means the IRQ-chip already
disables the interrupt while the threaded handler is invoked. No need
for that.

I don't know what the purpose of reg_intr_clr here is. Acking the
interrupt before reading the status and doing any work does not look
right.

| Also, apparently it is improper to lock a mutex in an interrupt contex.

Again, this is an interrupt handler requested as a threaded handler.
This handler is invoked with enabled interrupts and preemption. The code
in this handler can invoke ssleep() or mutex_lock(). A might_sleep()
does not produce a plat here. It okay to acquire a mutex. This is why we
have threaded interrupts.

You can't acquire a mutex in a forced-threaded handler. This is not the
case here.

> >
> >>> However the thermal notifications have nothing to do with cpufreq.
> >>
> >> They have. The FW notifies that thermal mitigation is happening and
> >> maximum allowed frequency is now XYZ. The cpufreq receives this and sets
> >> maximum allowed scaling frequency for governor.
> >
> > I see. So the driver is doing something in worst case. This interrupt,
> > you have per-CPU and you need to do this CPU? I mean could you change
> > the affinity of the interrupt to another CPU?
>
> I don't know. The commit introducing it:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3ed6dfbd3bb987b3d2de86304ae45972ebff5870
> claimed it helps to reduce number of interrupts hitting CPU 10x-100x
> times... I don't see it - neither in tests nor in the code, so I am just
> thinking to revert that one.

So it may run on another CPU but doing it on the right cluster reduces
the received interrupt 10-100 times. Do we have per-cluster register or
is the interrupt ACK wrong and this what is observed?

The questions are:
- What is the interrupt signaling.
- What must be done to acknowledge the interrupt.

This needs to be figured out and verified that it actually works as
intended. The hardware might keep sending interrupt because the source
is either not acknowledge properly or the source of the interrupt (the
reason why it was generated in first place) is still pending/ not
handled. The changes you reference look like "if we do this then it
seems better.". No explanation about the issue/ root cause and the
targeted solution.

> Best regards,
> Krzysztof

Sebastian