Re: [PATCH 1/1] Revert "genirq: Remove the second parameter from handle_irq_event_percpu()"

From: Huang Shijie
Date: Wed Jan 13 2016 - 20:29:53 EST


On Wed, Jan 13, 2016 at 02:07:25PM +0100, Thomas Gleixner wrote:
> On Wed, 13 Jan 2016, zyjzyj2000@xxxxxxxxx wrote:
>
> > After this commit 71f64340fc0e ("genirq: Remove the second parameter
> > from handle_irq_event_percpu()") is applied, the variable action is
> > not protected by raw_spin_lock. The following calltrace will pop up.
>
> Thanks for the report. I missed that detail when merging the patch!
>
> Just for correctness' sake: you did not explain why this can happen.
>
> It's not about the variable action, it's about desc->action not being
> protected anymore. So the reason why this oopses is that the action is being
> removed concurrently.
>
> CPU 0                         CPU 1
>
> free_irq()                    lock(desc)
>   lock(desc)                  handle_edge_irq()
>                                 handle_irq_event(desc)
>                                   unlock(desc)
>   desc->action = NULL             handle_irq_event_percpu(desc)
>                                     action = desc->action
>
> While the original code did:
>
> free_irq()                    lock(desc)
>   lock(desc)                  handle_edge_irq()
>                                 handle_irq_event()
>                                   action = desc->action
>                                   unlock(desc)
>   desc->action = NULL             handle_irq_event_percpu(desc, action)
>
> So now the question is whether we revert that patch or simply change
> handle_irq_event_percpu() to deal with that. Patch below.
>
> That preserves the code size reduction of commit 71f64340fc0e. This is safe
> because we either see a valid desc->action or NULL. If the action is about to
> be removed it is still valid as free_irq() is blocked on synchronize_irq().
>
> free_irq()                    lock(desc)
>   lock(desc)                  handle_edge_irq()
>                                 handle_irq_event(desc)
>                                   set(INPROGRESS)
>                                   unlock(desc)
>                                   handle_irq_event_percpu(desc)
>                                     action = desc->action
>   desc->action = NULL
>   synchronize_irq()
>     while(INPROGRESS);            lock(desc)
>                                   clr(INPROGRESS)
>   free(action)
>
> That's basically the same mechanism as we have for shared
> interrupts. action->next can become NULL while handle_irq_event_percpu()
> runs. Either it sees the action or NULL. It does not matter, because action
> itself cannot go away.
>
> Thanks,
>
> tglx
>
> 8<-------------
>
> --- a/kernel/irq/handle.c
> +++ b/kernel/irq/handle.c
> @@ -136,9 +136,15 @@ irqreturn_t handle_irq_event_percpu(stru
>  {
>  	irqreturn_t retval = IRQ_NONE;
>  	unsigned int flags = 0, irq = desc->irq_data.irq;
> -	struct irqaction *action = desc->action;
> +	struct irqaction *action;
>
> -	do {
> +	/*
> +	 * READ_ONCE is not required here. The compiler cannot reload action
> +	 * because it'll be action->next for the second iteration of the loop.
> +	 */
> +	action = desc->action;
> +
> +	while (action) {
>  		irqreturn_t res;
>
>  		trace_irq_handler_entry(irq, action);
> @@ -173,7 +179,7 @@ irqreturn_t handle_irq_event_percpu(stru
>
>  		retval |= res;
>  		action = action->next;
> -	} while (action);
> +	}
>
>  	add_interrupt_randomness(irq, flags);

I prefer this patch; reverting the old patch is not a good solution.
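
To illustrate why the while (action) form copes with an empty chain, here
is a minimal user-space sketch. The sketch_* names and count_handler() are
made up for illustration and are not the kernel's definitions. It only
shows the loop shape (one snapshot of desc->action, then a pre-checked
walk), not the locking or the INPROGRESS handshake with synchronize_irq().

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum { SKETCH_IRQ_NONE = 0, SKETCH_IRQ_HANDLED = 1 };

struct sketch_action {
	int (*handler)(void *dev_id);
	void *dev_id;
	struct sketch_action *next;
};

struct sketch_desc {
	struct sketch_action *action;	/* NULL once the last handler is removed */
};

/* Example handler: counts its invocations through dev_id. */
static int count_handler(void *dev_id)
{
	int *counter = dev_id;

	(*counter)++;
	return SKETCH_IRQ_HANDLED;
}

/*
 * Take a single snapshot of desc->action and walk the chain with a
 * pre-checked loop, so a NULL head is never dereferenced. A do/while
 * loop would dereference the head before testing it.
 */
static int sketch_handle_percpu(struct sketch_desc *desc)
{
	int retval = SKETCH_IRQ_NONE;
	struct sketch_action *action = desc->action;

	while (action) {
		retval |= action->handler(action->dev_id);
		action = action->next;
	}

	return retval;
}
```

With two chained actions the walk calls both handlers; with desc->action
set to NULL it returns SKETCH_IRQ_NONE without touching any action.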

thanks
Huang Shijie