[tip:irq/urgent] genirq: Validate action before dereferencing it in handle_irq_event_percpu()

From: tip-bot for Thomas Gleixner
Date: Thu Jan 14 2016 - 14:16:50 EST


Commit-ID: 570540d50710ed192e98e2f7f74578c9486b6b05
Gitweb: http://git.kernel.org/tip/570540d50710ed192e98e2f7f74578c9486b6b05
Author: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
AuthorDate: Wed, 13 Jan 2016 14:07:25 +0100
Committer: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
CommitDate: Thu, 14 Jan 2016 20:09:49 +0100

genirq: Validate action before dereferencing it in handle_irq_event_percpu()

commit 71f64340fc0e changed the handling of irq_desc->action from

CPU 0 CPU 1
free_irq() lock(desc)
lock(desc) handle_edge_irq()
if (desc->action) {
handle_irq_event()
action = desc->action
unlock(desc)
desc->action = NULL handle_irq_event_percpu(desc, action)
action->xxx
to

CPU 0 CPU 1
free_irq() lock(desc)
lock(desc) handle_edge_irq()
if (desc->action) {
handle_irq_event()
unlock(desc)
desc->action = NULL handle_irq_event_percpu(desc, action)
action = desc->action
action->xxx

So if free_irq manages to set the action to NULL between the unlock and before
the readout, we happily dereference a null pointer.

We could simply revert 71f64340fc0e, but we want to preserve the better code
generation. A simple solution is to change the action loop from a do {} while
to a while {} loop.

This is safe because we either see a valid desc->action or NULL. If the action
is about to be removed it is still valid as free_irq() is blocked on
synchronize_irq().

CPU 0 CPU 1
free_irq() lock(desc)
lock(desc) handle_edge_irq()
handle_irq_event(desc)
set(INPROGRESS)
unlock(desc)
handle_irq_event_percpu(desc)
action = desc->action
desc->action = NULL while (action) {
action->xxx
...
action = action->next;
sychronize_irq()
while(INPROGRESS); lock(desc)
clr(INPROGRESS)
free(action)

That's basically the same mechanism as we have for shared
interrupts. action->next can become NULL while handle_irq_event_percpu()
runs. Either it sees the action or NULL. It does not matter, because action
itself cannot go away before the interrupt in progress flag has been cleared.

Fixes: commit 71f64340fc0e "genirq: Remove the second parameter from handle_irq_event_percpu()"
Reported-by: zyjzyj2000@xxxxxxxxx
Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Huang Shijie <shijie.huang@xxxxxxx>
Cc: Jiang Liu <jiang.liu@xxxxxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
Link: alpine.DEB.2.11.1601131224190.3575@nanos">http://lkml.kernel.org/r/alpine.DEB.2.11.1601131224190.3575@nanos
---
kernel/irq/handle.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index a302cf9..57bff78 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -138,7 +138,8 @@ irqreturn_t handle_irq_event_percpu(struct irq_desc *desc)
unsigned int flags = 0, irq = desc->irq_data.irq;
struct irqaction *action = desc->action;

- do {
+ /* action might have become NULL since we dropped the lock */
+ while (action) {
irqreturn_t res;

trace_irq_handler_entry(irq, action);
@@ -173,7 +174,7 @@ irqreturn_t handle_irq_event_percpu(struct irq_desc *desc)

retval |= res;
action = action->next;
- } while (action);
+ }

add_interrupt_randomness(irq, flags);