DAC960 SMP deadlock fix

From: Andrea Arcangeli (andrea@suse.de)
Date: Thu Sep 07 2000 - 09:50:51 EST


Hi Leonard,

this night I (hopefully) finally spotted and fixed a longstanding deadlock
that was hitting us on heavily loaded server running the DAC960.

The bug is present also in the earlier 2.2.x and 2.4.x and it's _not_ been
introduced with the DAC960 updates in 2.2.17.

In 2.4.x the SMP deadlock can't trigger because of two reasons:

1) the waitqueue uses a per-queue spinlock and the driver always wakeup
   the waitqueue with the io_request_lock acquired
2) the waitqueue lock in 2.4.x at the moment is a spinlock_t (thus
   there isn't a __cli-less read_lock (no-irqsave) wake_up).

The bug triggers in 2.2.x because the waitqueue_lock is shard by all
waitqueues and the wake_up uses a read_lock (that can be interrupted by
irq handlers that can later spin on the io_request_lock).

The SMP locking rule that was not enforced by the driver and that was
causing the deadlock is:

        "never run add_wait_queue with the io_request_lock acquired"

The reason of the rule is that the irq_request_lock can be acquired by
interrupt handlers as well.
        
The patch with a description of the problem is here:

        ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.18pre2/patches/v2.2/2.2.18pre2/DAC960-SMP-lock-inversion-1

I recommend adding the above fix to 2.2.18pre3.

Andrea

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Sep 07 2000 - 21:00:29 EST