Re: Kernel 2.4.0test10 crash (RAID+SMP)

From: Neil Brown (neilb@cse.unsw.edu.au)
Date: Mon Nov 06 2000 - 17:19:12 EST


On Monday November 6, jgarzik@mandrakesoft.com wrote:
> Neil Brown wrote:
> > It looks like an interupt is happening while another interrupt is
> > happening, which should be impossible... but it isn't.
>
> If multiple interrupts are hitting a single code path (like IDE irqs 14
> -and- 15), you definitely have to think about that. The reentrancy
> guarantee only exists when a single IRQ is assigned to a single
> handler...
>
> Jeff

Maybe I wasn't very clear in the description of the problem (it was a
busy day) and just hoped that the nature of the patch would make the
nature of the problem clear.

The b_end_io routine that raid1 attaches to io request buffer_heads
that are used for resyncing had a side effect of re-enabling
interrupts. As it is called from an interrupt context, this is
clearly a bug. It allowed another interrupt to be serviced before a
previous interrupt had been completed, which is a problem waiting to
happen.
In this case, it became a real problem because the first interrupt had
grabbed a spinlock (I didn't bother to discover which one) and the
second interrupt tried to grab the same spinlock. This produced the
deadlock which the NMI-Oopser detected and reported.

When I have (sometime today) convinced myself that I have found all
the spin_{,un}lock_irq() calls that could be called from interrupt
context and corrected them to spin_{,un}lock_irq{save,restore}()
calls, I will send the patch to Linus.

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Nov 07 2000 - 21:00:20 EST