Re: NFS deadlock explained (on S/390)

From: Manfred Spraul (manfred@colorfullife.com)
Date: Fri Aug 31 2001 - 13:02:17 EST


> Specifically, what happens is that the first CPU runs the QDIO
> bottom half (from ksoftirqd, but this is probably not important).
> That bottom half grabs the QDIO private spinlock, and then
> executes a bunch of code including a dev_kfree_skb_any().
> As we are only in a softirq, not a hard irq, dev_kfree_skb_any()
> call kfree_skb() directly, which happens to invoke the
> udp_write_space() callback in xprt.c. This routine then spins
> trying to acquire the xprt_sock_lock.

I think in_irq() and in_interrupt() should check the cpu interrupt flag
and return TRUE if the per-cpu interrupts are disabled.

The current behavious is just weird:
    spin_lock_bh();
    in_interrupt(); --> true
    spin_unlock_bh();
    spin_lock_irq();
    in_interrupt(); --> false
    spin_unlock_irq();

> Whether this same situation can explain the deadlocks seen on
> other platforms depends on whether the drivers used there exhibit
> similar locking behaviours as the QDIO driver, of course.

It should be possible to detect that automatically: if
dev_kfree_skb_any() is called outside irq context with disabled per-cpu
interrupts then it's probably due to a spin_lock_irq() and could
deadlock.

--
    Manfred

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Aug 31 2001 - 21:00:35 EST