NFS deadlock explained (on S/390)

From: Ulrich Weigand (Ulrich.Weigand@de.ibm.com)
Date: Fri Aug 31 2001 - 11:14:38 EST


Hi,

I've found the cause of the NFS deadlocks on the S/390 platform. It's
a classical case of deadlocking on two spinlocks; one spinlock is the
xprt_sock_lock, and the other is a private spinlock used by the QDIO
driver to protect its data structures accessed from both its bottom half
and its interrupt handler.

Specifically, what happens is that the first CPU runs the QDIO bottom
half (from ksoftirqd, but this is probably not important). That bottom
half grabs the QDIO private spinlock, and then executes a bunch of code
including a dev_kfree_skb_any(). As we are only in a softirq, not a
hard irq, dev_kfree_skb_any() call kfree_skb() directly, which happens
to invoke the udp_write_space() callback in xprt.c. This routine then
spins trying to acquire the xprt_sock_lock.

On the other CPU, an normal sys_open system call on an NFS file is
processed; the call path goes through nfs_permission, rpc_execute,
and rpc_release_task to xprt_release. That routine holds the
xprt_sock_lock; however it takes it only with a spin_lock_bh, so that
interrupts are not disabled. While the lock is held, a QDIO interrupt
comes in; the QDIO interrupt handler spins trying to acquire the
QDIO private lock (to protect itself against parallel execution with
the QDIO bottom half).

Now, you can argue who's at fault ;-) On the one hand, QDIO should
probably hold on to its spinlock only for shorter periods of time,
and especially not call kfree_skb while holding it.

On the other hand, the xprt_sock_lock handling really appears to invite
such deadlocks, as xprt.c grabs it both from within the udp_write_space
callback, which can be called at times surprising to a driver who just
wanted to free a piece of memory, and at the same allows itself to be
interrupted by random driver IRQs while holding the lock ...

Whether this same situation can explain the deadlocks seen on other
platforms depends on whether the drivers used there exhibit similar
locking behaviours as the QDIO driver, of course.

(We've fixed our problem by changing the QDIO driver not to call
kfree_skb while holding to any lock.)

Mit freundlichen Gruessen / Best Regards

Ulrich Weigand

--
  Dr. Ulrich Weigand
  Linux for S/390 Design & Development
  IBM Deutschland Entwicklung GmbH, Schoenaicher Str. 220, 71032 Boeblingen
  Phone: +49-7031/16-3727   ---   Email: Ulrich.Weigand@de.ibm.com

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri Aug 31 2001 - 21:00:34 EST