Scheduler lockup or nfsd problem in 2.6.24.2 and 2.6.23.17?

From: Allard Hoeve
Date: Thu Feb 28 2008 - 09:04:30 EST



Hello all,

The last few days our trusty NFS server has experienced several soft lockups. These occur every 11 hours or so. The system does not respond afterwards. Sending sysrq commands over the serial console seems to work allthough we had to powercycle the server once.

First we thought it would be an NFS problem, and now that we tried 2.6.23.17 instead of 2.6.24.2, we now have two different stacktraces that share a trace through nfsd (nfsd_direct_splice_actor):

http://article.gmane.org/gmane.linux.nfs/19107
http://article.gmane.org/gmane.linux.nfs/19130

The second however, leads me to think the (relatively new) scheduler might be involved through __check_preempt_curr_fair.

I'm now trying 2.6.22.19, which has a recent lockd issue with NFS fixed but hasn't had the scheduler update.

How do I go about debugging this problem? What do you experts think?

Regards,

Allard Hoeve

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/