Soft lockup problem

From: Gerard Saraber
Date: Mon Feb 06 2012 - 10:40:45 EST


Greetings everyone,
I've been having a bit of a problem since upgrading to the Linux 3.x
series. I have a machine that we're using as a NAS that runs various
rsync processes (mostly at night). Lately, after a day or two, I will
come in in the morning to a load average of 49 with the machine not
really doing anything. When I tried to run 'dstat', the command just
hung with no output at all. There were no errors in the logs, or even
anything that would vaguely point at something I could work with.
So, needing to get the machine back to work, I attempted to reboot it
with "shutdown -r now" on the console. It gives a nice message saying
it's going to reboot, but nothing ever happens. The only way to reboot
it is by using Ctrl + Alt + SysRq + B, after which the machine reboots
and the RAID array comes back clean.

I'm not sure how to troubleshoot this; any pointers would be appreciated.
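
One thing I could check the next time it happens: the Linux load average
counts tasks in uninterruptible sleep (state D) as well as runnable ones,
so a load of 49 on an idle box may just mean a pile of tasks stuck waiting
on I/O. Below is a rough sketch that lists such tasks by reading
/proc/<pid>/stat directly. The function names are my own, it only looks at
processes (not individual threads), and whether it would run at all on a
machine in that state is an open question.

```python
import os


def parse_stat(line):
    """Return (comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    head, _, tail = line.rpartition(")")
    comm = head.split("(", 1)[1]
    state = tail.split()[0]
    return comm, state


def blocked_tasks(proc="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a PID directory
        path = os.path.join(proc, entry, "stat")
        try:
            with open(path, errors="replace") as f:
                comm, state = parse_stat(f.read())
        except (OSError, IndexError):
            continue  # process exited while we were scanning
        if state == "D":
            found.append((int(entry), comm))
    return found


if __name__ == "__main__":
    for pid, comm in sorted(blocked_tasks()):
        print(pid, comm)
```

If most of the entries turn out to be rsync or filesystem threads, that
would at least narrow down where to look.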

I'm compiling 3.2.4 at the moment and found a bunch of possibly useful
options in the kernel debugging section: "detect hard/soft lockups"
and "detect hung tasks". Maybe they'll give me something more to go on.
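
For reference, a sketch of the .config fragment I believe those two menu
entries correspond to. The symbol names are my reading of the 3.2 Kconfig
and are not copied from the tree I'm building, so treat them as an
assumption:

```
# "Detect Hard and Soft Lockups" (assumed symbol name)
CONFIG_LOCKUP_DETECTOR=y
# "Detect Hung Tasks" (assumed symbol name)
CONFIG_DETECT_HUNG_TASK=y
```

As I understand it, the hung task detector should log a warning with a
stack trace when a task stays in uninterruptible sleep for too long, which
is the kind of hint the logs have been missing so far.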

Some details about the machine:
Linux xenbox 3.2.2 #1 SMP Sun Jan 29 10:28:22 CST 2012 x86_64 Intel(R)
Xeon(R) CPU 5140 @ 2.33GHz GenuineIntel GNU/Linux
It has 3 software RAID arrays (2 x 5 drives and 1 x 4 drives) LVM'ed
together into a 23TB XFS filesystem, 6GB of memory, and a pair of Intel
Gigabit Ethernet controllers bonded together.