In-kernel deadlock of some sort with 2.6.39.2

From: Omari Stephens
Date: Wed Jul 13 2011 - 16:27:56 EST


Please CC me on responses, since I'm not on lkml.

### Short version:
Under 2.6.39.2, one of my machines regularly gets into a state where processes end up in uninterruptible waits that never end. One peculiar thing that happens is that attempts to stat(1) or read certain files from procfs never return.

I am pretty familiar with compiling and running my own kernels, but not so familiar with troubleshooting when non-obvious things go wrong. Any suggestions would be appreciated, even if it's just "we might've fixed something related in version XYZ, try that one."

I've uploaded my config here:
http://web.mit.edu/~xsdg/Public/stuff/kernel/broken_2.6.39.2_config.txt


### Detailed version:
On one of my machines, I recently compiled and installed 2.6.39.2 alongside a switch from the nv driver to nouveau. This was specifically to solve an issue where FF7 nightly would cause high CPU usage in X just by virtue of painting the screen.

The upgrade did fix my X issues: FF7 is as smooth as could be hoped on this machine. But now FF periodically (but repeatably, after a reboot) stops responding. According to top, the system is then at about 94% IO-wait:
Cpu0 : 3.7%us, 2.4%sy, 0.0%ni, 0.0%id, 93.9%wa, 0.0%hi, 0.0%si, 0.0%st

Oddly, I noticed that running `ps` would hang uninterruptibly. After some further debugging, I discovered that attempting to stat (not even read) certain files in procfs never returns. For instance:

19:36:38> [xsdg{perl}@/proc/4950]
$find | sort | xargs stat
[...]
File: `./environ'
Size: 0 Blocks: 0 IO Block: 1024 regular empty file
Device: 3h/3d Inode: 6413606 Links: 1
Access: (0400/-r--------) Uid: ( 1000/ xsdg) Gid: ( 1000/ xsdg)
Access: 2011-07-13 19:26:15.829482661 +0000
Modify: 2011-07-13 19:26:15.829482661 +0000
Change: 2011-07-13 19:26:15.829482661 +0000
[sits here indefinitely]

By the magical powers of deduction, the next entry in sorted order (exe) is the one that hangs:
19:36:50> [xsdg{perl}@/proc/4950]
$l exe
[sits here indefinitely]
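
To find out which entries hang without wedging a shell on each one, every stat can be run in a child process with a deadline. What follows is a minimal Python sketch and not part of the debugging above; `probe` and `hanging_entries` are hypothetical helper names, and the 2-second timeout is an arbitrary assumption. Note that a child stuck in an uninterruptible wait cannot be killed, so each hung probe leaves one more stuck process behind.

```python
import os
import subprocess
import time


def probe(cmd, timeout=2.0, poll_interval=0.05):
    """Run cmd in a child process and poll it until a deadline.

    Returns the child's exit code, or None if it was still running
    when the deadline passed. (Hypothetical helper, for illustration.)
    """
    child = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        rc = child.poll()
        if rc is not None:
            return rc
        time.sleep(poll_interval)
    # Deliberately no wait() here: a child in an uninterruptible sleep
    # does not act on SIGKILL until it wakes, so waiting would block us too.
    child.kill()
    return None


def hanging_entries(directory, timeout=2.0):
    """Return the names in directory whose stat(1) did not finish in time."""
    return [
        name
        for name in sorted(os.listdir(directory))
        if probe(["stat", os.path.join(directory, name)], timeout) is None
    ]
```

Calling `hanging_entries("/proc/4950")` would then be expected to list the entries that block, assuming the behaviour is the same from a child process as from the shell.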

Oddly, I can stat cmdline with no issues, but if I try to _read_ it, then it blocks. Since neither exe nor cmdline can be read, I have no idea what process 4950 is.
19:56:16> [xsdg{perl}@/proc/4950]
$stat cmdline
File: `cmdline'
Size: 0 Blocks: 0 IO Block: 1024 regular empty file
Device: 3h/3d Inode: 3553148 Links: 1
Access: (0444/-r--r--r--) Uid: ( 1000/ xsdg) Gid: ( 1000/ xsdg)
Access: 2011-07-12 18:13:35.481767937 +0000
Modify: 2011-07-12 18:13:35.481767937 +0000
Change: 2011-07-12 18:13:35.481767937 +0000

19:56:18> [xsdg{perl}@/proc/4950]
$cat cmdline
[sits here indefinitely]
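
One way to identify process 4950 and any other stuck tasks without touching cmdline or exe is to read /proc/<pid>/stat, whose first three fields are the pid, the command name in parentheses, and the state letter (D means uninterruptible sleep). The Python sketch below is illustrative only: `parse_stat` and `uninterruptible_tasks` are hypothetical names, and it assumes the stat files stay readable while the machine is in this state, which has not been verified here.

```python
import os


def parse_stat(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so find the last ')' instead of splitting on spaces.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """Return sorted (pid, comm) pairs for processes in state D."""
    found = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-process entries such as "self"
        try:
            with open(os.path.join(proc_root, name, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited, or the line had an unexpected shape
        if state == "D":
            found.append((pid, comm))
    return sorted(found)
```

Printing the result of `uninterruptible_tasks()` while the system is wedged would show which commands are blocked, which is a useful detail to include in a report like this one.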

--xsdg
http://blog.doppler-photo.net/