On Fri, May 30, 2008 at 03:37:01PM -0400, Chuck Lever wrote:
Something else of note that I hadn't seen before: usually things lock
up just after that first oops. For some reason, today it survived
a little longer, but things really went downhill fast.
It survived a 'dmesg ; scp dmesg davej@gelk', and then wedged solid.
So as well as the oops, it seems we're corrupting memory too.
For reference, this kernel has both SLUB_DEBUG and PAGEALLOC_DEBUG
enabled.
I haven't seen this kind of problem here with .26, but yes, it does
look like something is clobbering memory during an NFS mount.
I introduced some NFS mount parsing changes in this commit range:
2d767432..82d101d5
A quick bisect should show which of these, if any, is the guilty
party. If one of them is the problem, I suspect it's 3f8400d1.
I haven't had time to try this out yet (hopefully tomorrow).
In the meantime, we've just gotten word of another user seeing memory
corruption with NFS: https://bugzilla.redhat.com/show_bug.cgi?id=449958