Re: NFS oops in 2.6.26rc4

From: Chuck Lever
Date: Wed Jun 04 2008 - 14:14:44 EST



On Jun 4, 2008, at 10:19 AM, Dave Jones wrote:

> On Fri, May 30, 2008 at 03:37:01PM -0400, Chuck Lever wrote:
>
>>> Something else of note, which I hadn't seen before: usually things lock
>>> up just after that first oops. For some reason, today it survived
>>> a little longer, but things really went downhill fast.
>>> It survived a 'dmesg ; scp dmesg davej@gelk', and then wedged solid.
>>> So as well as the oops, it seems we're corrupting memory too.
>>> For reference, this kernel has both SLUB_DEBUG and PAGEALLOC_DEBUG
>>> enabled.
>>
>> I haven't seen this kind of problem here with .26, but yes, it does
>> look like something is clobbering memory during an NFS mount.
>>
>> I introduced some NFS mount parsing changes in this commit range:
>>
>> 2d767432..82d101d5
>>
>> A quick bisect should show which of these, if any, is the guilty
>> party. If any of them is the problem, I suspect it's 3f8400d1.
>
> I didn't get time to try this out yet (hopefully tomorrow).
> In the meantime, we've just gotten word of another user seeing memory
> corruption with NFS: https://bugzilla.redhat.com/show_bug.cgi?id=449958

449958 could very well be the same problem. Its stack traceback is a lot cleaner than the one you originally sent, but the two share many similarities. (I doubt this is related to symlinks, as the comment in the bug suggests.)

Is commit 86d61d863 applied to the current rawhide kernel?

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com