Re: NFS oops in 2.6.26rc4

From: Dave Jones
Date: Wed Jun 04 2008 - 14:27:18 EST


On Wed, Jun 04, 2008 at 02:13:08PM -0400, Chuck Lever wrote:
>
> On Jun 4, 2008, at 10:19 AM, Dave Jones wrote:
>
> > On Fri, May 30, 2008 at 03:37:01PM -0400, Chuck Lever wrote:
> >
> >>> Something else of note which I hadn't seen before, usually things
> >>> lock up just after that first oops. For some reason, today it survived
> >>> a little longer, but things really went downhill fast.
> >>> It survived a 'dmesg ; scp dmesg davej@gelk', and then wedged solid.
> >>> So as well as the oops, it seems we're corrupting memory too.
> >>> For reference, this kernel has both SLUB_DEBUG and PAGEALLOC_DEBUG
> >>> enabled.
> >>
> >> I haven't seen this kind of problem here with .26, but yes, it does
> >> look like something is clobbering memory during an NFS mount.
> >>
> >> I introduced some NFS mount parsing changes in this commit range:
> >>
> >> 2d767432..82d101d5
> >>
> >> A quick bisect should show which, if any of these, is the guilty
> >> party. If any of these are the problem, I suspect it's 3f8400d1.
> >
> > I didn't get time to try this out yet (hopefully tomorrow).
> > In the meantime, we've just gotten word of another user seeing memory
> > corruption with nfs - https://bugzilla.redhat.com/show_bug.cgi?id=449958
>
> 449958 could very well be the same problem. The stack traceback is a
> lot cleaner than the one you originally sent, but there are a lot of
> similarities. (I doubt this is related to symlinks, as the comment
> suggests).
>
> Is commit 86d61d863 applied to the current rawhide kernel?

That kernel was .26rc4.git2, so unless it's only gone in in the last day
or two, yes. (Bandwidth impaired right now, and no local git repo to check)

Dave

--
http://www.codemonkey.org.uk