Re: kernel BUG at fs/dcache.c:648! with v3.11-7890-ge5c832d

From: Linus Torvalds
Date: Tue Sep 10 2013 - 14:25:51 EST


On Tue, Sep 10, 2013 at 10:47 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> But yes, e5c832d is obviously the "fixed" kernel. Let me think about this.

Ok, I think I found it.

I missed that "terminate_walk()" for the RCU case does this:

        nd->flags &= ~LOOKUP_RCU;
        if (!(nd->flags & LOOKUP_ROOT))
                nd->root.mnt = NULL;
        unlock_rcu_walk();

and my unlazy_walk() essentially terminated the walk _without_
clearing that nd->root.mnt thing (it did clear the LOOKUP_RCU bit and
call unlock_rcu_walk()). So then later, we'd end up doing an extra
path_put(), which explains the zero d_lockref.count.
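
To make the imbalance concrete, here is a userspace toy model of it.
This is not kernel code. Every toy_* name is made up for illustration,
and the model assumes that the RCU walk only borrows the root pointer
without taking a reference, so the final cleanup must not drop one
unless root.mnt was cleared (or a real reference was taken) on the way
out of RCU mode.

```c
/* Userspace toy model, NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stddef.h>

#define TOY_LOOKUP_RCU  0x1u
#define TOY_LOOKUP_ROOT 0x2u

/* Stand-in for a reference-counted object such as the root mount. */
struct toy_mnt {
        int count;
};

struct toy_nameidata {
        unsigned int flags;
        struct toy_mnt *root_mnt;  /* NULL doubles as "no root reference held" */
};

/* Assumption: in RCU mode the root is remembered WITHOUT taking a reference. */
static void toy_start_rcu_walk(struct toy_nameidata *nd, struct toy_mnt *root)
{
        nd->flags |= TOY_LOOKUP_RCU;
        nd->root_mnt = root;
}

/*
 * Leave RCU mode. With clear_root != 0 this mirrors the terminate_walk()
 * snippet quoted above; with clear_root == 0 it models the buggy exit
 * that forgot to reset the borrowed root pointer.
 */
static void toy_leave_rcu(struct toy_nameidata *nd, int clear_root)
{
        nd->flags &= ~TOY_LOOKUP_RCU;
        if (clear_root && !(nd->flags & TOY_LOOKUP_ROOT))
                nd->root_mnt = NULL;
}

/* Final cleanup: drops a root reference if the pointer is still set. */
static void toy_finish(struct toy_nameidata *nd)
{
        if (nd->root_mnt && !(nd->flags & TOY_LOOKUP_ROOT)) {
                nd->root_mnt->count--;  /* models the "extra path_put()" */
                nd->root_mnt = NULL;
        }
}

/* Run one walk and return the root's final count (it starts at 1). */
int toy_simulate(int clear_root)
{
        struct toy_mnt root = { .count = 1 };
        struct toy_nameidata nd = { 0, NULL };

        toy_start_rcu_walk(&nd, &root);
        toy_leave_rcu(&nd, clear_root);
        toy_finish(&nd);
        return root.count;
}
```

In this model toy_simulate(1) leaves the count where it started, while
toy_simulate(0) ends one lower: a reference that was never taken got
dropped.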

The whole damn root.mnt behavior with !LOOKUP_ROOT is a mystery and
needs more comments. But the attached trivial patch should do the
missing portion of terminate_walk().

Al, can you walk us through the rules for what "root.mnt == NULL"
really means? It's basically used as a flag for whether we've gotten
the root pointer or not. But it's pretty damn esoteric.

Now I'm starting to wonder how come _I_ never saw any issues. Maybe it
ends up underflowing so quickly that most people just see a big
negative number..

Patch is entirely untested. Not that my testing apparently has been much good.

Moneta, are you comfortable compiling a test-kernel, or does this need
to become a rawhide package?

Linus

Attachment: patch.diff
Description: Binary data