Re: futex strangeness in 2.6.23-mm1/UML

From: Andrew Morton
Date: Mon Oct 22 2007 - 20:17:24 EST


On Mon, 22 Oct 2007 20:07:42 -0400
Rik van Riel <riel@xxxxxxxxxx> wrote:

> On Mon, 22 Oct 2007 17:29:26 -0400
> Rik van Riel <riel@xxxxxxxxxx> wrote:
>
> > On Mon, 22 Oct 2007 22:48:51 +0200
> > Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> >
> > > > > I'm getting a process stuck in pthread_rwlock_wrlock(), even
> > > > > though it looks like the lock is not held by anybody.
> > > > >
> > > > > I think the last -mm was OK. Any ideas?
> > > > >
> > > > > If not, I'll go searching for the offending patch.
> > > >
> > > > I wonder if that's the same bug that's breaking autofs for me.
> > >
> > > Probably.
> > >
> > > > Oct 22 14:39:01 kenny automount[2299]: cache_readlock: mapent
> > > > cache rwlock lock failed
> > > > Oct 22 14:39:01 kenny automount[2299]: unexpected pthreads error:
> > > > 11 at 65 in cache.c
> > > >
> > > > I'm bisecting 2.6.23-mm1 today to find the problem patch, with
> > > > some luck I'll have it this afternoon.
> > > >
> > > > With the series applied up to
> > > > whitespace-fixes-task-exit-handling.patch things work.
> > > >
> > > > It breaks before
> > > > kswapd-should-only-wait-on-io-if-there-is-io.patch
> > > >
> > > > That leaves only about 60-80 patches to look at :)
> > >
> > > OK, the first patch that breaks something for me is:
> > >
> > > pid-namespaces-move-alloc_pid-lower-in-copy_process.patch
> >
> > Confirmed. That same patch is the point where the bisect
> > here starts breaking autofs.
>
> Oww man. Getting into heisenbug territory now :(
>
> I bisected down to the point where I had these two patches between
> good and bad:
>
> # GOOD
> pid-namespaces-move-alloc_pid-lower-in-copy_process.patch
> pid-namespaces-make-proc-have-multiple-superblocks-one-for-each-namespace.patch
> # BAD
>
> Applying them one by one and rebuilding the kernel after each one
> gave me a working kernel, though!
>
> Bisecting them from the top (quilt pop) resulted in a broken kernel,
> bisecting from the bottom (quilt push) results in a working one.
>
> I have no idea what is going on any more...
>

I guess we can debug it in the old-fashioned ways. The first of which is
to palm the problem off on Pavel ;)

I don't recall seeing a simple step-by-step way by which others can
reproduce this?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/