Re: [PATCH] bug in futex unqueue_me

From: Steven Rostedt
Date: Sun Jul 30 2006 - 20:05:19 EST


On Sun, 2006-07-30 at 08:38 +0200, Ingo Molnar wrote:
> * Christian Borntraeger <borntrae@xxxxxxxxxx> wrote:
>
> > From: Christian Borntraeger <borntrae@xxxxxxxxxx>
> >
> > This patch adds a barrier() in futex unqueue_me to avoid aliasing of
> > two pointers.
> >
> > On my s390x system I saw the following oops:
>
> > So the code becomes more or less:
> > if (q->lock_ptr != 0) spin_lock(q->lock_ptr)
> > instead of
> > if (lock_ptr != 0) spin_lock(lock_ptr)
> >
> > Which caused the oops from above.
>
> interesting, how is this possible? We do a spin_lock(lock_ptr), and
> taking a spinlock is an implicit barrier(). So gcc must not delay
> evaluating lock_ptr to inside the critical section. And as far as I can
> see the s390 spinlock implementation goes through an 'asm volatile'
> piece of code, which is a barrier already. So how could this have
> happened? I have nothing against adding a barrier(), but we should first
> investigate why the spin_lock() didn't act as a barrier - there might be
> other, similar bugs hiding. (we rely on spin_lock()s barrier-ness in a
> fair number of places)

Ingo, the spin_lock() is probably still a barrier, but is it a barrier
for its own argument? The problem here is that the compiler optimizes
away the lock_ptr temporary that is passed to spin_lock() and re-reads
q->lock_ptr instead. So does a spin_lock() protect the evaluation of its
own argument, or just the code inside the critical section?

Here we need a barrier to keep gcc from optimizing the access to the
lock pointer itself, not the data that the lock is protecting.
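
To make the point concrete, here is a rough user-space sketch of the
pattern, not the actual futex code. The toy_* names are made up for
illustration, and barrier() is spelled out the way the kernel defines it
for gcc. The assumption is that another CPU may set q->lock_ptr to NULL
at any time, so the snapshot in the local variable must be the only
value that is tested and then locked:

```c
#include <assert.h>
#include <stddef.h>

/* Compiler-only barrier, written the way the kernel defines it for gcc. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical stand-ins for the kernel types; not the real futex code. */
struct toy_lock { int locked; };
struct toy_q { struct toy_lock *lock_ptr; };

/* Locking through a NULL pointer models the oops from the report. */
static void toy_spin_lock(struct toy_lock *l)
{
	assert(l != NULL);
	l->locked = 1;
}

static void toy_spin_unlock(struct toy_lock *l)
{
	l->locked = 0;
}

/* Returns 1 if the lock was taken and released, 0 if lock_ptr was NULL. */
static int toy_unqueue(struct toy_q *q)
{
	struct toy_lock *lock_ptr;

	lock_ptr = q->lock_ptr;	/* snapshot; q->lock_ptr may change later */
	barrier();		/* keep gcc from re-reading q->lock_ptr below */
	if (lock_ptr != NULL) {
		toy_spin_lock(lock_ptr);
		toy_spin_unlock(lock_ptr);
		return 1;
	}
	return 0;
}
```

Without the barrier(), gcc is free to drop the local copy and load
q->lock_ptr twice, once for the test and once for the spin_lock()
argument, which is exactly the window the oops fell into.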

I don't know of other places in the kernel that take a dynamically
chosen spinlock like this and would need the same protection.

-- Steve

