Re: 3.5-rc6 futex_wait_requeue_pi oops.

From: Darren Hart
Date: Thu Jul 19 2012 - 20:39:02 EST




On 07/19/2012 04:22 PM, Darren Hart wrote:
>
>
> On 07/13/2012 11:54 AM, Dave Jones wrote:
>> On Fri, Jul 13, 2012 at 08:47:38PM +0200, Thomas Gleixner wrote:
>> > On Fri, 13 Jul 2012, Dave Jones wrote:
>> >
>> > > Looks like calling futex() with garbage makes things unhappy.
>> >
>> > WARN_ON(!&q.pi_state);
>> > pi_mutex = &q.pi_state->pi_mutex;
>> > ret = rt_mutex_finish_proxy_lock(pi_mutex, to, &rt_waiter, 1);
>> > debug_rt_mutex_free_waiter(&rt_waiter);
>> >
>> > So there is some weird way which causes q.pi_state = NULL. Dave, did
>> > you see the warning before the oops happened ?
>>
>> No, that didn't seem to trigger.
>
> Well I don't have a fix yet, but I can explain this not triggering.
>
> q is on the stack, so the ADDRESS for q.pi_state is never going to be
> NULL. However, properly instrumented, we do see this:
>
> [ 23.621501] ---[ end trace 20bdfb44db182a17 ]---
> [ 23.622425] q.pi_state @ (null)
> [ 23.623272] &q.pi_state @ ffff880185e2dca8
> [ 23.624119] ------------[ cut here ]------------
>
> Duh.
>
> I'll add a fix to that WARN_ON in my futex-fixes branch along with the
> fix for the bug Dan found.
>
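
To illustrate the quoted explanation of why the warning never fired, here
is a minimal userspace sketch. The struct and function names are
hypothetical stand-ins, not the kernel's definitions; the only point is the
difference between testing the address of a member and testing its value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the shape matters. */
struct demo_pi_state { int pi_mutex; };
struct demo_futex_q { struct demo_pi_state *pi_state; };

/*
 * Mirrors WARN_ON(!&q.pi_state): this tests the address of the member,
 * which is never NULL for an object that actually exists.
 */
static bool address_check_fires(const struct demo_futex_q *q)
{
	const void *member_addr = &q->pi_state;

	return member_addr == NULL;
}

/* Mirrors WARN_ON(!q.pi_state): this tests the pointer stored in the member. */
static bool value_check_fires(const struct demo_futex_q *q)
{
	return q->pi_state == NULL;
}

/* Exercises both checks on a stack object whose pi_state is NULL. */
static void demo_null_pi_state(void)
{
	struct demo_futex_q q = { .pi_state = NULL };

	assert(!address_check_fires(&q));	/* the buggy check stays silent */
	assert(value_check_fires(&q));		/* the intended check fires */
}
```

With q on the stack, only the value check can ever report a NULL pi_state,
which matches the instrumented output quoted above.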

I think I have the root cause. futex_wait_requeue_pi() doesn't like having
uaddr == uaddr2. handle_early_wakeup() doesn't detect a problem because
key2 IS the same as key1, I think. I've just discovered this and quickly
hacked in an "if (uaddr == uaddr2) return -EINVAL" fix, and the test has
now been running (with just ops 0, 11, 12) for several minutes; it
typically fails within a few seconds. I'll let it run for a few hours and
contemplate the proper fix.
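
The quick hack amounts to something like the following userspace sketch.
The helper name is hypothetical and this is not the kernel code; it only
shows the shape of the check. Comparing raw addresses is an assumption of
the sketch and may not be what a proper fix ends up doing.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: reject a wait/requeue request whose source and
 * target futex words are the same address.
 * Returns 0 if the pair is acceptable, -EINVAL otherwise.
 */
static int check_requeue_pi_addrs(const uint32_t *uaddr, const uint32_t *uaddr2)
{
	if (uaddr == uaddr2)
		return -EINVAL;

	return 0;
}
```

A caller would run this before doing any queueing work and bail out on a
nonzero return.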

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel

