Re: 3.5-rc6 futex_wait_requeue_pi oops.

From: Darren Hart
Date: Fri Jul 20 2012 - 02:55:54 EST


On 07/19/2012 05:37 PM, Darren Hart wrote:
>
>
> On 07/19/2012 04:22 PM, Darren Hart wrote:
>>
>>
>> On 07/13/2012 11:54 AM, Dave Jones wrote:
>>> On Fri, Jul 13, 2012 at 08:47:38PM +0200, Thomas Gleixner wrote:
>>> > On Fri, 13 Jul 2012, Dave Jones wrote:
>>> >
>>> > > Looks like calling futex() with garbage makes things unhappy.
>>> >
>>> > WARN_ON(!&q.pi_state);
>>> > pi_mutex = &q.pi_state->pi_mutex;
>>> > ret = rt_mutex_finish_proxy_lock(pi_mutex, to, &rt_waiter, 1);
>>> > debug_rt_mutex_free_waiter(&rt_waiter);
>>> >
>>> > So there is some weird way which causes q.pi_state = NULL. Dave, did
>>> > you see the warning before the oops happened ?
>>>
>>> No, that didn't seem to trigger.
>>
>> Well I don't have a fix yet, but I can explain this not triggering.
>>
>> q is on the stack, so the ADDRESS for q.pi_state is never going to be
>> NULL. However, properly instrumented, we do see this:
>>
>> [ 23.621501] ---[ end trace 20bdfb44db182a17 ]---
>> [ 23.622425] q.pi_state @ (null)
>> [ 23.623272] &q.pi_state @ ffff880185e2dca8
>> [ 23.624119] ------------[ cut here ]------------
>>
>> Duh.
>>
>> I'll add a fix to that WARN_ON in my futex-fixes branch along with the
>> fix for the bug Dan found.
>>
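The WARN_ON behaviour explained above can be reproduced outside the kernel. The following is a minimal userspace sketch, not kernel code: `struct futex_q_model`, `struct pi_state_model`, and the three helper functions are hypothetical stand-ins invented for illustration. It shows that a check on the ADDRESS of the member can never fire for an object that actually exists, while a check on the member's VALUE does.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct pi_state_model {
	int owner_tid;
};

struct futex_q_model {
	struct pi_state_model *pi_state;
};

/* Routed through a helper so the compiler does not fold the test away. */
static int is_null(const void *p)
{
	return p == NULL;
}

/*
 * Mirrors WARN_ON(!&q.pi_state): tests the address of the member.
 * For any q that really exists (e.g. on the stack) this is never NULL,
 * so the check cannot fire.
 */
int address_check_fires(const struct futex_q_model *q)
{
	return is_null(&q->pi_state);
}

/*
 * Mirrors the presumably intended WARN_ON(!q.pi_state): tests the
 * pointer stored in the member, which is what the oops dereferences.
 */
int value_check_fires(const struct futex_q_model *q)
{
	return is_null(q->pi_state);
}
```

With a `struct futex_q_model` on the stack whose `pi_state` is NULL, `address_check_fires()` returns 0 and `value_check_fires()` returns 1, which matches the instrumented output quoted above (q.pi_state is null, &q.pi_state is a valid stack address).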
>
> I think I have the root cause. futex_wait_requeue_pi() doesn't like having
> uaddr == uaddr2. The handle_early_wakeup() doesn't detect a problem
> because key2 IS the same as key1, I think. I've just discovered this and
> quickly hacked in an "if (uaddr == uaddr2) return -EINVAL" fix, and the test
> has now been running (with just ops 0, 11, 12) for several minutes
> (it typically fails within a few seconds). I'll let it run for a few hours
> and contemplate the proper fix.
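To make the suspected failure mode concrete, here is a small userspace sketch of the quick hack described above. It is an illustration under stated assumptions, not the kernel implementation: `struct futex_key_model`, `keys_match()`, `looks_requeued()`, and `check_requeue_pi_args()` are hypothetical names, and the real futex key carries more than a single address. The idea is that if a wakeup is classified as "already requeued" by comparing the waiter's current key against key2, then uaddr == uaddr2 makes that comparison true from the start, so it can no longer tell the two cases apart.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical simplified key: the real kernel key has more fields. */
struct futex_key_model {
	uintptr_t address;
};

/* Assumed comparison: two keys match when they name the same location. */
int keys_match(const struct futex_key_model *a,
	       const struct futex_key_model *b)
{
	return a->address == b->address;
}

/*
 * Assumed model of the early-wakeup test: the waiter is treated as
 * requeued when its current key equals key2. If key1 == key2 this is
 * true even though no requeue ever happened.
 */
int looks_requeued(const struct futex_key_model *current_key,
		   const struct futex_key_model *key2)
{
	return keys_match(current_key, key2);
}

/* The quick hack from the message: reject identical addresses up front. */
int check_requeue_pi_args(const uint32_t *uaddr, const uint32_t *uaddr2)
{
	if (uaddr == uaddr2)
		return -EINVAL;
	return 0;
}
```

Rejecting the arguments early is the simplest guard; whether it is the proper fix is exactly the open question in the message above.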

Dave, mind giving this a spin? It seems to be doing the trick here,
at least for the *REQUEUE_PI futex op codes in trinity.