Re: [patch 00/12] futex: Cure robust/PI futex exit races

From: Ingo Molnar
Date: Thu Nov 07 2019 - 03:41:43 EST



* Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:

> This series addresses a couple of robust/PI futex exit races:
>
> 1) The unlock races debugged and fixed by Yi and Yang
>
> These races are really subtle and I'm still puzzled how to trigger them
> reliably enough to decode them.
>
> The basic issue is that:
>
> A) An unlocking task can be killed between clearing the user space
> futex value and calling futex(FUTEX_WAKE).
>
> B) A woken up waiter can be killed before it can acquire the futex
> after returning to user space.
>
> In both cases the futex value is 0 and due to that the robust list exit
> code refuses to wake up waiters as the futex is not owned by the
> exiting task. As a consequence all other waiters might be blocked
> forever.
>
> 2) Oleg provided a test case which causes an infinite loop in the
> futex_lock_pi() code.
>
> The problem there is that an exiting task might be preempted by a
> waiter in a state which makes the waiter busy wait for the exiting task
> to complete the robust/PI exit cleanup code.
>
> That's obviously impossible when the waiter has higher priority than
> the exiting task and both are pinned on the same CPU resulting in a
> live lock.
>
> #1 is a straightforward and simple fix
>
> The solution Yi and Yang provided looks solid and in the worst case
> causes a spurious wakeup of a waiter which is nothing to worry about
> as all waiter code has to be prepared for that anyway.
>
> #2 is more complex
>
> In the current implementation there is no way to block until the exiting
> task has finished the cleanup.
>
> To fix this there is quite some code reshuffling required which at the
> same time is a valuable cleanup.
>
> The final solution is to guard the futex exit handling with a per-task
> mutex and make the waiter block on that mutex until the exiting task has
> completed the cleanup.
>
> Details why a simpler solution is not feasible can be found here:
>
> https://lore.kernel.org/r/20191105152728.GA5666@xxxxxxxxxx
>
> Ignore my confusion of fork vs. vfork at the beginning of the thread.
> Futexes do that to human brains. :)
>
> The following series addresses both issues.
>
> Patch 1 is a slightly polished version of the original Yi and Yang
> submission. It is included for completeness' sake and because it
> creates conflicts with the larger surgery which fixes issue #2.
>
> Aside from that a few eyeballs more on that subtlety are definitely not
> a bad thing especially as this has a user space component in it.
>
> The rest of the series addresses issue #2 which is more or less a kernel
> only problem, but extra eyeballs are appreciated.
>
> I'm certainly not proud of the solution for #2 but it's the best I could
> come up with without violating the user/kernel state consistency
> constraints.
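
For readers following along, the unlock race in 1) can be modelled as a
small decision function. This is only a user-space sketch, not the kernel
code: the model_* names are made up for illustration, MODEL_TID_MASK
mirrors the value of FUTEX_TID_MASK, and the 'pending' flag is an
assumption standing in for "this futex is the operation the exiting task
had in flight". It shows why a futex value of 0 used to be skipped and
how waking one waiter in that case costs at worst a spurious wakeup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as FUTEX_TID_MASK; the name is local to this sketch. */
#define MODEL_TID_MASK 0x3fffffffu

/* Behaviour described in the cover letter: the exit code only acts on
 * a futex whose value says it is owned by the exiting task. */
static bool model_exit_wakes_old(uint32_t uval, uint32_t exiting_tid)
{
    return (uval & MODEL_TID_MASK) == exiting_tid;
}

/* Assumed shape of the fix: if the futex was the in-flight (pending)
 * operation, is not a PI futex, and its value is already 0, wake one
 * waiter anyway. A needless wakeup is harmless because waiters must
 * cope with spurious wakeups. */
static bool model_exit_wakes_fixed(uint32_t uval, uint32_t exiting_tid,
                                   bool pending, bool is_pi)
{
    if (pending && !is_pi && uval == 0)
        return true;
    return model_exit_wakes_old(uval, exiting_tid);
}

/* Cases A) and B): the value is 0 when the task dies. */
static void model_self_check(void)
{
    assert(!model_exit_wakes_old(0, 1234));
    assert(model_exit_wakes_fixed(0, 1234, true, false));
}
```

Calling model_self_check() exercises the case where the exiting task died
with the futex value already cleared.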

I really like the whole series - this is how it should have been
implemented originally, but the exit scenarios 'looked' so simple that it
was just open-coded ... Mea culpa. :-)

As to ->futex_exit_mutex: that's really just a consequence of the ABI,
and a lot cleaner than all the previous pretense that these exit ops are
atomic - which they fundamentally aren't.
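
To illustrate the blocking principle (a pthread analogy only, not how the
kernel patches are written): the exiting thread holds a per-task mutex
for the whole cleanup, and the waiter sleeps on that mutex instead of
spinning. struct model_task, its fields other than the mutex name, and
the model_* functions are hypothetical; the semaphore only makes the
ordering deterministic for the demo.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>

/* Hypothetical stand-in for a task; not the kernel's task_struct. */
struct model_task {
    pthread_mutex_t futex_exit_mutex;
    sem_t exit_started;   /* demo-only: signals that cleanup has begun */
    bool cleanup_done;
};

/* Exiting side: hold the mutex across the entire (non-atomic) cleanup. */
static void *model_do_exit(void *arg)
{
    struct model_task *t = arg;

    pthread_mutex_lock(&t->futex_exit_mutex);
    sem_post(&t->exit_started);
    t->cleanup_done = true;   /* stands in for the robust/PI cleanup */
    pthread_mutex_unlock(&t->futex_exit_mutex);
    return NULL;
}

/* Waiter side: block on the mutex rather than busy-waiting, so a
 * higher-priority waiter cannot starve the exiting thread. */
static bool model_wait_for_exit(struct model_task *t)
{
    bool done;

    sem_wait(&t->exit_started);
    pthread_mutex_lock(&t->futex_exit_mutex);
    done = t->cleanup_done;
    pthread_mutex_unlock(&t->futex_exit_mutex);
    return done;
}

/* Runs one exiting thread and one waiter; returns what the waiter saw. */
static bool model_run(void)
{
    struct model_task t = { .cleanup_done = false };
    pthread_t exiting;
    bool done;

    assert(pthread_mutex_init(&t.futex_exit_mutex, NULL) == 0);
    assert(sem_init(&t.exit_started, 0, 0) == 0);
    assert(pthread_create(&exiting, NULL, model_do_exit, &t) == 0);
    done = model_wait_for_exit(&t);
    pthread_join(exiting, NULL);
    sem_destroy(&t.exit_started);
    pthread_mutex_destroy(&t.futex_exit_mutex);
    return done;
}
```

Once the waiter gets the mutex the cleanup is guaranteed to be complete,
so model_run() returns true.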

Haven't tested the series beyond build coverage, but the high level
principles behind the whole series look very sound to me:

Reviewed-by: Ingo Molnar <mingo@xxxxxxxxxx>

Thanks,

Ingo