Re: [PATCH] ipc/sem: Fix race between to-be-woken task and waker

From: Waiman Long
Date: Thu Sep 26 2019 - 14:12:04 EST


On 9/26/19 5:34 AM, Peter Zijlstra wrote:
> On Fri, Sep 20, 2019 at 11:54:02AM -0400, Waiman Long wrote:
>> While looking at a customr bug report about potential missed wakeup in
>> the system V semaphore code, I spot a potential problem. The fact that
>> semaphore waiter stays in TASK_RUNNING state while checking queue status
>> may lead to missed wakeup if a spurious wakeup happens in the right
>> moment as try_to_wake_up() will do nothing if the task state isn't right.
>>
>> To eliminate this possibility, the task state is now reset to
>> TASK_INTERRUPTIBLE immediately after wakeup before checking the queue
>> status. This should eliminate the race condition on the interaction
>> between the queue status and the task state and fix the potential missed
>> wakeup problem.
> Bah, this code always makes my head hurt.
>
> Yes, AFAICT the pattern it uses has been broken since 0a2b9d4c7967,
> since that removed doing the actual wakeup from under the sem_lock(),
> which is what it relies on.

After having a second look at the code again, I probably misread the
code the first time around. In the sleeping path, there is a check of
queue.status and setting of task state both under the sem lock in the
sleeping path. So as long as setting of queue status is under lock, they
should synchronize properly.

It looks like queue status setting is under lock, but I can't use
lockdep to confirm that as the locking can be done by either the array
lock or in one of the spinlocks in the array. Are you aware of a way of
doing that?

Anyway, I do think we need to add some comment to clarify the situation
to avoid future confusion.

Cheers,
Longman