Re: [RFC PATCH] ipc/mqueue: avoid sleep after wakeup

From: Manfred Spraul
Date: Sat May 15 2021 - 01:42:11 EST


On 5/15/21 6:06 AM, Hillf Danton wrote:
On Fri, 14 May 2021 17:51:47 +0200 Manfred Spraul wrote:
On 5/14/21 5:01 AM, Hillf Danton wrote:
The pipeline waker could start doing its job once waiter releases lock and
get the work done before waiter takes a nap, so check wait condition before
sleep to avoid waiting the wakeup that will never come, though that does not
hurt much thanks to timer timeouts like a second.
First: The timeout could be infinity, thus the code must not rely on a
timeout wakeup.

A wrong wait is would be a bug.


Check signal for the same reason.

Signed-off-by: Hillf Danton <hdanton@xxxxxxxx>
---

--- y/ipc/mqueue.c
+++ x/ipc/mqueue.c
@@ -710,15 +710,24 @@ static int wq_sleep(struct mqueue_inode_
__set_current_state(TASK_INTERRUPTIBLE);
spin_unlock(&info->lock);
- time = schedule_hrtimeout_range_clock(timeout, 0,
- HRTIMER_MODE_ABS, CLOCK_REALTIME);
I do not see a bug:

We do the __set_current_state() while holding the spinlock. If there is
a wakeup, then the wakeup will change current->state to TASK_RUNNING.
Correct.
schedule() will not remove us from the run queue when current->state is
TASK_RUNNING. The same applies if there are pending signals: schedule()
checks for pending signals and sets current->state to TASK_RUNNING.

Since the __set_current_state() is done while we hold info->lock, and
since the wakeup cannot happen before we have dropped the lock [because
the task that wakes us up needs the same lock], I do not see how a
wakeup could be lost.

Thus: Which issue do you see?
waiter waker
---- ----
unlock
lock
irq set STATE_READY
softirq unlock
wakeup
sleep a tick
schedule();

No need to schedule given READY.

This is not possible to avoid:

waiter waker
---- ----
unlock
schedule();
calls __schedule()
<before rq_lock()>

  lock
set STATE_READY
unlock
wakeup
--> set waiter->state = TASK_RUNNING

Now the run queue will be evaluated even though there is strictly speaking no need to do that.
Changes in ipc/sem.c can't solve that: From what I see, the majority of the critical window is in kernel/sched/*.c and not in ipc/sem.c
I do not consider it as useful to add complexity just to reduce the size of a extremely rare event.

Especially: No harm is done. User space can be preempted at any time, so the kernel is always allowed to check the run queue.

--
Manfred