Re: [PATCH][RFC]: mutex: adaptive spin

From: Linus Torvalds
Date: Tue Jan 06 2009 - 13:26:44 EST




On Tue, 6 Jan 2009, Linus Torvalds wrote:
>
> Sure - the owner may have rescheduled to another CPU, but if it did that,
> then we really might as well sleep.

.. or at least re-try the loop.

It might be worth it to move the whole sleeping behavior into that helper
function (mutex_spin_or_sleep()), and just make the logic be:

- if we _enter_ the function with the owner not on a runqueue, then we
sleep

- otherwise, we spin as long as the owner stays on the same runqueue.

and then we loop over the whole "get mutex spinlock and revalidate it all"
after either sleeping or spinning for a while.

That seems much simpler in all respects. And it should simplify the whole
patch too, because it literally means that in the main
__mutex_lock_common(), the only real difference is

- __set_task_state(task, state);
- spin_unlock_mutex(&lock->wait_lock, flags);
- schedule();

becoming instead

+ mutex_spin_or_schedule(owner, task, lock, state);

and then all the "spin-or-schedule" logic is in just that one simple
routine (that has to look up the rq and drop the lock and re-take it).

The "mutex_spin_or_schedule()" code would literally look something like

void mutex_spin_or_schedule(owner, task, lock, state)
{
struct rq *owner_rq = owner->rq;

if (owner_rq->curr != owner) {
__set_task_state(task, state);
spin_unlock_mutex(&lock->wait_lock, flags);
schedule();
} else {
spin_unlock_mutex(&lock->wait_lock, flags);
do {
cpu_relax();
while (lock->owner == owner &&
owner_rq->curr == owner);
}
spin_lock_mutex(&lock->wait_lock, flags);
}

or something.

Btw, this also fixes a bug: your patch did

+ __set_task_state(task, state);
+ /* didnt get the lock, go to sleep: */
+ schedule();

for the schedule case without holding the mutex spinlock.

And that seems very buggy and racy indeed: since it doesn't hold the mutex
lock, if the old owner releases the mutex at just the right point (yeah,
yeah, it requires a scheduling event on another CPU in order to also miss
the whole "task_is_current()" logic), the wakeup can get lost, because you
set the state to sleeping perhaps _after_ the task just got woken up. So
we stay sleeping even though the mutex is clear.

So I'm going to NAK the original patch, and just -require- the cleanup I'm
suggesting as also fixing what looks like a bug.

Of course, once I see the actual patch, maybe I'm going to find something
_else_ to kwetch about.

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/