Re: [PATCH] sched: correct testing need_resched in mutex_spin_on_owner()

From: Hillf Danton
Date: Tue Jun 07 2011 - 14:22:37 EST


On Tue, Jun 7, 2011 at 11:08 PM, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> On Tue, Jun 07, 2011 at 10:47:21PM +0800, Hillf Danton wrote:
>> On Tue, Jun 7, 2011 at 10:40 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> > On Tue, 2011-06-07 at 22:36 +0800, Hillf Danton wrote:
>> >
>> >> If you are right, the following comment also in __mutex_lock_common()
>> >>
>> >> Â Â Â for (;;) {
>> >> Â Â Â Â Â Â Â struct task_struct *owner;
>> >>
>> >> Â Â Â Â Â Â Â /*
>> >> Â Â Â Â Â Â Â Â* If there's an owner, wait for it to either
>> >> Â Â Â Â Â Â Â Â* release the lock or go to sleep.
>> >> Â Â Â Â Â Â Â Â*/
>> >> Â Â Â Â Â Â Â owner = ACCESS_ONCE(lock->owner);
>> >> Â Â Â Â Â Â Â if (owner && !mutex_spin_on_owner(lock, owner))
>> >> Â Â Â Â Â Â Â Â Â Â Â break;
>> >>
>> >> looks misleading too, but if owner is on this CPU, for what does we wait?
>> >
>> > huh, wtf!? it cannot be on this cpu, if it was we wouldn't be running
>> > the above code but whatever owner was doing.
>> >
>> > So my argument was, it cannot be on this cpu, therefore, by checking it
>> > is on a cpu, we check its on a different cpu.
>> >
>> > And I really don't see how any of that is related to the above.
>> >
>> Oh, it looks you are willing to rethink about testing need_resched?
>
> NO!
>
> Hillf, reading this thread I understand that you have no idea of what is
> happening here. You are totally clueless. Let me give you a clue.
>
> The idea of an adaptive mutex is quite simple. If a mutex is held for a
> short period of time, and on another CPU a task tries to grab the same
> mutex, it would be slower to have that task schedule out than to just
> spin waiting for that mutex to be released.
>
> Ideally, this mutex would then act like a spin lock, and things would
> run very efficiently. But, as it is not a spin lock, there's things we
> must also be aware of.
>
> 1) If the owner we are waiting for to release the mutex schedules out.
>
> 2) If another task wants us to run on the CPU of the waiter that is
> spinning.
>
> 3) If the waiter that is spinning runs out of its time quantum and
> should yield to another task.
>
> #1 is checked by the owner_running(). If that fails, then we should not
> spin anymore and its good to schedule. BTW, the answer why
> owner_running() is safe to check owner, is because it grabs rcu_locks()
> to make sure it is safe.
>
> #2 and #3 are checked by testing NEED_RESCHED on the waiter (NOT THE
> OWNER!). ÂIf the waiter that is spinning should be preempted, the waiter
> (not the owner) will have its NEED_RESCHED bit set. That is why the
> waiter's NEED_RESCHED flag is checked and not owners.
>
> And also we don't need to check need_resched() in the above code because
> the loop checks it anyway, or we break out of the loop.
>
> Understand?
>

After hours chewing __mutex_lock_common() and mutex_spin_on_owner(),
and what you and Peter said, the owner is on different CPU from the waiter,
and the patched need_resched is simply meaning the waiter is no longer
willing to wait for whatever reasons.

It is really unlucky tonight:(

Hillf
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/