Re: RFC for a new Scheduling policy/class in the Linux-kernel

From: Chris Friesen
Date: Thu Jul 16 2009 - 11:17:51 EST


Ted Baker wrote:
> On Mon, Jul 13, 2009 at 03:45:11PM -0600, Chris Friesen wrote:
>
>> Given that the semantics of POSIX PI locking assumes certain scheduler
>> behaviours, is it actually abstraction inversion to have that same
>> dependency expressed in the kernel code that implements it?
> ...
>> The whole point of mutexes (and semaphores) within the linux kernel is
>> that it is possible to block while holding them. I suspect you're going
>> to find it fairly difficult to convince people to switch to
>> spinlocks just to make it possible to provide latency guarantees.
>
> The abstraction inversion is when the kernel uses (internally)
> something as complex as a POSIX PI mutex. So, I'm not arguing
> that the kernel does not need internal mutexes/semaphores that
> can be held while a task is suspended/blocked. I'm just arguing
> that those internal mutexes/semaphores should not be PI ones.

This ties back to the comment in your other message about implementing
userspace PI behaviour via some simpler "loopholes".

If the application is already explicitly relying on PI pthread mutexes
(possibly because it doesn't know enough about its own locking
behaviour to use PP, or to design its priorities such that inversion
isn't a problem), then presumably priority inversion in the kernel
itself will also be an issue.

If a high-priority task makes a syscall that requires a lock currently
held by a sleeping low-priority task, and a medium-priority task wants
to run, we have the classic priority-inversion scenario.
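As a concrete illustration, here's a minimal userspace sketch of that
scenario. Everything in it is made up for the example: the thread
names, the SCHED_FIFO priorities, and the loop counts. It needs root
(or CAP_SYS_NICE) for SCHED_FIFO and should be pinned to a single CPU
(e.g. taskset -c 0) to show the effect reliably. With the plain mutex
below, the high task sits behind the medium task's entire busy loop;
making the mutex PTHREAD_PRIO_INHERIT would instead boost the low task
past the medium one.

/* pi_demo.c -- gcc -pthread pi_demo.c; run: taskset -c 0 ./a.out
 * Error checking omitted for brevity. */
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;	/* no PI */

static void busy(long n) { for (volatile long i = 0; i < n; i++) ; }

static void *low_fn(void *a)
{
	pthread_mutex_lock(&lock);
	busy(500000000L);		/* critical section needs CPU time */
	pthread_mutex_unlock(&lock);
	return NULL;
}

static void *high_fn(void *a)
{
	pthread_mutex_lock(&lock);	/* blocks behind the low task */
	puts("high task finally got the lock");
	pthread_mutex_unlock(&lock);
	return NULL;
}

static void *mid_fn(void *a)
{
	busy(2000000000L);		/* CPU hog: starves the low task,
					   so the high task waits on it */
	return NULL;
}

static pthread_t spawn(void *(*fn)(void *), int prio)
{
	pthread_t t;
	pthread_attr_t attr;
	struct sched_param sp = { .sched_priority = prio };

	pthread_attr_init(&attr);
	pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
	pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
	pthread_attr_setschedparam(&attr, &sp);
	pthread_create(&t, &attr, fn, NULL);
	return t;
}

int main(void)
{
	struct sched_param sp = { .sched_priority = 40 };

	pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
	pthread_t lo = spawn(low_fn, 10);
	usleep(100000);			/* let the low task take the lock */
	pthread_t hi = spawn(high_fn, 30);
	pthread_t mid = spawn(mid_fn, 20);
	pthread_join(hi, NULL);
	pthread_join(mid, NULL);
	pthread_join(lo, NULL);
	return 0;
}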

>> On the other hand, PP requires code analysis to properly set the
>> ceilings for each individual mutex.
>
> Indeed, this is difficult, but no more difficult than estimating
> worst-case blocking times, which requires more extensive code
> analysis and requires consideration of more cases with PI than PP.

I know of at least one example involving millions of lines of code
being ported to Linux from another OS. The scheduling requirements are
fairly lax, but deadlock due to priority inversion is highly likely.
They compare PI and PP, see that PP requires up-front analysis, and so
they enable PI.

I suspect there are other similar cases where deadlock is the real
issue, and hard realtime isn't a concern (but low latency may be
desirable). PI is simple to enable and doesn't require any thought on
the part of the app writer.
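That simplicity is visible in the API. A minimal sketch (the helper
name is mine, not from any real codebase):

#include <pthread.h>

/* All it takes to opt a lock into PI. */
static int init_pi_lock(pthread_mutex_t *m)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	/* One attribute change; no per-lock ceiling analysis required. */
	pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
	return pthread_mutex_init(m, &attr);
}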


>> Certainly if you block waiting for I/O while holding a lock then it
>> impacts the ability to provide latency guarantees for others waiting for
>> that lock. But this has nothing to do with PI vs PP or spinlocks, and
>> everything to do with how the lock is actually used.
>
> My only point there was with respect to application-level use of
> POSIX mutexes, that if an application needs to suspend while
> holding a mutex (e.g., for I/O) then the application will have
> potentially unbounded priority inversion, and so is losing the
> benefit from priority inheritance. So, if the only benefit of
> PRIO_INHERIT over PRIO_PROTECT is being able to suspend while
> holding a lock, there is no real benefit.

At least for POSIX, both PI and PP mutexes allow suspending while the
lock is held. From the user's point of view, the only difference
between the two is that PP always bumps the lock holder's priority (to
the ceiling, as soon as the lock is acquired), while PI bumps it only
if/when a higher-priority task actually blocks on the lock.
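That difference shows up at initialization time. A PP counterpart to
the PI sketch above might look like this (again just a sketch; the
ceiling of 30 is an assumed value that would have to come out of the
code analysis discussed earlier):

#include <pthread.h>

/* PP needs an explicit, analysed ceiling per lock. */
static int init_pp_lock(pthread_mutex_t *m)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_PROTECT);
	/* Must be >= the priority of every task that may take m. */
	pthread_mutexattr_setprioceiling(&attr, 30);
	return pthread_mutex_init(m, &attr);
}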

Chris