Re: [patch] speed up / fix the new generic semaphore code (fixAIM7 40% regression with 2.6.26-rc1)

From: Ingo Molnar
Date: Thu May 08 2008 - 11:01:58 EST



* Matthew Wilcox <matthew@xxxxxx> wrote:

> Fair is certainly the enemy of throughput (see also dbench arguments
> passim). It may be that some semaphore users really do want fairness
> -- it seems pretty clear that we don't want fairness for the BKL.

i dont think we need to consider any theoretical arguments about
fairness here as there's a fundamental down-to-earth maintenance issue
that governs: old semaphores were similarly unfair too, so it is just a
bad idea (and a bug) to change behavior when implementing new, generic
semaphores that are supposed to be a seemless replacement! This is about
legacy code that is intended to be phased out anyway.

This is already a killer argument and we wouldnt have to look any
further.

but even on the more theoretical level i disagree: fairness of CPU time
is something that is implemented by the scheduler in a natural way
already. Putting extra ad-hoc synchronization and scheduling into the
locking primitives around data structures only gives mathematical
fairness and artificial micro-scheduling, it does not actually make the
end result more useful! This is especially true for the BKL which is
auto-dropped by the scheduler anyway. (so descheduling a task will
automatically release it of its BKL ownership)

For example we've invested a _lot_ of time and effort into adding lock
stealing (i.e. intentional "unfairness") to kernel/rtmutex.c. Which is a
_lot_ harder to do atomically with PI constraints but still possible and
makes sense in the grand scheme of things. kernel/mutex.c is also
"unfair" - and that's correct IMO.

For the BKL in particular there's almost no sense to talk about any
underlying resource and there's almost no expectation from users for
that imaginery resource to be shared fairly.

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/