Re: Mutex vs semaphores scheduler bug

From: Peter Zijlstra
Date: Mon Oct 12 2009 - 10:55:26 EST


On Sat, 2009-10-10 at 17:57 +0300, Török Edwin wrote:
> If a semaphore (such as mmap_sem) is heavily contended, then using a
> userspace mutex makes the program faster.
>
> For example, using a mutex around *anonymous* mmaps speeds it up
> significantly (~80% on this microbenchmark,
> ~15% on real applications). Such workarounds shouldn't be necessary for
> userspace applications; the kernel should
> by default use the most efficient implementation for locks.

Should, yes, does, no.
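
(For concreteness, the workaround described above is presumably something
like the sketch below: serialise the anonymous mmap() calls behind a
process-wide pthread mutex, so threads queue in userspace instead of
piling up on mmap_sem. The helper name is made up.)

#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

static pthread_mutex_t mmap_lock = PTHREAD_MUTEX_INITIALIZER;

static void *locked_anon_mmap(size_t len)
{
	void *p;

	pthread_mutex_lock(&mmap_lock);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	pthread_mutex_unlock(&mmap_lock);

	return p;	/* MAP_FAILED on error */
}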

> However when using a mutex the number of context switches is SMALLER by
> 40-60%.

That matches the problem, see below.

> I think it's a bug in the scheduler; it schedules the mutex case much
> better.

It's not; the scheduler doesn't know about mutexes/futexes/rwsems.

> Maybe because userspace also spins a bit before actually calling
> futex().

Nope; if we ever spun, it would be in the kernel after calling
FUTEX_LOCK (which currently doesn't exist). glibc shouldn't do any
spinning on its own (if it does, I have yet another reason to try and
supplant the glibc futex code).
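
(Purely to illustrate what such userspace spinning would mean, not what
glibc is expected to do per the above: a spin-then-sleep acquire path over
a 0/1 futex word might look roughly like this. The names and the spin
count are made up.)

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdatomic.h>

#define SPIN_TRIES	100	/* arbitrary */

static void spin_then_wait_lock(atomic_int *lock)	/* 0 = free, 1 = held */
{
	int i, expected;

	for (i = 0; i < SPIN_TRIES; i++) {
		expected = 0;
		if (atomic_compare_exchange_weak(lock, &expected, 1))
			return;		/* got it while spinning */
	}

	/* Give up spinning and sleep in the kernel until woken. */
	for (;;) {
		expected = 0;
		if (atomic_compare_exchange_weak(lock, &expected, 1))
			return;
		syscall(SYS_futex, lock, FUTEX_WAIT, 1, NULL, NULL, 0);
	}
}

static void spin_then_wait_unlock(atomic_int *lock)
{
	atomic_store(lock, 0);
	/* Unconditional wake: fine for a sketch, wasteful in real code. */
	syscall(SYS_futex, lock, FUTEX_WAKE, 1, NULL, NULL, 0);
}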

> I think it's important to optimize the mmap_sem semaphore

It is.

The problem appears to be that rwsem doesn't allow lock-stealing, and
very strictly maintains FIFO order on contention. This results in extra
schedules and reduced performance as you noticed.

What happens is that when we release a contended rwsem we assign it to
the next waiter. If, before that waiter gets to run, another (running)
task comes along and tries to acquire the lock, it gets put to sleep,
even though it could have acquired the lock right away (and the woken
waiter would then detect failure and go back to sleep).

So what I think we need to do is have a look at all this lib/rwsem.c
slowpath code and hack in lock stealing.
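
To make that concrete, the shape would be something like the following.
This is not the real lib/rwsem.c code; the field, constant and helper
names are illustrative and the real count encoding is ignored. The point
is the ordering: the unlock path stops handing the lock directly to the
first waiter and merely wakes it, so any running task can try a cmpxchg
for itself, and the woken waiter re-tries and goes back to sleep if it
lost the race.

static inline int rwsem_try_write_steal(struct rw_semaphore *sem)
{
	/* Succeeds only if nobody currently holds the lock. */
	return cmpxchg(&sem->owner, RWSEM_FREE, RWSEM_WRITE_LOCKED)
			== RWSEM_FREE;
}

/* down_write() slowpath, heavily simplified */
static void rwsem_down_write_slow(struct rw_semaphore *sem)
{
	for (;;) {
		if (rwsem_try_write_steal(sem))
			return;			/* stole it, no extra schedule */
		rwsem_queue_and_sleep(sem);	/* woken by the unlock path */
		/* re-try; if someone stole it meanwhile, sleep again */
	}
}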

