[PATCH v2] posix-timers: RCU conversion

From: Eric Dumazet
Date: Tue Apr 05 2011 - 11:18:14 EST


Le mardi 05 avril 2011 Ã 16:48 +0200, Oleg Nesterov a Ãcrit :

> Can't we move this rcu_head into the union above?
>

Yes, you're right.


>
> Not that I really think it makes sense to change the patch... but we
> could even use SLAB_DESTROY_BY_RCU, this is more effective. All we
> need is ctor which sets ->it_signal = NULL and initializes ->it_lock
> for lock_timer().
>

I considered this, but this means SLUB cannot merge the
posix_timers_cache anymore.

This wastes more ram (each cpu has its own slab caches),
and considering posix-timers are seldom used...


> The question is, why do we use the global database for the timer ids.
> All timers live in signal_struct->posix_timers anyway, perhaps we could
> use a per-process array instead.
>

This would add some overhead at process creation (to initialize the
'array' or whatever tree root).

It would help some workloads, (create/delete timers from lot of
different processes/cpus). I am not sure we really need this right now,
since we waited 2011 before even trying to optimize read side ;)

Thanks

[PATCH v2] posix-timers: RCU conversion

Ben Nagy reported a scalability problem with KVM/QEMU that hit very hard
a single spinlock (idr_lock) in posix-timers code, on its 48 core
machine.

Even on a 16 cpu machine (2x4x2), a single test can show 98% of cpu time
used in ticket_spin_lock, from lock_timer

Ref: http://www.spinics.net/lists/kvm/msg51526.html

Switching to RCU is quite easy, IDR being already RCU ready.

idr_lock should be locked only for an insert/delete, not a lookup.

Benchmark on a 2x4x2 machine, 16 processes calling timer_gettime().

Before :

real 1m18.669s
user 0m1.346s
sys 1m17.180s

After :

real 0m3.296s
user 0m1.366s
sys 0m1.926s

Reported-by: Ben Nagy <ben@xxxxxxxx>
Signed-off-by: Eric Dumazet <eric.dumazet@xxxxxxxxx>
Tested-by: Ben Nagy <ben@xxxxxxxx>
Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
Cc: Avi Kivity <avi@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: John Stultz <johnstul@xxxxxxxxxx>
Cc: Richard Cochran <richard.cochran@xxxxxxxxxx>
Cc: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
---
V2 : rcu_head put in the 'it' union, as Oleg suggested.

include/linux/posix-timers.h | 1 +
kernel/posix-timers.c | 25 ++++++++++++++-----------
2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index d51243a..ea0db16 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -80,6 +80,7 @@ struct k_itimer {
unsigned long incr;
unsigned long expires;
} mmtimer;
+ struct rcu_head rcu;
} it;
};

diff --git a/kernel/posix-timers.c b/kernel/posix-timers.c
index 4c01249..4a428b0 100644
--- a/kernel/posix-timers.c
+++ b/kernel/posix-timers.c
@@ -491,6 +491,13 @@ static struct k_itimer * alloc_posix_timer(void)
return tmr;
}

+static void k_itimer_rcu_free(struct rcu_head *head)
+{
+ struct k_itimer *tmr = container_of(head, struct k_itimer, it.rcu);
+
+ kmem_cache_free(posix_timers_cache, tmr);
+}
+
#define IT_ID_SET 1
#define IT_ID_NOT_SET 0
static void release_posix_timer(struct k_itimer *tmr, int it_id_set)
@@ -503,7 +510,7 @@ static void release_posix_timer(struct k_itimer *tmr, int it_id_set)
}
put_pid(tmr->it_pid);
sigqueue_free(tmr->sigq);
- kmem_cache_free(posix_timers_cache, tmr);
+ call_rcu(&tmr->it.rcu, k_itimer_rcu_free);
}

static struct k_clock *clockid_to_kclock(const clockid_t id)
@@ -631,22 +638,18 @@ out:
static struct k_itimer *__lock_timer(timer_t timer_id, unsigned long *flags)
{
struct k_itimer *timr;
- /*
- * Watch out here. We do a irqsave on the idr_lock and pass the
- * flags part over to the timer lock. Must not let interrupts in
- * while we are moving the lock.
- */
- spin_lock_irqsave(&idr_lock, *flags);
+
+ rcu_read_lock();
timr = idr_find(&posix_timers_id, (int)timer_id);
if (timr) {
- spin_lock(&timr->it_lock);
+ spin_lock_irqsave(&timr->it_lock, *flags);
if (timr->it_signal == current->signal) {
- spin_unlock(&idr_lock);
+ rcu_read_unlock();
return timr;
}
- spin_unlock(&timr->it_lock);
+ spin_unlock_irqrestore(&timr->it_lock, *flags);
}
- spin_unlock_irqrestore(&idr_lock, *flags);
+ rcu_read_unlock();

return NULL;
}


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/