Re: SMP performance degradation with sysbench

From: Nick Piggin
Date: Mon Feb 26 2007 - 08:36:33 EST


Rik van Riel wrote:
Lorenzo Allegrucci wrote:

Hi lkml,

according to the test below (sysbench) Linux seems to have scalability
problems beyond 8 client threads:
http://jeffr-tech.livejournal.com/6268.html#cutid1
http://jeffr-tech.livejournal.com/5705.html
Hardware is an 8-core amd64 system and jeffr seems willing to try more
Linux versions on that machine.
Anyway, is there anyone who can reproduce this?


I have reproduced it on a quad core test system.

With 4 threads (on 4 cores) I get a high throughput, with
approximately 58% user time and 42% system time.

With 8 threads (on 4 cores) I get way lower throughput,
with 37% user time, 29% system time 35% idle time!

The maximum time taken per query also increases from
0.0096s to 0.5273s. Ouch!

I don't know if this is MySQL, glibc or Linux kernel,
but something strange is going on...

Like you, I'm also seeing idle time start going up as threads increase.

I initially thought this was a problem with the multiprocessor scheduler,
because the pattern is exactly like some artificat in the load balancing.

However, after looking at the stats, and testing a couple of things, I
think it may not be after all.

I've reproduced this on a 8-socket/16-way dual core Opteron. So far what
I am seeing is that MySQL is having trouble putting enough load into the
scheduler.

Virtually all of the sleep time is coming from unix_stream_recvmsg, which
seems to be what the clients and server threads use to communicate with.
There doesn't seem to be any other tell-tale event that the database is
blocking on.

It seems like it might at least partially be a problem with MySQL
thread/connection management.

I found a couple of interesting issues so far. Firstly, the MySQL version
that I'm using (5.0.26-Max) is making lots of calls to sched_setscheduler
attempting to fiddle with SCHED_OTHER priority in what looks like an
attempt to boot CPU time while holding some resource. All these calls
actually fail, because you cannot change SCHED_OTHER priority like that.
Adding a hack to make it fall through to set_user_nice provides a boost
which eliminates the cliff (but a downward degredation is still there).

Secondly, I've raised the thread numbers from 16 to 32 for my system,
which also provides a bit more (although doesn't help the downward
slope).

Combined, it looks like around 30-40% improvement past 16 threads. It
isn't anything like making up for the dropoff seen in the blog link, but
different systems, different mysql version... I wonder how close we are
with this hack in place?

Attached is a graph of my numbers, from 1 to 32 clients. plain = 2.6.20.1,
sched is with the attached sched patch, and thread is with 32 rather than
16 clients.

Anyway, I'll keep experimenting. If anyone from MySQL wants to help look
at this, send me a mail (eg. especially with the sched_setscheduler issue,
you might be able to do something better).

Nick

--
SUSE Labs, Novell Inc.

PNG image

--- kernel/sched.c.orig 2007-02-26 11:46:46.849841000 +0100
+++ kernel/sched.c 2007-02-26 12:04:09.283056000 +0100
@@ -4227,8 +4227,6 @@ recheck:
(p->mm && param->sched_priority > MAX_USER_RT_PRIO-1) ||
(!p->mm && param->sched_priority > MAX_RT_PRIO-1))
return -EINVAL;
- if (is_rt_policy(policy) != (param->sched_priority != 0))
- return -EINVAL;

/*
* Allow unprivileged RT tasks to decrease priority:
@@ -4302,6 +4300,13 @@ recheck:

rt_mutex_adjust_pi(p);

+ if (!is_rt_policy(policy)) {
+ if (param->sched_priority == 8)
+ set_user_nice(p, -20);
+ else
+ set_user_nice(p, param->sched_priority-6);
+ }
+
return 0;
}
EXPORT_SYMBOL_GPL(sched_setscheduler);