Re: BFS vs. mainline scheduler benchmarks and measurements

From: Jens Axboe
Date: Wed Sep 09 2009 - 05:10:17 EST


On Wed, Sep 09 2009, Mike Galbraith wrote:
> On Wed, 2009-09-09 at 08:13 +0200, Ingo Molnar wrote:
> > * Jens Axboe <jens.axboe@xxxxxxxxxx> wrote:
> >
> > > On Tue, Sep 08 2009, Peter Zijlstra wrote:
> > > > On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
> > > > > And here's a newer version.
> > > >
> > > > I tinkered a bit with your proglet and finally found the
> > > > problem.
> > > >
> > > > You used a single pipe per child, this means the loop in
> > > > run_child() would consume what it just wrote out until it got
> > > > force preempted by the parent which would also get woken.
> > > >
> > > > This results in the child spinning a while (its full quota) and
> > > > only reporting the last timestamp to the parent.
> > >
> > > Oh doh, that's not well thought out. Well it was a quick hack :-)
> > > Thanks for the fixup, now it's at least usable to some degree.
> >
> > What kind of latencies does it report on your box?
> >
> > Our vanilla scheduler default latency targets are:
> >
> > single-core: 20 msecs
> > dual-core: 40 msecs
> > quad-core: 60 msecs
> > octo-core: 80 msecs
> >
> > You can enable CONFIG_SCHED_DEBUG=y and set it directly as well via
> > /proc/sys/kernel/sched_latency_ns:
> >
> > echo 10000000 > /proc/sys/kernel/sched_latency_ns
>
> He would also need to lower min_granularity; otherwise it'd be larger
> than the whole latency target.
>
> I'm testing right now, and one thing that is definitely a problem is the
> amount of sleeper fairness we're giving. A full latency is just too
> much short-term fairness in my testing. While sleepers are catching up,
> hogs languish. That's the biggest issue going on.
>
> I've also been doing some timings of make -j4 (looking at idle time),
> and find that child_runs_first is mildly detrimental to fork/exec load,
> as are buddies.
>
> I'm running with the below at the moment. (the kthread/workqueue thing
> is just because I don't see any reason for it to exist, so consider it
> to be a waste of perfectly good math;)
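For reference, the tuning Ingo and Mike describe above can be applied like this (a sketch only: it requires CONFIG_SCHED_DEBUG=y and root, and the 10 ms / 2 ms values are illustrative choices, not recommendations from the thread):

```shell
# Lower the latency target to 10 ms, as in Ingo's example
echo 10000000 > /proc/sys/kernel/sched_latency_ns
# Also lower min_granularity (here to 2 ms), so it stays below the
# latency target, per Mike's note above
echo 2000000 > /proc/sys/kernel/sched_min_granularity_ns
```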

Using latt, it looks better than -rc9. Below are the results logged
while running make -j128 on a 64-thread box. I did two runs of each,
with latt using 8 clients.

-rc9
Max 23772 usec
Avg 1129 usec
Stdev 4328 usec
Stdev mean 117 usec

Max 32709 usec
Avg 1467 usec
Stdev 5095 usec
Stdev mean 136 usec

-rc9 + patch

Max 11561 usec
Avg 1532 usec
Stdev 1994 usec
Stdev mean 48 usec

Max 9590 usec
Avg 1550 usec
Stdev 2051 usec
Stdev mean 50 usec

Max latency is way down, and the variation is much smaller as well.


--
Jens Axboe
