Re: [REPORT] cfs-v6-rc2 vs sd-0.46 vs 2.6.21-rc7

From: Michael Gerdau
Date: Thu Apr 26 2007 - 09:23:05 EST


> as a summary: i think your numbers demonstrate it nicely that the
> shorter 'timeslice length' that both CFS and SD utilize does not have a
> measurable negative impact on your workload.

That's my impression as well. In particular I think that for this
type of workload it is pretty much irrelevant which scheduler I'm
using :-)

> To measure the total impact
> of 'timeslicing' you might want to try the exact same workload with a
> much higher 'timeslice length' of say 400 msecs, via:
>
> echo 400000000 > /proc/sys/kernel/sched_granularity_ns # on CFS
> echo 400 > /proc/sys/kernel/rr_interval # on SD

I just finished building cfs and sd against 2.6.21 and will try
these settings over the next few days.
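
For my own reference, here is a rough sketch of the helper I plan to
use to switch to the 400 msec setting (assuming the /proc entries
exist under exactly the names you gave on the respective kernels;
needs to run as root):

  #!/bin/sh
  # bump the effective 'timeslice' to 400 msec - usage: ./timeslice.sh cfs|sd
  case "$1" in
      cfs) echo 400000000 > /proc/sys/kernel/sched_granularity_ns ;;  # nanoseconds
      sd)  echo 400 > /proc/sys/kernel/rr_interval ;;                 # milliseconds
      *)   echo "usage: $0 cfs|sd" >&2; exit 1 ;;
  esac
  # log the resulting value(s) together with the test output
  grep . /proc/sys/kernel/sched_granularity_ns /proc/sys/kernel/rr_interval 2>/dev/null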

> your existing numbers are a bit hard to analyze because the 3 workloads
> were started at the same time and they overlapped differently and
> utilized the system differently.

Right. However the differences w/r/t which job finished when, in
particular the change in which job "came in" 2nd and 3rd between sd
and cfs, have been consistent with other similar jobs I ran over the
last couple of days.

In general sd tends to finish all three such jobs at roughly the
same time while cfs clearly "favors" the LTMM-type jobs (which
admittedly involve the least computation). I don't really know
why that is so. However each job does some minimal I/O after every
successfully finished run, and that might be handled differently by
sd and cfs. Not really knowing what either scheduler does internally,
I'm in no position to discuss that with either you or Con :)

> i think the primary number that makes sense to look at (which is perhaps
> the least sensitive to the 'overlap effect') is the 'combined user times
> of all 3 workloads' (in order of performance):
>
> > 2.6.21-rc7: 20589.423 100.00%
> > 2.6.21-rc7-cfs-v6-rc2 (X @ nice -10): 20613.845 99.88%
> > 2.6.21-rc7-sd046: 20617.945 99.86%
> > 2.6.21-rc7-cfs-v6-rc2 (X @ nice 0): 20743.564 99.25%
>
> to me this gives the impression that it's all "within noise".

I think so too. But then, I didn't seriously expect anything different.

> In
> particular the two CFS results suggest that there's at least a ~100
> seconds noise in these results, because the renicing of X should have no
> impact on the result (the workloads are pure number-crunchers, and all
> use up the CPUs 100%, correct?),

Correct.

The differences could just as well result from small fluctuations in
my own work on the machine during the tests (like reading mail,
editing sources and similar stuff).

> another (perhaps less reliable) number is the total wall-clock runtime
> of all 3 jobs. Provided i did not make any mistakes in my calculations,
> here are the results:
>
> > 2.6.21-rc7-sd046: 10512.611 seconds
> > 2.6.21-rc7-cfs-v6-rc2 (X @ nice -10): 10605.946 seconds
> > 2.6.21-rc7: 10650.535 seconds
> > 2.6.21-rc7-cfs-v6-rc2 (X @ nice 0): 10788.125 seconds
>
> (the numbers are lower than the first numbers because this is a 2 CPU
> system)
>
> both SD and CFS-nice-10 were faster than mainline, but i'd say this too
> is noise - especially because this result highly depends on the way the
> workloads overlap in general, which seems to be different for SD.

I don't think that's noise. As I wrote above, IMO this is the one
difference I personally consider significant. While the jobs were
running I could watch their intermediate output intermixed on a console.

With sd the jobs were - within reason - head to head, while with cfs
and mainline LTMM quickly got ahead and stayed there.

As I speculated above, this (IMO real) difference in the way sd and
cfs (and mainline) schedule the jobs might be related to how the
schedulers react to I/O, but I don't know.

> system time is interesting too:
>
> > 2.6.21-rc7: 35.379
> > 2.6.21-rc7-cfs-v6-rc2 (X @ nice -10): 40.399
> > 2.6.21-rc7-cfs-v6-rc2 (X @ nice 0): 44.239
> > 2.6.21-rc7-sd046: 45.515

Over the weekend I might try to repeat these tests while doing
nothing else on the machine (i.e. no editing or reading mail).

> combined system+user time:
>
> > 2.6.21-rc7: 20624.802
> > 2.6.21-rc7-cfs-v6-rc2 (X @ nice 0): 20658.084
> > 2.6.21-rc7-sd046: 20663.460
> > 2.6.21-rc7-cfs-v6-rc2 (X @ nice -10): 20783.963
>
> perhaps it might make more sense to run the workloads serialized, to
> have better comparability of the individual workloads. (on a real system
> you'd naturally want to overlap these workloads to utilize the CPUs, so
> the numbers you did are very relevant too.)

Yes. In fact the workloads used to be run serialized; only recently
did I try to have them run easily on multicore systems.
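
Something along these lines is what I have in mind for the serialized
runs (only a sketch; job1/job2/job3 stand in for the actual
invocations of the three workloads, LTMM etc.):

  #!/bin/sh
  # run the three number-crunchers one after another instead of in parallel,
  # recording wall-clock, user and system time for each (GNU time)
  for job in ./job1 ./job2 ./job3; do     # placeholders for the real jobs
      name=$(basename "$job")
      /usr/bin/time -v "$job" > "$name.out" 2> "$name.time"
  done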

The whole set of such jobs consists of 312 tasks, so the idea is to
get a couple of PS3s and have them run there...
(once I find the time to port the software).

> The vmstat suggested there is occasional idle time in the system - is
> the workload IO-bound (or memory bound) in those cases?

Stats are written out after each run (i.e. every 5-8 sec), both to
the PostgreSQL DB and to the console.

I could get rid of the console I/O (i.e. make it conditional) and,
for testing purposes, could switch off the DB if that would be
helpful.
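
For a quick test I could even avoid touching the code at all and
simply throw the console output away (assuming the stats just go to
stdout/stderr):

  ./job1 > /dev/null 2>&1     # job1 again a placeholder; drops the console I/O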

Best,
Michael
--
Technosis GmbH, Geschäftsführer: Michael Gerdau, Tobias Dittmar
Sitz Hamburg; HRB 89145 Amtsgericht Hamburg
Vote against SPAM - see http://www.politik-digital.de/spam/
Michael Gerdau email: mgd@xxxxxxxxxxxx
GPG-keys available on request or at public keyserver
