Re: [ANNOUNCE/RFC] Really Fair Scheduler

From: Ingo Molnar
Date: Sun Sep 02 2007 - 05:26:52 EST



* Roman Zippel <zippel@xxxxxxxxxxxxxx> wrote:

> > with Peter's queue there are no underflows/overflows either anymore
> > in any synthetic corner-case we could come up with. Peter's queue
> > works well but it's 2.6.24 material.
>
> Did you even try to understand what I wrote? I didn't say that it's a
> "common problem", it's a conceptual problem. The rounding has been
> improved lately, so it's not as easy to trigger with some simple busy
> loops.

As i mentioned it in my first reply to you i really welcome your changes
and your interest in the Linux scheduler, but i'm still somewhat
surprised why you focus on rounding so much (and why you attack CFS's
math implementation so vehemently and IMO so unfairly) - and we had this
discussion before in the "CFS review" thread that you started.

The kind of rounding error you seem to be worried about is very, very
small. For normal nice-0 tasks it's in the "one part per a hundred
million" range, or smaller. As a comparison: "top"'s CPU utilization
statistics are accurate to "one part per thousand" - and that's roughly
the precision range that humans care about. (Graphical tools are even
coarser - one part per hundred or worse.)

I suspect if rounding effects are measurable you will post numbers that
prove it, correct? You did not do that so far. Or if they are not
measurable for any workload we care about, why should we worry about it
if it causes no problems in practice? (or if it causes problems, what
are the actual effects and can they be addressed in a simpler way?)

The main reason i'm interested in changing the fairness math under CFS
(be that Peter's changes or your changes) is _not_ primarily to address
any rounding behavior, but to potentially improve performance! Rounding
errors are at most an academic issue, unless they are actually
measurable in a workload we care about. (If rounding gets improved as a
side-effect of a change that's an added, albeit lower-prio bonus.) But
even more important is quality of scheduling - performance is secondary
to that. (unless performance is so bad that it becomes a quality issue:
such as an O(N) algorithm would do.)

Your math is fairly simple (and that is _good_, just like CFS's existing
math is simple), it can be summed up in essence as (without complicating
it with nice-level weighting, for easy understandability):

" use the already existing p->sum_exec_runtime 'task runtime' metric
that CFS maintains, and use that as the key into the rb-tree that
selects tasks that should be run next. To handle sleeping tasks: keep
a per-rq sum of all runnable task's ->sum_exec_runtime values and
start newly woken tasks at the average rq->sum/nr_running value. "

Now your patch does not actually do it that way in a clearly discernible
manner because lots of changes are intermixed into one big patch.

( please correct me if i got your math wrong. Your patch does not add
any comments at all to the new code and this slowed down my review
and analysis of your patch quite considerably. Lack of comments makes
it harder to see the purpose and makes it harder to notice the
benefits/tradeoffs involved in each change. )

Much of the remaining complexity in your patch i see as an add-on
optimization to that concept: you use a quite complex Bresenham
implementation that hides the real machinery of the code. Yes, rounding
improves too with Breshenham, but to me that is secondary - i'm mainly
interested in the performance aspects of Breshenham. There are
advantages of your approach but it also has disadavantages: you removed
sleeper fairness for example, which makes it harder to compare the
scheduling quality of the two implementations.

To sum up: the math in your patch _could_ be implemented as a much
smaller add-on to the already existing variables maintained by CFS, but
you chose to do lots of changes, variable-renames and removals at once
and posted them as one big change.

I really welcome large changes to the scheduler (hey, in the past 2.5
years alone we added 350+ scheduler commits from over 95 unique
contributors, so i'm as feature-happy as it gets), but it's far easier
to review and merge stuff if it's nicely split up. (I'll eventually get
through your patch too, but it's much harder that way and as you
probably know every core kernel hacker is away for the Kernel Summit so
dont expect much activity for a week or so.)

One conceptual reason behind the intrusive policy-modularization done in
CFS was to _never again_ be forced to do "big" scheduler changes - we
now can do most things in small, understandable steps.

I posted my very first, raw version of CFS after 3 days of hacking which
showed the raw math and nothing more, and i posted every iteration since
then, so you can follow through its history. _Please_, separate things
out so that they can be reviewed one by one. And _please_ order the
patches in "level of intrusiveness" order - i.e. leave the more
intrusive patches as late in the patch-queue as possible. Thanks!

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/