Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

From: Ingo Molnar
Date: Fri Apr 13 2007 - 18:52:50 EST



* William Lee Irwin III <wli@xxxxxxxxxxxxxx> wrote:

> On Fri, Apr 13, 2007 at 10:21:00PM +0200, Ingo Molnar wrote:
> > [announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
> > i'm pleased to announce the first release of the "Modular Scheduler Core
> > and Completely Fair Scheduler [CFS]" patchset:
> > http://redhat.com/~mingo/cfs-scheduler/sched-modular+cfs.patch
> > This project is a complete rewrite of the Linux task scheduler. My goal
> > is to address various feature requests and to fix deficiencies in the
> > vanilla scheduler that were suggested/found in the past few years, both
> > for desktop scheduling and for server scheduling workloads.
> > [ QuickStart: apply the patch to v2.6.21-rc6, recompile, reboot. The
> > new scheduler will be active by default and all tasks will default
> > to the new SCHED_FAIR interactive scheduling class. ]
>
> A pleasant surprise, though I did see it coming.

hey ;)

> On Fri, Apr 13, 2007 at 10:21:00PM +0200, Ingo Molnar wrote:
> > Highlights are:
> > - the introduction of Scheduling Classes: an extensible hierarchy of
> > scheduler modules. These modules encapsulate scheduling policy
> > details and are handled by the scheduler core without the core
> > code assuming about them too much.
>
> It probably needs further clarification that they're things on the
> order of SCHED_FIFO, SCHED_RR, SCHED_NORMAL, etc.; some prioritization
> amongst the classes is furthermore assumed, and so on. [...]

yep - they are linked via sched_ops->next pointer, with NULL delimiting
the last one.

> [...] They're not quite capable of being full-blown alternative
> policies, though quite a bit can be crammed into them.

yeah, they are not full-blown: i extended them on-demand, for the
specific purposes of sched_fair.c and sched_rt.c. More can be done too.

> There are issues with the per- scheduling class data not being very
> well-abstracted. [...]

yes. It's on my TODO list: i'll work more on extending the cleanups to
those fields too.

> A binomial heap would likely serve your purposes better than rbtrees.
> It's faster to have the next item to dequeue at the root of the tree
> structure rather than a leaf, for one. There are, of course, other
> priority queue structures (e.g. van Emde Boas) able to exploit the
> limited precision of the priority key for faster asymptotics, though
> actual performance is an open question.

i'm caching the leftmost leaf, which serves as an alternate, task-pick
centric root in essence.

> Another advantage of heaps is that they support decreasing priorities
> directly, so that instead of removal and reinsertion, a less invasive
> movement within the tree is possible. This nets additional constant
> factor improvements beyond those for the next item to dequeue for the
> case where a task remains runnable, but is preempted and its priority
> decreased while it remains runnable.

yeah. (Note that in CFS i'm not decreasing priorities anywhere though -
all the priority levels in CFS stay constant, fairness is not achieved
via rotating priorities or similar, it is achieved via the accounting
code.)

> On Fri, Apr 13, 2007 at 10:21:00PM +0200, Ingo Molnar wrote:
> > due to its design, the CFS scheduler is not prone to any of the
> > 'attacks' that exist today against the heuristics of the stock
> > scheduler: fiftyp.c, thud.c, chew.c, ring-test.c, massive_intr.c all
> > work fine and do not impact interactivity and produce the expected
> > behavior.
>
> I'm always suspicious of these claims. [...]

hey, sure - but please give it a go nevertheless, i _did_ test all these
;)

> A moderately formal regression test suite needs to be assembled [...]

by all means feel free! ;)

> A more general question here is what you mean by "completely fair;"

by that i mean the most common-sense definition: with N tasks running
each gets 1/N CPU time if observed for a reasonable amount of time. Now
extend this to arbitrary scheduling patterns, the end result should
still be completely fair, according to the fundamental 1/N(time) rule
individually applied to all the small scheduling patterns that the
scheduling patterns give. (this assumes that the scheduling patterns are
reasonably independent of each other - if they are not then there's no
reasonable definition of fairness that makes sense, and we might as well
use the 1/N rule for those cases too.)

> there doesn't appear to be inter-tgrp, inter-pgrp, inter-session, or
> inter-user fairness going on, though one might argue those are
> relatively obscure notions of fairness. [...]

sure, i mainly concentrated on what we have in Linux today. The things
you mention are add-ons that i can see handling via new scheduling
classes: all the CKRM and containers type of CPU time management
facilities.

> What these things mean when there are multiple CPU's to schedule
> across may also be of concern.

that is handled by the existing smp-nice load balancer, that logic is
preserved under CFS.

> These testcases are oblivious to SMP. This will demand that a
> scheduling policy integrate with load balancing to the extent that
> load balancing occurs for the sake of distributing CPU bandwidth
> according to nice level. Some explicit decision should be made
> regarding that.

this should already work reasonably fine with CFS: try massive_intr.c on
an SMP box.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/