Re: RFC: THE OFFLINE SCHEDULER

From: Thomas Gleixner
Date: Fri Aug 28 2009 - 06:30:22 EST


On Fri, 28 Aug 2009, Gregory Haskins wrote:
> > However, and to what I believe is your point: it's not entirely clear to
> > me what impact, if any, there would be w.r.t. any _other_ events that
> > may be driven off of the scheduler tick (i.e. events other than
> > scheduling policies, like timeslice expiration, etc). Perhaps someone
> > else like Thomas, Ingo, or Peter have some input here.
> >
> > I guess the specific question to ask is: Does the scheduler tick code
> > have any role other than timeslice policies and updating accounting
> > information? Examples would include timer-expiry, for instance. I
> > would think most of this logic is handled by finer grained components
> > like HRT, but I am admittedly ignorant of the actual timer voodoo ;)

There is not much happening in the scheduler tick:

- accounting of CPU time. This can be delegated to some other CPU
as long as the user space task is running and consuming 100%

- timer list timers. If there is no service/device active on that CPU
then there are no timers to run

- RCU callbacks. Same as above, but might need some tweaking.

- printk tick. Not really interesting

- scheduler time slicing. Not necessary in such a context

- posix cpu timers. Only interesting when the application uses them

So there is not much which needs the tick in such a scenario.

Of course we'd need to exclude that CPU from the do_timer duty as
well.

> Thinking about this idea some more: I can't see why this isn't just a
> trivial variation of the nohz idle code already in mainline. In both
> cases (idle and FIFO tasks) the cpu is "consumed" 100% by some arbitrary
> job (spinning/HLT for idle, RT thread for FIFO) while we have the
> scheduler tick disabled. The only real difference is a matter of
> power-management (HLT/mwait go to sleep-states, whereas spinning/rt-task
> run full tilt).
>
> Therefore the answer may be as simple as bracketing the FIFO task with
> tick_nohz_stop_sched_tick() + tick_nohz_restart_sched_tick(). The nohz
> code will probably need some minor adjustments so it is not assuming
> things about the state being "idle" (e.g. "isidle") for places when it
> matters (idle_calls++ stat is one example).

Yeah, it's similar to what we do in nohz idle already, but we'd need
to split out some of the functions very carefully to reuse them.

> Potential problems:
>
> a) disabling/re-enabling the tick on a per-RT task schedule() may prove to
> be prohibitively expensive.

For a single task consuming 100% CPU it is a non-problem. You disable
it once. But yes, on a standard system this needs to be investigated.

> b) we will need to make sure the rt-bandwidth protection mechanism is
> defeated so the task is allowed to consume 100% bandwidth.
>
> Perhaps these states should be in the cpuset/root-domain, and configured
> when you create the partition (e.g. "tick=off", "bandwidth=off" makes it
> an "offline" set).

That makes sense and should not be rocket science to implement.
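
Something along these lines, as a minimal sketch. No such cpuset options
exist today: the option strings are taken from your example, and
partition_cfg, partition_set() and partition_is_offline() are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-partition settings; not an existing cpuset interface. */
struct partition_cfg {
    bool tick_on;         /* scheduler tick runs on the CPUs of this set */
    bool rt_bandwidth_on; /* rt-bandwidth protection applies to this set */
};

/* Apply one "key=value" option. Returns 0 on success, -1 if unknown. */
static int partition_set(struct partition_cfg *cfg, const char *opt)
{
    assert(cfg != NULL && opt != NULL);

    if (strcmp(opt, "tick=off") == 0)
        cfg->tick_on = false;
    else if (strcmp(opt, "tick=on") == 0)
        cfg->tick_on = true;
    else if (strcmp(opt, "bandwidth=off") == 0)
        cfg->rt_bandwidth_on = false;
    else if (strcmp(opt, "bandwidth=on") == 0)
        cfg->rt_bandwidth_on = true;
    else
        return -1;
    return 0;
}

/* A set is "offline" only when both the tick and the bandwidth limit are off. */
static bool partition_is_offline(const struct partition_cfg *cfg)
{
    assert(cfg != NULL);
    return !cfg->tick_on && !cfg->rt_bandwidth_on;
}
```

Configuring this once at partition creation also keeps it out of the
per-schedule() path you worry about in a).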

Thanks,

tglx