Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

From: Ingo Molnar
Date: Wed Sep 15 2004 - 03:44:38 EST



* Nick Piggin <nickpiggin@xxxxxxxxxxxx> wrote:

> OK.
>
> Alternatively, I'd say tell everyone who wants really low latency to
> enable CONFIG_PREEMPT, which automatically gives the minimum possible
> preempt latency, delimited (and defined) by critical sections, instead
> of the more ad-hoc "sprinkling" ;)

it's not ad-hoc. These are the 10 remaining points for which there is no
natural might_sleep() point nearby (according to measurements). That's
why i called them 'complementary'. They cause zero problems for the
normal kernel (we already have another 70 cond_resched() points), but
they _are_ the ones needed in addition if might_sleep() also does
cond_resched().
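
as a rough illustration of the two kinds of points, here is a minimal
userspace sketch. It is not kernel code: resched_pending,
fake_schedule(), sketch_cond_resched(), sketch_might_sleep(),
SKETCH_PREEMPT_VOLUNTARY and long_scan() are all hypothetical stand-ins
(for need_resched(), schedule(), cond_resched(), might_sleep(), a config
option and a long-running kernel loop). It only assumes that a
voluntary preemption point checks a flag and yields if it is set.

```c
/* Userspace sketch only; every name below is a hypothetical stand-in. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Set when some other task wants the CPU (stand-in for need_resched()). */
static bool resched_pending;
/* Number of times the loop gave up the CPU voluntarily. */
static unsigned long yields;

/* Stand-in for schedule(): record the yield and clear the request. */
static void fake_schedule(void)
{
    yields++;
    resched_pending = false;
}

/* Explicit voluntary preemption point, modelled on cond_resched(). */
static int sketch_cond_resched(void)
{
    if (resched_pending) {
        fake_schedule();
        return 1;
    }
    return 0;
}

/*
 * Assumption: under a voluntary-preemption build, a might_sleep()
 * annotation doubles as a preemption point; otherwise it does nothing.
 */
#ifdef SKETCH_PREEMPT_VOLUNTARY
#define sketch_might_sleep() ((void)sketch_cond_resched())
#else
#define sketch_might_sleep() ((void)0)
#endif

/*
 * A long loop with no might_sleep() annotation nearby. Without the
 * explicit point at the bottom, a request arriving at 'request_at'
 * would wait until the whole scan had finished.
 */
static unsigned long long_scan(const int *items, size_t n, size_t request_at)
{
    unsigned long sum = 0;

    assert(items != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i == request_at)
            resched_pending = true;  /* simulate a wakeup mid-loop */
        sum += (unsigned long)items[i];
        sketch_cond_resched();       /* the 'complementary' point */
    }
    return sum;
}
```

in this sketch the request raised in the middle of long_scan() is
serviced within one iteration instead of after the whole scan, which is
the effect an explicit point is meant to have where no might_sleep()
call sits on the path.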

the 'reliability' of latency break-up depends on the basic preemption
model. Believe me, even with CONFIG_PREEMPT there were a boatload of
critical sections that had insanely long latencies that nobody fixed
until the VP patchset came along. Without CONFIG_PREEMPT the number of
possible latency paths increases, but the situation is the same as with
CONFIG_PREEMPT: you need tools, people that test stuff and lots of
manual work to break them up reliably. You will never be 'done' but you
can do a reasonably good job for workloads that people care about.
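
to make the 'tools' part concrete, here is a minimal sketch of the kind
of bookkeeping such a tool could do: remember the longest interval
between two consecutive preemption points and which point ended it.
This is an assumption-laden illustration, not the instrumentation used
by any actual patchset; struct latency_tracker, tracker_init() and
tracker_point() are hypothetical names, and the timestamps are supplied
by the caller in whatever monotonic unit is available.

```c
/* Userspace sketch only; all names are hypothetical. */
#include <assert.h>
#include <stdint.h>

struct latency_tracker {
    uint64_t last_point;    /* timestamp of the most recent point */
    uint64_t worst_gap;     /* longest interval between two points */
    const char *worst_site; /* label of the point ending that interval */
};

/* Start tracking from timestamp 'now'. */
static void tracker_init(struct latency_tracker *t, uint64_t now)
{
    t->last_point = now;
    t->worst_gap = 0;
    t->worst_site = "";
}

/*
 * Call at every preemption point. 'now' must be monotonic; 'site' is a
 * caller-chosen label for the code path that just reached the point.
 */
static void tracker_point(struct latency_tracker *t, uint64_t now,
                          const char *site)
{
    assert(now >= t->last_point);

    uint64_t gap = now - t->last_point;
    if (gap > t->worst_gap) {
        t->worst_gap = gap;
        t->worst_site = site;
    }
    t->last_point = now;
}
```

the site recorded with the worst gap points at the path that is the
next candidate for manual break-up; after that one is fixed, the next
worst one shows up, which is why the work is never quite 'done'.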

the 'final' preemption model [for hard-RT purposes] that i believe will
make it into the Linux kernel one nice day is total preemptability of
everything but the core preemption code (i.e. the scheduler and
interrupt controllers). _That_ might be something that has provable
latencies. Note that such a 'total preemption' model has prerequisites
too, like the deterministic execution of hardirqs/softirqs.

note that the current lock-break-up activities still make a lot of sense
even under the total-preemption model: they decrease the latency of
kernel-using hard-RT applications. (raw total preemption only guarantees
quick scheduling of the hard-RT task - it doesn't guarantee that the task
can complete any useful kernel/syscall work.)

since we already see at least 4 different viable preemption models
placed on different points in the 'latency reliability' spectrum, it
makes little sense to settle on any single one of them. So i'm aiming to keep the
core code flexible to have them all without much fuss, and usage will
decide which ones are needed. Maybe CONFIG_PREEMPT will merge into
CONFIG_TOTAL_PREEMPT. Maybe CONFIG_NO_PREEMPT will merge into
CONFIG_PREEMPT_VOLUNTARY. Maybe CONFIG_PREEMPT_VOLUNTARY will go away
altogether. We cannot know at this point, it all depends on how usage
(and consequently, hardware) evolves.

Ingo