Re: CFS review

From: Ingo Molnar
Date: Fri Aug 10 2007 - 20:31:22 EST



* Ingo Molnar <mingo@xxxxxxx> wrote:

> * Roman Zippel <zippel@xxxxxxxxxxxxxx> wrote:
>
> > Well, I've sent him the stuff now...
>
> received it - thanks alot, looking at it!

everything looks good in your debug output and the TSC dump data, except
for the wait_runtime values, they are quite out of balance - and that
balance cannot be explained with jiffies granularity or with any sort of
sched_clock() artifact. So this clearly looks like a CFS regression that
should be fixed.

the only relevant thing that comes to mind at the moment is that last
week Peter noticed a buggy aspect of sleeper bonuses (in that we do not
rate-limit their output, hence we 'waste' them instead of redistributing
them), and i've got the small patch below in my queue to fix that -
could you give it a try?

this is just a blind stab into the dark - i couldnt see any real impact
from that patch in various workloads (and it's not upstream yet), so it
might not make a big difference. The trace you did (could you send the
source for that?) seems to implicate sleeper bonuses though.

if this patch doesnt help, could you check the general theory whether
it's related to sleeper-fairness, via turning it off:

echo 30 > /proc/sys/kernel/sched_features

does the bug go away if you do that? If sleeper bonuses are showing too
many artifacts then we could turn it off for final .23.

Ingo

--------------------->
Subject: sched: fix sleeper bonus
From: Ingo Molnar <mingo@xxxxxxx>

Peter Ziljstra noticed that the sleeper bonus deduction code was not
properly rate-limited: a task that scheduled more frequently would get a
disproportionately large deduction. So limit the deduction to delta_exec
and limit production to runtime_limit.

Not-Yet-Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
---
kernel/sched_fair.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -75,7 +75,7 @@ enum {

unsigned int sysctl_sched_features __read_mostly =
SCHED_FEAT_FAIR_SLEEPERS *1 |
- SCHED_FEAT_SLEEPER_AVG *1 |
+ SCHED_FEAT_SLEEPER_AVG *0 |
SCHED_FEAT_SLEEPER_LOAD_AVG *1 |
SCHED_FEAT_PRECISE_CPU_LOAD *1 |
SCHED_FEAT_START_DEBIT *1 |
@@ -304,11 +304,9 @@ __update_curr(struct cfs_rq *cfs_rq, str
delta_mine = calc_delta_mine(delta_exec, curr->load.weight, lw);

if (cfs_rq->sleeper_bonus > sysctl_sched_granularity) {
- delta = calc_delta_mine(cfs_rq->sleeper_bonus,
- curr->load.weight, lw);
- if (unlikely(delta > cfs_rq->sleeper_bonus))
- delta = cfs_rq->sleeper_bonus;
-
+ delta = min(cfs_rq->sleeper_bonus, (u64)delta_exec);
+ delta = calc_delta_mine(delta, curr->load.weight, lw);
+ delta = min((u64)delta, cfs_rq->sleeper_bonus);
cfs_rq->sleeper_bonus -= delta;
delta_mine -= delta;
}
@@ -521,6 +519,8 @@ static void __enqueue_sleeper(struct cfs
* Track the amount of bonus we've given to sleepers:
*/
cfs_rq->sleeper_bonus += delta_fair;
+ if (unlikely(cfs_rq->sleeper_bonus > sysctl_sched_runtime_limit))
+ cfs_rq->sleeper_bonus = sysctl_sched_runtime_limit;

schedstat_add(cfs_rq, wait_runtime, se->wait_runtime);
}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/