[patch] sched: fix SMT scheduler bug

From: Ingo Molnar
Date: Mon Feb 26 2007 - 08:32:06 EST



* Michal Piotrowski <michal.k.k.piotrowski@xxxxxxxxx> wrote:

> Ok, I think that this is a SMT scheduler problem.
>
> System works fine for fifteen hours.

hmmm. I think it's this piece of smt-nice code in sched.c:

	if (dependent_sleeper(cpu, rq, next))
		next = rq->idle;

this will keep this HT CPU idle if the sibling CPU is running a task
that is more important. And dependent_sleeper() doesn't delay kernel
threads such as ksoftirqd:

	/* kernel/rt threads do not participate in dependent sleeping */
	if (!p->mm || rt_task(p))
		return 0;

but ... what if the highest-prio thread is a user-space task /and/ there
is ksoftirqd also queued, which is now delayed? We schedule the idle
task instead of processing softirqs. Ouch!
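
To make the scenario concrete, here is a minimal user-space sketch of
that decision. It is a toy model, not kernel code: every toy_* name is
made up for illustration, and the plain priority comparison stands in
for the real dependent_sleeper(), which also weighs timeslices and
exempts rt tasks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_QUEUED 4

/* Hypothetical stand-ins for the kernel's task and runqueue structures. */
struct toy_task {
	const char *comm;
	int prio;	/* lower value = more important, as in the kernel */
	bool has_mm;	/* false for kernel threads such as ksoftirqd */
};

struct toy_rq {
	struct toy_task *queued[TOY_MAX_QUEUED];
	size_t nr_running;
	struct toy_task *idle;
};

/* Pick the most important queued task (assumes at least one is queued). */
static struct toy_task *toy_pick_next(const struct toy_rq *rq)
{
	struct toy_task *best;
	size_t i;

	assert(rq->nr_running > 0 && rq->nr_running <= TOY_MAX_QUEUED);
	best = rq->queued[0];
	for (i = 1; i < rq->nr_running; i++)
		if (rq->queued[i]->prio < best->prio)
			best = rq->queued[i];
	return best;
}

/* Simplified assumption: delay a user task if the sibling runs a more
 * important one. Kernel threads are never delayed. */
static bool toy_dependent_sleeper(const struct toy_task *sibling_curr,
				  const struct toy_task *p)
{
	if (!p->has_mm)
		return false;
	return sibling_curr->prio < p->prio;
}

/* The unpatched decision: the check looks only at the picked task. */
static struct toy_task *toy_schedule_old(const struct toy_rq *rq,
					 const struct toy_task *sibling_curr)
{
	struct toy_task *next = toy_pick_next(rq);

	if (toy_dependent_sleeper(sibling_curr, next))
		next = rq->idle;
	return next;
}
```

With a user task and ksoftirqd both queued, and a more important task
on the sibling, toy_schedule_old() returns the idle task even though
ksoftirqd is runnable.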

To test this theory, could you turn the SMT scheduler back on but also
apply the patch below. Does it still work fine?
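
As a rough illustration of what the one-line change does, the old and
new conditions can be written as two predicates. This is only a sketch:
the toy_* names are hypothetical, and the boolean "sleeper" stands in
for whatever dependent_sleeper() would return for the picked task.

```c
#include <assert.h>
#include <stdbool.h>

/* Old condition: replace the picked task with idle whenever
 * dependent_sleeper() says so, regardless of what else is queued. */
static bool toy_should_idle_old(unsigned long nr_running, bool sleeper)
{
	assert(nr_running >= 1);	/* toy precondition: something was picked */
	return sleeper;
}

/* Patched condition: only go idle if the to-be-delayed task is the
 * only runnable task on this runqueue. */
static bool toy_should_idle_new(unsigned long nr_running, bool sleeper)
{
	assert(nr_running >= 1);	/* toy precondition: something was picked */
	return nr_running == 1 && sleeper;
}
```

The two predicates differ only when more than one task is runnable: the
old one still idles the CPU, the new one runs the picked task.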

Ingo

---------------------->
Subject: [patch] sched: fix SMT scheduler bug
From: Ingo Molnar <mingo@xxxxxxx>

the SMT scheduler incorrectly skips kernel threads even if they
are runnable (but they are preempted by a higher-prio user-space
task which got SMT-delayed by an even higher-priority task
running on a sibling CPU).

fix this for now by only doing the SMT-nice optimization if
the to-be-delayed task is the only runnable task. (This should
cover most of the real-life cases anyway.)

this bug has been in the SMT scheduler since 2.6.17 or so, but has
only been noticed now by the active check in the dynticks code.

Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
---
kernel/sched.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -3537,7 +3537,7 @@ need_resched_nonpreemptible:
 		}
 	}
 	next->sleep_type = SLEEP_NORMAL;
-	if (dependent_sleeper(cpu, rq, next))
+	if (rq->nr_running == 1 && dependent_sleeper(cpu, rq, next))
 		next = rq->idle;
 switch_tasks:
 	if (next == rq->idle)