[PATCH UPDATED 2.6.32-rc6] sched, kvm: fix race condition involving sched_in_preempt_notifiers

From: Tejun Heo
Date: Mon Nov 30 2009 - 06:42:54 EST


Commit 498657a478c60be092208422fefa9c7b248729c2 incorrectly assumed
that preemption wasn't disabled around context_switch() and thus was
fixing an imaginary problem. It also broke kvm, which depends on
->sched_in() being called with irqs enabled so that it can do smp
calls from there.

Revert the incorrect commit and add a comment describing the different
contexts under which the two callbacks are invoked.

Avi: spotted transposed in/out in the added comment.

Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxx>
Cc: Avi Kivity <avi@xxxxxxxxxx>
---
Avi, thanks for spotting it.
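
For anyone who wants the ordering spelled out, below is a small
userspace model (not kernel code) of the two call orders in
finish_task_switch(). All model_* names and both structs are made up
for illustration. It assumes finish_lock_switch() is the point where
the rq lock is dropped and irqs are re-enabled, as the comment added
by this patch implies; configurations that unlock elsewhere are not
modelled.

```c
/* Userspace model only, not kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct model_rq {
	bool lock_held;		/* stands in for rq->lock */
	bool irqs_enabled;	/* stands in for the local irq state */
};

/* What a sched_in callback observed when it ran. */
struct sched_in_obs {
	bool called;
	bool lock_held;
	bool irqs_enabled;
};

/* Stand-in for a ->sched_in() callback: record the calling context. */
static void model_sched_in(const struct model_rq *rq, struct sched_in_obs *obs)
{
	obs->called = true;
	obs->lock_held = rq->lock_held;
	obs->irqs_enabled = rq->irqs_enabled;
}

/* Stand-in for finish_lock_switch(): drop the lock, re-enable irqs. */
static void model_finish_lock_switch(struct model_rq *rq)
{
	rq->lock_held = false;
	rq->irqs_enabled = true;
}

/* Ordering restored by this patch: notifier fires after the unlock. */
static struct sched_in_obs model_finish_task_switch_fixed(void)
{
	struct model_rq rq = { .lock_held = true, .irqs_enabled = false };
	struct sched_in_obs obs = { false, false, false };

	model_finish_lock_switch(&rq);
	model_sched_in(&rq, &obs);
	return obs;
}

/* Ordering from the reverted commit: notifier fires before the unlock. */
static struct sched_in_obs model_finish_task_switch_reverted(void)
{
	struct model_rq rq = { .lock_held = true, .irqs_enabled = false };
	struct sched_in_obs obs = { false, false, false };

	model_sched_in(&rq, &obs);
	model_finish_lock_switch(&rq);
	return obs;
}

/* Returns 0 if the model matches the contexts described above. */
static int model_check(void)
{
	struct sched_in_obs fixed = model_finish_task_switch_fixed();
	struct sched_in_obs reverted = model_finish_task_switch_reverted();

	/* With the patch, sched_in sees irqs on and no rq lock. */
	assert(fixed.called && fixed.irqs_enabled && !fixed.lock_held);
	/* With the reverted commit, sched_in sees irqs off and the lock held. */
	assert(reverted.called && !reverted.irqs_enabled && reverted.lock_held);
	return 0;
}
```

Calling model_check() from a test main() is enough to exercise both
orderings. The "reverted" case is the context in which a callback
that issues smp calls, as kvm's does, would be in trouble.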

include/linux/preempt.h | 5 +++++
kernel/sched.c | 2 +-
2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 72b1a10..2e681d9 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -105,6 +105,11 @@ struct preempt_notifier;
  * @sched_out: we've just been preempted
  * notifier: struct preempt_notifier for the task being preempted
  * next: the task that's kicking us out
+ *
+ * Please note that sched_in and out are called under different
+ * contexts. sched_out is called with rq lock held and irq disabled
+ * while sched_in is called without rq lock and irq enabled. This
+ * difference is intentional and depended upon by its users.
  */
 struct preempt_ops {
 	void (*sched_in)(struct preempt_notifier *notifier, int cpu);
diff --git a/kernel/sched.c b/kernel/sched.c
index 3c91f11..e36c868 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2758,9 +2758,9 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev)
 	prev_state = prev->state;
 	finish_arch_switch(prev);
 	perf_event_task_sched_in(current, cpu_of(rq));
-	fire_sched_in_preempt_notifiers(current);
 	finish_lock_switch(rq, prev);

+	fire_sched_in_preempt_notifiers(current);
 	if (mm)
 		mmdrop(mm);
 	if (unlikely(prev_state == TASK_DEAD)) {