Re: -next: Nov 12 - kernel BUG at kernel/sched.c:7359!

From: Sachin Sant
Date: Wed Nov 25 2009 - 23:39:24 EST


Peter Zijlstra wrote:
> Correct, Ingo objected to the fastpath overhead.
>
> Could you please try the below patch which tries to address the issue
> differently.

Works great. Thanks

Tested-by: Sachin Sant <sachinp@xxxxxxxxxx>

Regards
-Sachin

---
Subject: sched: Fix balance vs hotplug race
From: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Date: Wed Nov 25 13:31:39 CET 2009

Since (e761b77: cpu hotplug, sched: Introduce cpu_active_map and redo
sched domain managment) we have cpu_active_mask which is supposed to
rule scheduler migration and load-balancing, except it never did.

The particular problem being solved here is a crash in
try_to_wake_up() where select_task_rq() ends up selecting an offline
cpu because select_task_rq_fair() trusts the sched_domain tree to reflect
the current state of affairs; similarly, select_task_rq_rt() trusts the
root_domain.

However, the sched_domains are updated from CPU_DEAD, which is after
the cpu is taken offline and after stop_machine is done. Therefore it
can race perfectly well with code assuming the domains are right.

Cure this by building the domains from cpu_active_mask on
CPU_DOWN_PREPARE.
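
For readers following along, the ordering problem can be sketched as a
small user-space model. This is only an illustration, not the patch and
not kernel code: every toy_* name is hypothetical, the masks are plain
bit fields, and the model assumes the cpu's active bit is already
cleared by the time the CPU_DOWN_PREPARE rebuild runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; none of these names are kernel APIs. */
enum { TOY_NR_CPUS = 4 };

struct toy_sched {
    uint32_t online;      /* stands in for cpu_online_mask */
    uint32_t active;      /* stands in for cpu_active_mask */
    uint32_t domain_span; /* cpus covered by the sched_domain tree */
};

/* Pick the highest-numbered cpu in the span, trusting the span blindly. */
static int toy_select_cpu(const struct toy_sched *s)
{
    for (int cpu = TOY_NR_CPUS - 1; cpu >= 0; cpu--)
        if (s->domain_span & (1u << cpu))
            return cpu;
    return -1; /* empty span */
}

static bool toy_cpu_online(const struct toy_sched *s, int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    return (s->online & (1u << cpu)) != 0;
}

/* Models CPU_DOWN_PREPARE: active bit cleared (assumption of this model);
 * optionally rebuild the span from the active mask right here. */
static void toy_down_prepare(struct toy_sched *s, int cpu, bool rebuild_here)
{
    s->active &= ~(1u << cpu);
    if (rebuild_here)
        s->domain_span = s->active;
}

/* Models the cpu actually going offline. */
static void toy_take_offline(struct toy_sched *s, int cpu)
{
    s->online &= ~(1u << cpu);
}

/* Models CPU_DEAD: the span is (re)built only after the cpu is gone. */
static void toy_dead(struct toy_sched *s)
{
    s->domain_span = s->online;
}

/* Returns true if a wakeup between "offline" and CPU_DEAD picks an
 * offline cpu. */
static bool toy_wakeup_hits_offline_cpu(bool rebuild_on_down_prepare)
{
    struct toy_sched s = { .online = 0xF, .active = 0xF, .domain_span = 0xF };
    const int victim = 3;

    toy_down_prepare(&s, victim, rebuild_on_down_prepare);
    toy_take_offline(&s, victim);

    /* Race window: a wakeup runs here, before CPU_DEAD. */
    int picked = toy_select_cpu(&s);
    bool bad = picked >= 0 && !toy_cpu_online(&s, picked);

    toy_dead(&s);
    return bad;
}
```

Calling toy_wakeup_hits_offline_cpu(false) models the old ordering, where
the span still contains the unplugged cpu during the window;
toy_wakeup_hits_offline_cpu(true) models rebuilding from the active mask
before the cpu goes away.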



--

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------
