Re: v2.6.26-rc9: kernel BUG at kernel/sched.c:5858!

From: Dmitry Adamushko
Date: Fri Jul 11 2008 - 05:02:52 EST


Vegard,


Regarding the first crash: would you please run your test with the
following debugging patch and let me know its output?

The appearance of " * [ pid ] comm (name), orig_cpu() ... " means we
hit the problematic case, i.e. select_task_rq() returned an offline cpu
(with Miao Xie's patch it shouldn't crash).
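
To make the problematic case concrete, here is a minimal userspace
sketch (not kernel code) of the check that the first part of the patch
adds to try_to_wake_up(): if the selected cpu is offline, fall back to
orig_cpu and remember that it happened. The names online_mask,
sketch_cpu_is_offline() and sketch_pick_cpu() are hypothetical, made up
for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's online map: bit N set => cpu N online. */
static unsigned long online_mask = 0x3;	/* cpus 0 and 1 online */

/* Userspace imitation of cpu_is_offline(); an assumption, not the kernel macro. */
static bool sketch_cpu_is_offline(int cpu)
{
	assert(cpu >= 0 && cpu < (int)(8 * sizeof(online_mask)));
	return !(online_mask & (1UL << cpu));
}

/*
 * Mirrors the guard in the debugging patch: when the selected cpu is
 * offline, use orig_cpu instead and set *oops so the caller can report it.
 */
static int sketch_pick_cpu(int selected_cpu, int orig_cpu, int *oops)
{
	*oops = 0;
	if (sketch_cpu_is_offline(selected_cpu)) {
		*oops = 1;
		return orig_cpu;
	}
	return selected_cpu;
}
```

With cpus 0 and 1 online, sketch_pick_cpu(1, 0, &oops) returns 1 and
leaves oops at 0. Once cpu 1 is cleared from online_mask, the same call
returns 0 and sets oops to 1, which is the case the printk reports.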

I see that you have CONFIG_SCHED_DEBUG=y, so I'm also interested in
the messages from sched_domain_debug() - "CPU# attaching ...". In other
words, please include all the kernel messages that appear while a cpu
is going down and coming back up.


TIA,


--
Best regards,
Dmitry Adamushko
--- kernel/sched-orig.c 2008-07-10 15:08:01.000000000 +0200
+++ kernel/sched.c 2008-07-11 10:52:48.000000000 +0200
@@ -2081,6 +2081,7 @@ static int try_to_wake_up(struct task_st
 	unsigned long flags;
 	long old_state;
 	struct rq *rq;
+	int oops = 0;

 	if (!sched_feat(SYNC_WAKEUPS))
 		sync = 0;
@@ -2103,6 +2104,11 @@ static int try_to_wake_up(struct task_st
 		goto out_activate;

 	cpu = p->sched_class->select_task_rq(p, sync);
+	if (unlikely(cpu_is_offline(cpu))) {
+		cpu = orig_cpu;
+		oops = 1;
+	}
+
 	if (cpu != orig_cpu) {
 		set_task_cpu(p, cpu);
 		task_rq_unlock(rq, &flags);
@@ -2159,6 +2165,10 @@ out_running:
 out:
 	task_rq_unlock(rq, &flags);

+	if (oops)
+		printk(KERN_ERR " * [ %d ] comm (%s), orig_cpu (%d), dst_cpu (%d), cpu (%d)\n",
+			p->pid, p->comm, orig_cpu, cpu, task_cpu(p));
+
 	return success;
 }

@@ -5712,6 +5722,10 @@ static int __migrate_task_irq(struct tas
 	local_irq_disable();
 	ret = __migrate_task(p, src_cpu, dest_cpu);
 	local_irq_enable();
+
+	printk(KERN_ERR "__migrate(%d -- %s) -> cpu (%d) == ret (%d)\n",
+		p->pid, p->comm, dest_cpu, ret);
+
 	return ret;
 }

@@ -5868,6 +5882,7 @@ static void migrate_dead(unsigned int de
 	 * fine.
 	 */
 	spin_unlock_irq(&rq->lock);
+	printk(KERN_ERR "---> migrate_dead(%d -- %s)\n", p->pid, p->comm);
 	move_task_off_dead_cpu(dead_cpu, p);
 	spin_lock_irq(&rq->lock);