Oopses in 2.2.14 (question)

From: Andris Pavenis (andris@stargate.astr.lu.lv)
Date: Fri Jan 14 2000 - 13:02:10 EST


Hi!

Earlier this week I posted info about the oopses I'm getting, on average
once per day. It seems that the problem is corruption of the task queue,
but the source of the corruption is still unknown.
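
To make "corruption" concrete, here is a minimal user-space sketch, not
kernel code. It assumes the queue is a circular doubly linked list with a
sentinel head, linked through next_run/prev_run as in the patch below. The
names struct task, add_to_queue and check_queue are hypothetical and only
illustrate the invariant that a healthy queue keeps: for every node p,
p->next_run->prev_run == p.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for task_struct: only the run-queue links. */
struct task {
    int pid;
    struct task *next_run;
    struct task *prev_run;
};

/* Link p in directly after head (head is a sentinel that is always queued). */
static void add_to_queue(struct task *head, struct task *p)
{
    p->next_run = head->next_run;
    p->prev_run = head;
    head->next_run->prev_run = p;
    head->next_run = p;
}

/*
 * Walk forward from head for at most limit steps and return the number
 * of broken links found (0 means the list is consistent).
 */
static int check_queue(struct task *head, int limit)
{
    struct task *p = head;
    int bad = 0;
    int i;

    assert(head != NULL);
    for (i = 0; i < limit; i++) {
        struct task *n = p->next_run;

        if (n == NULL) {
            printf("pid %d: next_run is NULL\n", p->pid);
            return bad + 1;     /* cannot walk any further */
        }
        if (n->prev_run != p) {
            printf("pid %d: prev_run=%p, expected %p\n",
                   n->pid, (void *)n->prev_run, (void *)p);
            bad++;
        }
        p = n;
        if (p == head)
            return bad;         /* closed the circle */
    }
    printf("queue did not close within %d steps\n", limit);
    return bad + 1;
}
```

A checker like this only reports that a link is already broken. It says
nothing about which code path broke it.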

The oopses happen in kernel/sched.c, in the function del_from_runqueue,
because p->prev seems to be NULL (I still know nothing about the value of
p->next). I don't know the kernel well enough to find the reason for this
problem, but perhaps I could put a sanity check with debug output into
this function (similar to what there was in 2.0.3X).

I applied the following patch to add this test.

*** linux-2.2.15pre1/kernel/sched.c~1 Tue Jan 4 20:12:25 2000
--- linux-2.2.15pre1/kernel/sched.c Fri Jan 14 19:52:39 2000
***************
*** 380,385 ****
--- 380,395 ----
          struct task_struct *next = p->next_run;
          struct task_struct *prev = p->prev_run;
  
+         if (!prev || !next)
+         {
+                 printk ("del_from_runqueue(%08X) : Task not in run queue\n",p);
+                 printk ("prev_run=%08X next_run=%08X state=%d nr_running=%d\n",
+                         prev, next, p->state, nr_running);
+                 printk ("prev=%08X next=%08X pid=%d\n",
+                         p->prev, p->next, (int) p->pid);
+                 return;
+         }
+
          nr_running--;
          next->prev_run = prev;
          prev->next_run = next;

I hope that avoiding the oops by returning early on this error does no harm.
Maybe it would be better to also zero p->next and p->prev in that case.
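
The two options above (just return, or also clear the links) can be
compared in a small user-space sketch. This is not the kernel function:
struct task, the global nr_running counter and del_from_queue are
hypothetical stand-ins, and the int return value is an assumption added
here so that a caller can see which path was taken.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for task_struct: only the run-queue links. */
struct task {
    int pid;
    struct task *next_run;
    struct task *prev_run;
};

static int nr_running;  /* stand-in for the scheduler's counter */

/*
 * Unlink p from a circular doubly linked queue.
 * Returns 0 on success, -1 if p was not (fully) linked in.
 */
static int del_from_queue(struct task *p)
{
    struct task *next;
    struct task *prev;

    assert(p != NULL);
    next = p->next_run;
    prev = p->prev_run;

    if (!prev || !next) {
        fprintf(stderr, "task %d not in queue: prev_run=%p next_run=%p\n",
                p->pid, (void *)prev, (void *)next);
        /* Variant with zeroing: leave p in a known "unlinked" state. */
        p->next_run = NULL;
        p->prev_run = NULL;
        return -1;      /* nr_running is deliberately left untouched */
    }

    nr_running--;
    next->prev_run = prev;
    prev->next_run = next;
    p->next_run = NULL;
    p->prev_run = NULL;
    return 0;
}
```

Note that clearing p's own links does not repair its neighbours. If only
one of the two pointers was NULL, another task may still point at p, so
the check hides the oops but the queue may stay inconsistent.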

Andris
