scheduler problems

From: Anjali Kulkarni (anjali@indranetworks.com)
Date: Wed Jun 12 2002 - 00:20:55 EST


Hi,

I am getting a problem in the scheduler() function....

I am running an in-kernel proxy on linux 2.2.16 and I get a problem in
sched.c at line 384. It is due to the fact that the schedule() function
does not find the 'current' process in the runqueue. (A detailed
explanation of the OOPS message which comes when run without serial
line debugging is given below).

With serial line debugging I got the following backtrace:---

0# schedule() at sched.c:384
1# schedule_timeout(timeout=-806527036) at sched.c:653
2# kupdate (unused=0x0) at buffer.c:1921
3# kernel_thread(fn=0xb, arg=0xbffff86c, flags=0) at process.c:496
4# system_call at process.c:812

Note that paramters to functions schedule_timeout(negative value) and
kernel_thread are incorrect or do not seem right.
When I booted the kernel, I set breakpoints in init/main.c where
kupdate is created, and it shows a correct call to kernel_thread-
>kupdate->schedule_timeout->schedule with all functions called with
correct parameters.

Can anyone tell me what's happening here? My kernel module is no way
the cause of any of this. A detailed explanation is given below...

Thanks!
Anjali

/*--------------------In more detail-------------------------*/
I am running an in-kernel proxy on linux 2.2.16, which places high
demand on the n/w activity of the linux m/c. I am repeatedly getting an
OOPS message at a particular place in the scheduler() function call. I
am trying to analyse the call trace, and it looks something like the
following:-

schedule()
schedule_timeout()
process_timeout()
do_poll()
sys_poll()

After looking at the address where OOPS reports a problem in schedule()
and looking at the objdump of sched.o, I found the problem is due to
the fact that when schedule() calls del_from_runqueue(), it finds that
the current process is *not* there on the runqueue. Further, this
current process *IS* present in the list of task structs, in
TASK_INTERRPUTIBLE state. This process is generally some process like
inetd or some process doing n/w activity.
Now, if I kill all the processes on my linux machine (just to check),
the problem frequency reduces, but it still appears, and now the
process not present on the runqueue is some process like init(pid 1) or
kupdate(pid 3). These are the processes which could not be
killed.
>From this, I concluded what happens is probably that some process
called sys_poll which called do_poll(). In do_poll(), a
process_timeout occured(I assume this a soft interrupt), which will try
and wakeup the process which caused it, ie put the process on the
runqueue. I dont know who calls the schedule_timeout? Is it the process
which wakes up from the call to schedule_timeout() after a context
switch occured? I am probably not aware of exactly how to interpret a
call trace...

Thanks again.

/*----------------------------------------------------------------*/

Anjali Kulkarni
Software Engineer
Indra Networks

~Living Well is the best Revenge~
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sat Jun 15 2002 - 22:00:24 EST