RE: Consistent lock up 2.6.10-rc1-bk7 (mutex/SCHED_RR bug?)

From: Andrew
Date: Fri Oct 29 2004 - 09:43:31 EST



I have reproduced this hang on 2.6.10-rc1-bk7, and have also installed the sysrq-n patch. Even after "SysRq : Nice All RT Tasks",
the system is completely unresponsive as far as user mode is concerned, and will only react to SysRq. It -does- respond to ICMP
pings. Sysrq-e, -k, -i do not stop the offending tt1 process.

I do not have netdump available in 2.6.10-rc1-bk7, and so cannot provide a full sysrq-t output, but the visible section shows two
tt1 threads with identical stacks:

schedule_timeout+0xd0/0xd2
futex_wait+0x140/0x1a9
do_futex+0x33/0x78
sys_futex+0xcd/0xd9
sysenter_past_esp+0x52/0x71

I then tried running this task as a non-root user, which should prevent SCHED_RR and PRIO changes of the threads/tasks. Under these
conditions, the system does *not* hang. I noticed that the app periodically ends up in a high-speed loop involving the
ACE_Semaphore class in ACE; having checked the compilation flags, it seems ACE is simulating semaphores using the calls below. It is
*not* using POSIX 1003.1b semaphores (sem_wait, etc.)

pthread_mutex_lock()
pthread_cond_wait()
pthread_cond_signal()
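
For readers unfamiliar with the pattern, here is a minimal sketch of how a counting semaphore is commonly simulated with those three
calls. It is not ACE's code; the sim_sem_* names and the struct layout are hypothetical and only illustrate the general technique.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical type and function names for illustration; not taken from ACE. */
typedef struct {
    pthread_mutex_t lock;     /* protects count */
    pthread_cond_t  nonzero;  /* signalled whenever count is incremented */
    unsigned        count;    /* current semaphore value */
} sim_sem_t;

/* Initialise the semaphore; returns 0 on success or a pthread error number. */
int sim_sem_init(sim_sem_t *s, unsigned initial)
{
    int rc;

    assert(s != NULL);
    s->count = initial;
    rc = pthread_mutex_init(&s->lock, NULL);
    if (rc != 0)
        return rc;
    rc = pthread_cond_init(&s->nonzero, NULL);
    if (rc != 0)
        pthread_mutex_destroy(&s->lock);
    return rc;
}

/* Acquire: block until count > 0, then decrement it. */
void sim_sem_wait(sim_sem_t *s)
{
    pthread_mutex_lock(&s->lock);
    /* Re-check in a loop: pthread_cond_wait() may wake spuriously. */
    while (s->count == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* Release: increment count and wake one waiter, if any. */
void sim_sem_post(sim_sem_t *s)
{
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_cond_signal(&s->nonzero);
    pthread_mutex_unlock(&s->lock);
}
```

In this sketch a waiter sleeps inside pthread_cond_wait() until a post arrives. Whether the ACE build in question follows the same
shape is an assumption on my part.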

Although it appears I need to fix an application bug, is it normal/desirable for an application calling system mutex facilities to
starve the system so completely, and/or become "unkillable"?

A.


-----Original Message-----
From: Andrew [mailto:aathan-linux-kernel-1542@xxxxxxxxxxxxx]
Sent: Thursday, October 28, 2004 5:10 PM
To: linux-kernel@xxxxxxxxxxxxxxx
Cc: roland@xxxxxxxxxxx; Andrew Morton
Subject: Consistent lock up 2.6.8-1.521 (and 2.6.8.1 w/
high-res-timers/skas/sysemu)



Caveat: This may be an infinite loop in a SCHED_RR process. See very bottom of email for sysrq-t sysrq-p output.

[LARGE EMAIL DELETED]

