ERESTARTSYS escaping from sem_wait with RTLinux patch

From: Blaise Gassend
Date: Sat Oct 10 2009 - 05:10:37 EST


The attached Python program, in which 500 threads spin with microsecond
sleeps, crashes with a "sem_wait: Unknown error 512" (conditions
described below). This appears to be due to an ERESTARTSYS generated
from futex_wait escaping to user space (libc). My understanding is that
this should never happen and I am trying to track down what is going on.
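For reference, 512 is the kernel-internal value of ERESTARTSYS (include/linux/errno.h). The codes from 512 to 516 are not part of the user-space errno set, so glibc's strerror() can only report "Unknown error 512". Below is a small decoding sketch. The table and the helper name `describe_errno` are illustrative and do not come from any library.

```python
import errno
import os

# Kernel-internal codes from include/linux/errno.h. They are meant to be
# consumed inside the kernel, so user space has no symbolic names for them.
KERNEL_INTERNAL_CODES = {
    512: "ERESTARTSYS",
    513: "ERESTARTNOINTR",
    514: "ERESTARTNOHAND",
    515: "ENOIOCTLCMD",
    516: "ERESTART_RESTARTBLOCK",
}


def describe_errno(code):
    """Return a readable name for an errno value, including kernel-internal ones."""
    if code in errno.errorcode:
        return errno.errorcode[code]
    if code in KERNEL_INTERNAL_CODES:
        return KERNEL_INTERNAL_CODES[code] + " (kernel-internal, should not reach user space)"
    return "unknown errno %d" % code


if __name__ == "__main__":
    print(os.strerror(512))      # on glibc: Unknown error 512
    print(describe_errno(512))   # ERESTARTSYS (kernel-internal, ...)
```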

Questions that would help me make progress:
-------------------------------------------

1) Where is the ERESTARTSYS being prevented from getting to user space?

The only likely place I see for preventing ERESTARTSYS from escaping to
user space is in arch/*/kernel/signal*.c. However, I don't see how the
code there is being called if there is no signal pending. Is that a path
for ERESTARTSYS to escape from the kernel?
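Here is a simplified model of the restart handling, based on a reading of arch/x86/kernel/signal.c from the 2.6.2x era. In that code, the restart codes are rewritten only in do_signal(), and the return-to-user path calls do_signal() only when TIF_SIGPENDING is set. This is an illustrative Python sketch, not kernel code. `exit_path_result` and the string results are hypothetical names, and other architectures differ in detail.

```python
# Kernel-internal codes (include/linux/errno.h) plus EINTR.
EINTR = 4
ERESTARTSYS = 512
ERESTARTNOINTR = 513
ERESTARTNOHAND = 514
ERESTART_RESTARTBLOCK = 516

RESTART = "restart"                  # syscall re-executed with its original args
RESTART_SYSCALL = "restart_syscall"  # re-entered via sys_restart_syscall


def exit_path_result(ret, signal_pending, handler_delivered=False, sa_restart=False):
    """Model what user space observes for a syscall that returned `ret`.

    Returns an int (the value left in the return register) or one of the
    RESTART markers. Simplified: assumes we are returning from a syscall.
    """
    if not signal_pending:
        # do_signal() is never called, so nothing rewrites `ret`:
        # a raw -ERESTARTSYS would reach user space here.
        return ret
    if handler_delivered:
        if ret in (-ERESTARTNOHAND, -ERESTART_RESTARTBLOCK):
            return -EINTR
        if ret == -ERESTARTSYS:
            return RESTART if sa_restart else -EINTR
        if ret == -ERESTARTNOINTR:
            return RESTART
        return ret
    # A signal was flagged but no handler ran (in this model, e.g. the
    # signal was already dequeued by another thread): restart silently.
    if ret in (-ERESTARTSYS, -ERESTARTNOINTR, -ERESTARTNOHAND):
        return RESTART
    if ret == -ERESTART_RESTARTBLOCK:
        return RESTART_SYSCALL
    return ret


if __name__ == "__main__":
    print(exit_path_result(-ERESTARTSYS, signal_pending=False))  # -512
    print(exit_path_result(-ERESTARTSYS, signal_pending=True))   # restart
```

In this model, a -ERESTARTSYS returned while no signal is flagged passes straight through. That would match the "Unknown error 512" symptom.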

The following comment in futex_wait() in kernel/futex.c makes me wonder
if two threads are both returning -ERESTARTSYS. The first one to leave
the kernel processes the signal and restarts. The second one has no
signal to handle, so it returns to user space without going through
signal*.c and wreaks havoc.

    (...)
    /*
     * We expect signal_pending(current), but another thread may
     * have handled it for us already.
     */
    if (!abs_time)
        return -ERESTARTSYS;
    (...)

2) Why would this be happening only with RT kernels?

3) Any suggestions on the best place to patch/workaround this?

My understanding is that if I were to treat ERESTARTSYS as an EAGAIN,
most applications would be perfectly happy. Would bad things happen if I
replaced the ERESTARTSYS in futex_wait with an EAGAIN?
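At the application level, treating ERESTARTSYS like EAGAIN would amount to a retry loop like the one sketched below. `call_retrying` is a hypothetical helper. Errno 512 is hard-coded because user space has no name for it. The loop only applies to calls that report failures as OSError, and it hides the symptom rather than fixing the kernel side.

```python
import errno

ERESTARTSYS = 512  # kernel-internal value; user space has no symbolic name for it

# Errors treated as "try again". Including ERESTARTSYS here is the
# workaround under discussion, not documented libc behaviour.
RETRYABLE = {errno.EINTR, errno.EAGAIN, ERESTARTSYS}


def call_retrying(fn, *args, max_attempts=1000):
    """Call fn(*args), retrying while it fails with a retryable errno."""
    for _ in range(max_attempts - 1):
        try:
            return fn(*args)
        except OSError as exc:
            if exc.errno not in RETRYABLE:
                raise
    # Final attempt: let any error propagate to the caller.
    return fn(*args)


if __name__ == "__main__":
    attempts = []

    def flaky():
        # Fails twice with errno 512, then succeeds.
        attempts.append(1)
        if len(attempts) < 3:
            raise OSError(ERESTARTSYS, "Unknown error 512")
        return "ok"

    print(call_retrying(flaky))  # ok
```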

Crash conditions:
-----------------

- RTLinux only.
- More cores seem to make things worse. Lots of crashes on a dual
quad-core machine, none observed yet on a dual-core one. At least one
crash on a dual quad-core machine when run with "taskset -c 1".
- Various versions, including 2.6.29.6-rt23, and whatever the latest was
earlier today.
- Seen on both ia64 and x86
- Ubuntu hardy and jaunty
- Sometimes happens within 2 seconds on a dual quad-core machine; other
times it runs for 30 minutes to an hour without crashing. I suspect a
dependence on system activity, but haven't noticed an obvious pattern.
- Time to crash appears to drop fast with more CPU cores.
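
Attached program:

import threading
import time

# Set when the main thread leaves spin() (e.g. on Ctrl-C) so that the
# worker threads stop as well.
exiting = False

def spin():
    # time.sleep() releases the interpreter lock and reacquires it on
    # wakeup, so hundreds of threads contend for it constantly.
    while not exiting:
        time.sleep(0.000001)

# 500 worker threads spinning with microsecond sleeps.
for i in range(0, 500):
    threading.Thread(target=spin).start()

try:
    spin()  # the main thread spins too
finally:
    exiting = True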
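To put numbers on the time-to-crash observations in the crash conditions (for instance with and without taskset), a harness along these lines could rerun the reproducer under a time limit. It is a sketch. `time_to_crash`, `reproducer_cmd` and the file name `repro.py` are hypothetical. It also assumes that any exit of the reproducer counts as a crash, since the program otherwise runs until interrupted.

```python
import subprocess
import sys
import time


def reproducer_cmd(script, cpu=None):
    """Build the command line, optionally pinned with `taskset -c CPU`."""
    cmd = [sys.executable, script]
    if cpu is not None:
        cmd = ["taskset", "-c", str(cpu)] + cmd
    return cmd


def time_to_crash(cmd, limit):
    """Run cmd; return (seconds, returncode, stderr) if it exits within
    `limit` seconds, or None if it is still running (it is then killed)."""
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=limit)
    except subprocess.TimeoutExpired:
        return None
    return time.monotonic() - start, proc.returncode, proc.stderr
```

For example, time_to_crash(reproducer_cmd("repro.py", cpu=1), 3600) returns None if the run survives an hour. Otherwise it returns the elapsed time, the exit status and whatever the reproducer wrote to stderr, such as the sem_wait message.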