CONFIG_PREEMPT causes corruption of application's FPU stack

From: Jürgen Mell
Date: Sat May 17 2008 - 12:51:35 EST


I am running the Einstein@home application (version 4.35,
http://einstein.phys.uwm.edu). This application is computation-heavy and
uses mostly FPU and SSE instructions.
After I started experimenting with real-time optimized kernels, the
application began to crash with floating point errors such as the
following:

APP DEBUG: Application caught signal 8.

FPU status word ffffa0e1, flags: ERR_SUMM STACK_FAULT PRECISION INVALID
Obtained 6 stack frames for this thread.
Use gdb command: 'info line *0xADDRESS' to print corresponding line
numbers.
einstein_S5R3_4.35_i686-pc-linux-gnu[0x8069e7e]
einstein_S5R3_4.35_i686-pc-linux-gnu[0x818d436]
einstein_S5R3_4.35_i686-pc-linux-gnu[0x805db8f]
einstein_S5R3_4.35_i686-pc-linux-gnu[0x806b11c]
/lib/libc.so.6(__libc_start_main+0xe0)[0xb7e14fe0]
einstein_S5R3_4.35_i686-pc-linux-gnu(shmat+0x59)[0x804bda1]
Stack trace of LAL functions in worker thread:
GetSemiCohToplist at line 3177 of
file /home/bema/einsteinathome/HierarchicalSearch/EaH_build_release_einstein_S5R3_4.35/extra_sources/lalapps-CVS/src/pulsar/hough/src2/HierarchicalSearch.c
At lowest level status code = 0, description: NO LAL ERROR REGISTERED
called boinc_finish
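
For readers who want to cross-check the "FPU status word" line above, here is
a minimal sketch that decodes the low 16 bits of an x87 status word. The bit
positions follow the x87 architecture. The function name decode_x87_status is
made up for illustration, and the labels for flags that do not appear in the
log (DENORMAL, ZERO_DIVIDE, OVERFLOW, UNDERFLOW) are assumed names, not
necessarily what the application prints. It also assumes that the upper 16
bits of the printed value (ffff) are not part of the status word.

```python
# Sketch: decode the low 16 bits of an x87 FPU status word.
# Names not present in the original log are illustrative assumptions.
X87_FLAGS = [
    (0, "INVALID"),      # IE: invalid operation
    (1, "DENORMAL"),     # DE: denormalized operand (assumed label)
    (2, "ZERO_DIVIDE"),  # ZE: divide by zero (assumed label)
    (3, "OVERFLOW"),     # OE: numeric overflow (assumed label)
    (4, "UNDERFLOW"),    # UE: numeric underflow (assumed label)
    (5, "PRECISION"),    # PE: inexact result
    (6, "STACK_FAULT"),  # SF: register stack overflow or underflow
    (7, "ERR_SUMM"),     # ES: error summary
]


def decode_x87_status(word):
    """Return the set flags, the TOP field and C1 of an x87 status word."""
    sw = word & 0xFFFF  # assumption: only the low 16 bits are meaningful
    flags = [name for bit, name in X87_FLAGS if sw & (1 << bit)]
    top = (sw >> 11) & 0x7  # bits 11..13: top-of-stack pointer
    c1 = (sw >> 9) & 0x1    # bit 9: with SF set, 0 = underflow, 1 = overflow
    return {"flags": flags, "top": top, "c1": c1}


if __name__ == "__main__":
    print(decode_x87_status(0xFFFFA0E1))
```

Applied to the value from the log, this yields the same four flags that the
application reported. If the value was captured as printed, C1 is clear, which
together with STACK_FAULT would point to a stack underflow rather than an
overflow.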

I tracked this down to a single kernel configuration option. If
CONFIG_PREEMPT is set to 'y', the application starts crashing. If
CONFIG_PREEMPT_VOLUNTARY is selected instead of CONFIG_PREEMPT, the
application runs without errors.
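
Anyone who wants to check which preemption model their own kernel was built
with could use something like the following sketch. It only parses the text
of a kernel .config. The helper names are hypothetical, and the two file
locations are assumptions: /proc/config.gz exists only if the kernel was built
with CONFIG_IKCONFIG_PROC, and /boot/config-<release> is a common distribution
convention, not a guarantee.

```python
import gzip
import os

# The three mutually exclusive preemption choices in mainline 2.6 kernels.
PREEMPT_OPTIONS = (
    "CONFIG_PREEMPT_NONE",
    "CONFIG_PREEMPT_VOLUNTARY",
    "CONFIG_PREEMPT",
)


def preempt_model(config_text):
    """Return the preemption options that are set to 'y' in a .config text."""
    enabled = []
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments such as "# CONFIG_PREEMPT is not set" and blank lines.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Exact name match, so CONFIG_PREEMPT_VOLUNTARY is not mistaken
        # for CONFIG_PREEMPT.
        if name in PREEMPT_OPTIONS and value == "y":
            enabled.append(name)
    return enabled


def read_running_config():
    """Try the assumed config locations; return the text or None."""
    candidates = ["/proc/config.gz", "/boot/config-" + os.uname().release]
    for path in candidates:
        if os.path.exists(path):
            opener = gzip.open if path.endswith(".gz") else open
            with opener(path, "rt") as fh:
                return fh.read()
    return None


if __name__ == "__main__":
    text = read_running_config()
    print(preempt_model(text) if text is not None else "no config found")
```

On a kernel showing the problem described here, the result would be expected
to contain CONFIG_PREEMPT; on the working configuration, it would contain
CONFIG_PREEMPT_VOLUNTARY.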

The problem is reproducible insofar as the error always occurs when
CONFIG_PREEMPT is set, but the time to the first occurrence varies greatly,
from a few minutes to more than 10 CPU hours.

I first found this error on an openSUSE kernel, 2.6.22.17-0.1-rt. I verified
the problem on the following kernel versions:

openSUSE 2.6.22.17-0.1-default
openSUSE 2.6.23.17-ccj64-rt
kernel.org 2.6.26-rc1
kernel.org 2.6.26-rc2-git5

My CPU is an Intel Core2Duo 6420, running two instances of the Einstein
application in 32-bit mode. From a discussion on the Einstein message boards
I know that other users of the application are also affected.

Please let me know if you need any additional information to track this
down.
Jürgen