Re: kernel 2.6.39 (user mode linux) crashes (2.6.38 works fine)

From: Toralf Förster
Date: Sat May 21 2011 - 04:54:18 EST



Toralf Förster wrote at 00:53:50
> Bisecting gave :
>
>
> git bisect bad
> d123375425d7df4b6081a631fc1203fceafa59b2 is the first bad commit
> commit d123375425d7df4b6081a631fc1203fceafa59b2
> Author: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Date: Wed Jan 26 21:32:01 2011 +0100
>
> rwsem: Remove redundant asmregparm annotation
>
> Peter Zijlstra pointed out, that the only user of asmregparm (x86) is
> compiling the kernel already with -mregparm=3. So the annotation of
> the rwsem functions is redundant. Remove it.

BTW, I double-checked that this commit is the culprit of the hang - it is.
Furthermore, I added these kernel config options:

tfoerste@n22 ~/devel/linux-2.6 $ diff .config* | grep '<' | grep -v '#'
< CONFIG_X86_CPU=y
< CONFIG_CMPXCHG_LOCAL=y
< CONFIG_IP_FIB_HASH=y
< CONFIG_RPCSEC_GSS_KRB5=y
< CONFIG_DEBUG_RT_MUTEXES=y
< CONFIG_DEBUG_PI_LIST=y
< CONFIG_DEBUG_MUTEXES=y
< CONFIG_BKL=y
< CONFIG_DEBUG_INFO=y
< CONFIG_DEBUG_WRITECOUNT=y
< CONFIG_DEBUG_LIST=y
< CONFIG_DEBUG_PAGEALLOC=y
< CONFIG_WANT_PAGE_DEBUG_FLAGS=y
< CONFIG_PAGE_POISONING=y

I attached gdb to the top process of linux and ran wireshark in parallel when I did
$>firefox https://n22_uml/phpmyadmin/ &


GDB gave:

Program received signal SIGSEGV, Segmentation fault.
0x0829f2bd in rwsem_down_failed_common (sem=0x84f6000, flags=2, adjustment=65535) at lib/rwsem.c:189
189 adjustment += RWSEM_WAITING_BIAS;
(gdb) bt full
#0 0x0829f2bd in rwsem_down_failed_common (sem=0x84f6000, flags=2, adjustment=65535) at lib/rwsem.c:189
waiter = {list = {next = 0x0, prev = 0x6}, task = 0x1809d480, flags = 2}
tsk = 0x1809d480
count = <value optimized out>
#1 0x0829f375 in rwsem_down_write_failed (sem=0x84f6000) at lib/rwsem.c:236
No locals.
#2 0x0829d0e2 in call_rwsem_down_write_failed () at arch/um/sys-i386/../../x86/lib/semaphore_32.S:97
No locals.
#3 0x0829eaa7 in __down_write_nested (sem=0x182aa774)
at /home/tfoerste/devel/linux-2.6/arch/x86/include/asm/rwsem.h:105
tmp = -1
#4 __down_write (sem=0x182aa774) at /home/tfoerste/devel/linux-2.6/arch/x86/include/asm/rwsem.h:121
No locals.
#5 down_write (sem=0x182aa774) at kernel/rwsem.c:51
No locals.
#6 0x080d2de3 in sys_brk (brk=139419648) at mm/mmap.c:254
rlim = <value optimized out>
newbrk = <value optimized out>
oldbrk = 0
mm = 0x182aa740
#7 0x08060d16 in handle_syscall (r=0x1809d650) at arch/um/kernel/skas/syscall.c:35
syscall = <value optimized out>
#8 0x08074ca1 in handle_trap (regs=0x1809d650) at arch/um/os-Linux/skas/process.c:201
err = <value optimized out>
status = 0
#9 userspace (regs=0x1809d650) at arch/um/os-Linux/skas/process.c:417
sig = <value optimized out>
timer = {it_interval = {tv_sec = 0, tv_usec = 0}, it_value = {tv_sec = 0, tv_usec = 5999}}
nsecs = <value optimized out>
err = <value optimized out>
status = 34175
op = 31
pid = 12820
local_using_sysemu = 2
#10 0x0805e0cb in fork_handler () at arch/um/kernel/process.c:181
No locals.
#11 0xaaaaaaaa in ?? ()
No symbol table info available.
(gdb) quit


I attached the wireshark stream to this mail - the network connection at packet #92:

TLSv1 754 [TCP Retransmission] Change Cipher Spec, Encrypted Handshake Message, Application Data


might give a hint. In a few cases the UML system even hangs during the start of sshd.


--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

Attachment: uml_hang.pcap
Description: Binary data