oopses with openafs 1.3.83 and linux 2.4.31

From: Dimitris Zilaskos
Date: Fri Jun 03 2005 - 16:26:47 EST




(resending due to size , sorry mods)

Hi ,

After moving to linux 2.4.31 and openafs 1.3.83 , I still get
oopses once per day. Usually this causes the system to either freeze
totally (only replies to ping) or all processes segfault. Exacly the same
happened with 2.4.29 and 2.4.30 with openafs 1.3.82. With openafs 1.3.78
the oopses occur usually after 10+ days uptime (dunno with 2.4.31 yet
though,running it now)
My cell consists of one 1.2.x server with the openafs database and
one 1.3.x server , for large files support. The volumes I access are on
the 1.3.x server. All servers are running gentoo linux , the 1.3.x
2.6.11.6 kernels and the 1.2.x 2.4.27. My system is a dual p3 @ 600 Mhz
with 768 Mb ram. I have just memtested the system , in completed
several passes and no errors were detected. The cpu temperature is ok , I also checked thedisks. Without openafs running the system is stable in all
stress tests.Running slackware 10.1.
The problems always occur after 4 am , sometimes exacly at 4 am
sometimes one hour later. At 4 am my afs servers restart themselves.At the
same moment usually there are tenths of concurrent read/write operations
on my volumes. However I have not succeeded in reproducing the oopses while manually performing fileserver restarts with processes writing to afs.

First , the ooposes with 2.4.31 SMP and openafs 1.3.83

ksymoops 2.4.11 on i686 2.4.31. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.31/ (default)
-m /usr/src/linux/System.map (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Warning (compare_maps): libafs-2.4.31.mp symbol sys_chdir not found in /usr/local/lib/openafs/libafs-2.4.31.mp.o. Ignoring /usr/local/lib/openafs/libafs-2.4.31.mp.o entry
Warning (compare_maps): libafs-2.4.31.mp symbol sys_exit not found in /usr/local/lib/openafs/libafs-2.4.31.mp.o. Ignoring /usr/local/lib/openafs/libafs-2.4.31.mp.o entry
Warning (compare_maps): libafs-2.4.31.mp symbol sys_ioctl not found in /usr/local/lib/openafs/libafs-2.4.31.mp.o. Ignoring /usr/local/lib/openafs/libafs-2.4.31.mp.o entry
Warning (compare_maps): libafs-2.4.31.mp symbol sys_open not found in /usr/local/lib/openafs/libafs-2.4.31.mp.o. Ignoring /usr/local/lib/openafs/libafs-2.4.31.mp.o entry
Warning (compare_maps): libafs-2.4.31.mp symbol sys_wait4 not found in /usr/local/lib/openafs/libafs-2.4.31.mp.o. Ignoring /usr/local/lib/openafs/libafs-2.4.31.mp.o entry
Warning (compare_maps): libafs-2.4.31.mp symbol sys_write not found in /usr/local/lib/openafs/libafs-2.4.31.mp.o. Ignoring /usr/local/lib/openafs/libafs-2.4.31.mp.o entry
Jun 2 05:03:24 system kernel: Unable to handle kernel paging request at virtual address a7bd037a
Jun 2 05:03:24 system kernel: c015a6fb
Jun 2 05:03:24 system kernel: *pde = 00000000
Jun 2 05:03:24 system kernel: Oops: 0000
Jun 2 05:03:24 system kernel: CPU: 1
Jun 2 05:03:24 system kernel: EIP: 0010:[<c015a6fb>] Tainted: P Using defaults from ksymoops -t elf32-i386 -a i386
Jun 2 05:03:24 system kernel: EFLAGS: 00010286
Jun 2 05:03:24 system kernel: eax: 00000000 ebx: c9e41d60 ecx: c9e41d70 edx: c9e41d70
Jun 2 05:03:24 system kernel: esi: a7bd035a edi: 00000000 ebp: 0001bec2 esp: effddf04
Jun 2 05:03:24 system kernel: ds: 0018 es: 0018 ss: 0018
Jun 2 05:03:24 system kernel: Process kswapd (pid: 5, stackpage=effdd000)
Jun 2 05:03:24 system kernel: Stack: ee2f4140 c01575d1 eb4deee0 d70f5938 d70f5920 c9e41d60 c0157a2e c9e41d60 Jun 2 05:03:24 system kernel: d70f59a0 00000006 c17e3558 c03a3b1c 000007d7 c0157e04 00021f7b c0138ee0 Jun 2 05:03:24 system kernel: 00000006 000001d0 ffffffff 000001d0 00000006 00000013 000001d0 c03a3b1c Jun 2 05:03:24 system kernel: Call Trace: [<c01575d1>] [<c0157a2e>] [<c0157e04>] [<c0138ee0>] [<c013934a>]
Jun 2 05:03:24 system kernel: [<c01393c2>] [<c0119881>] [<c0139586>] [<c01395f8>] [<c0139738>] [<c0105000>]
Jun 2 05:03:24 system kernel: [<c010740e>] [<c01396a0>]
Jun 2 05:03:24 system kernel: Code: 8b 46 20 85 c0 74 0e 89 c7 8b 40 18 85 c0 0f 85 c6 02 00 00


EIP; c015a6fb <iput+2b/320> <=====

ebx; c9e41d60 <_end+99d9920/304f8c20>
ecx; c9e41d70 <_end+99d9930/304f8c20>
edx; c9e41d70 <_end+99d9930/304f8c20>
esp; effddf04 <_end+2fb75ac4/304f8c20>

Trace; c01575d1 <dput+31/190>
Trace; c0157a2e <prune_dcache+fe/190>
Trace; c0157e04 <shrink_dcache_memory+24/40>
Trace; c0138ee0 <shrink_cache+170/430>
Trace; c013934a <shrink_caches+4a/60>
Trace; c01393c2 <try_to_free_pages_zone+62/100>
Trace; c0119881 <schedule+2c1/550>
Trace; c0139586 <kswapd_balance_pgdat+66/b0>
Trace; c01395f8 <kswapd_balance+28/40>
Trace; c0139738 <kswapd+98/b9>
Trace; c0105000 <_stext+0/0>
Trace; c010740e <arch_kernel_thread+2e/40>
Trace; c01396a0 <kswapd+0/b9>

Code; c015a6fb <iput+2b/320>
00000000 <_EIP>:
Code; c015a6fb <iput+2b/320> <=====
0: 8b 46 20 mov 0x20(%esi),%eax <=====
Code; c015a6fe <iput+2e/320>
3: 85 c0 test %eax,%eax
Code; c015a700 <iput+30/320>
5: 74 0e je 15 <_EIP+0x15>
Code; c015a702 <iput+32/320>
7: 89 c7 mov %eax,%edi
Code; c015a704 <iput+34/320>
9: 8b 40 18 mov 0x18(%eax),%eax
Code; c015a707 <iput+37/320>
c: 85 c0 test %eax,%eax
Code; c015a709 <iput+39/320>
e: 0f 85 c6 02 00 00 jne 2da <_EIP+0x2da>

Jun 2 05:03:24 system kernel: <1>Unable to handle kernel paging request at virtual address 345fc3e1
Jun 2 05:03:24 system kernel: c015a6fb
Jun 2 05:03:24 system kernel: *pde = 00000000
Jun 2 05:03:24 system kernel: Oops: 0000
Jun 2 05:03:24 system kernel: CPU: 0
Jun 2 05:03:24 system kernel: EIP: 0010:[<c015a6fb>] Tainted: P Jun 2 05:03:24 system kernel: EFLAGS: 00010202
Jun 2 05:03:24 system kernel: eax: 00000000 ebx: c9e41b60 ecx: c9e41b70 edx: c9e41b70
Jun 2 05:03:24 system kernel: esi: 345fc3c1 edi: 00000000 ebp: 00000003 esp: ebe39c44
Jun 2 05:03:24 system kernel: ds: 0018 es: 0018 ss: 0018
Jun 2 05:03:24 system kernel: Process httpd (pid: 239, stackpage=ebe39000)
Jun 2 05:03:24 system kernel: Stack: f09c210f f0add800 00000000 d70f58b8 d70f58a0 c9e41b60 c0157a2e c9e41b60 Jun 2 05:03:24 system kernel: 00000003 d4f6fb40 d4f6fb50 d4f6fb40 f0b31b34 c0157dd4 00000003 d4f6fb40 Jun 2 05:03:24 system kernel: c01577c3 d4f6fb40 f0b31a00 f0b31a10 f09bd41b d4f6fb40 00000000 00000000 Jun 2 05:03:24 system kernel: Call Trace: [<f09c210f>] [<c0157a2e>] [<c0157dd4>] [<c01577c3>] [<f09bd41b>]
Jun 2 05:03:24 system kernel: [<f09c9427>] [<f09ab280>] [<f09a549e>] [<f09a55b0>] [<f09d380f>] [<f09af7b8>]
Jun 2 05:03:24 system kernel: [<f09f466e>] [<f09af87c>] [<f09af5cd>] [<f09ca472>] [<f09c1d69>] [<c01575d1>]
Jun 2 05:03:24 system kernel: [<f0a15a60>] [<f09f7206>] [<c0157e41>] [<c014d032>] [<c014d9ec>] [<c014dec9>]
Jun 2 05:03:24 system kernel: [<c014e229>] [<c014a03f>] [<c01411e9>] [<c0108ebb>]
Jun 2 05:03:24 system kernel: Code: 8b 46 20 85 c0 74 0e 89 c7 8b 40 18 85 c0 0f 85 c6 02 00 00


EIP; c015a6fb <iput+2b/320> <=====

ebx; c9e41b60 <_end+99d9720/304f8c20>
ecx; c9e41b70 <_end+99d9730/304f8c20>
edx; c9e41b70 <_end+99d9730/304f8c20>
esp; ebe39c44 <_end+2b9d1804/304f8c20>

Trace; f09c210f <[libafs-2.4.31.mp].text.lock.KBUILD_BASENAME+127/3c8>
Trace; c0157a2e <prune_dcache+fe/190>
Trace; c0157dd4 <shrink_dcache_parent+24/30>
Trace; c01577c3 <d_invalidate+93/b0>
Trace; f09bd41b <[libafs-2.4.31.mp]afs_syscall_call+10cb/1910>
Trace; f09c9427 <[libafs-2.4.31.mp]PCallBackAddr+3d7/405>
Trace; f09ab280 <[libafs-2.4.31.mp]AllocPacketBufs+0/180>
Trace; f09a549e <[libafs-2.4.31.mp]rxi_Start+6ee/790>
Trace; f09a55b0 <[libafs-2.4.31.mp]rxi_Send+70/c0>
Trace; f09d380f <[libafs-2.4.31.mp].rodata.end+9914/e505>
Trace; f09af7b8 <[libafs-2.4.31.mp]_RXAFSCB_WhoAreYou+98/120>
Trace; f09f466e <END_OF_CODE+f592/????>
Trace; f09af87c <[libafs-2.4.31.mp]_RXAFSCB_InitCallBackState3+3c/110>
Trace; f09af5cd <[libafs-2.4.31.mp]_RXAFSCB_GetXStats+27d/2b0>
Trace; f09ca472 <[libafs-2.4.31.mp].rodata.end+577/e505>
Trace; f09c1d69 <[libafs-2.4.31.mp]afs_icl_EnumerateSets+b9/150>
Trace; c01575d1 <dput+31/190>
Trace; f0a15a60 <END_OF_CODE+30984/????>
Trace; f09f7206 <END_OF_CODE+1212a/????>
Trace; c0157e41 <d_alloc+21/190>
Trace; c014d032 <real_lookup+f2/140>
Trace; c014d9ec <link_path_walk+7ec/ac0>
Trace; c014dec9 <path_lookup+39/40>
Trace; c014e229 <__user_walk+49/60>
Trace; c014a03f <sys_stat64+1f/90>
Trace; c01411e9 <sys_open+79/c0>
Trace; c0108ebb <system_call+33/38>

Code; c015a6fb <iput+2b/320>
00000000 <_EIP>:
Code; c015a6fb <iput+2b/320> <=====
0: 8b 46 20 mov 0x20(%esi),%eax <=====
Code; c015a6fe <iput+2e/320>
3: 85 c0 test %eax,%eax
Code; c015a700 <iput+30/320>
5: 74 0e je 15 <_EIP+0x15>
Code; c015a702 <iput+32/320>
7: 89 c7 mov %eax,%edi
Code; c015a704 <iput+34/320>
9: 8b 40 18 mov 0x18(%eax),%eax
Code; c015a707 <iput+37/320>
c: 85 c0 test %eax,%eax
Code; c015a709 <iput+39/320>
e: 0f 85 c6 02 00 00 jne 2da <_EIP+0x2da>



The complete log can be found at http://tassadar.physics.auth.gr/~dzila/oops1


oops with non-smp 2.4.31 and openafs 1.3.82 follows :

ksymoops 2.4.11 on i686 2.4.31. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.31/ (default)
-m /usr/src/linux/System.map (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Jun 3 04:56:55 system kernel: kernel BUG at inode.c:1204!
Jun 3 04:56:55 system kernel: invalid operand: 0000
Jun 3 04:56:55 system kernel: CPU: 0
Jun 3 04:56:55 system kernel: EIP: 0010:[<c014de40>] Tainted: P Using defaults from ksymoops -t elf32-i386 -a i386
Jun 3 04:56:55 system kernel: EFLAGS: 00010246
Jun 3 04:56:55 system kernel: eax: 00000000 ebx: e28b7c20 ecx: e28b7c30 edx: e28b7c30
Jun 3 04:56:55 system kernel: esi: ef86fc00 edi: 00000000 ebp: 00019072 esp: effe3f04
Jun 3 04:56:55 system kernel: ds: 0018 es: 0018 ss: 0018
Jun 3 04:56:55 system kernel: Process kswapd (pid: 4, stackpage=effe3000)
Jun 3 04:56:55 system kernel: Stack: e2190a20 00000010 00000282 e88a0378 e88a0360 e28b7c20 c014b5e3 e28b7c20 Jun 3 04:56:55 system kernel: d67bc240 00000002 c1762b7c c0374638 000003ea c014b934 0002293f c0130796 Jun 3 04:56:55 system kernel: 00000006 000001d0 ffffffff 000001d0 00000002 00000020 000001d0 c0374638 Jun 3 04:56:55 system kernel: Call Trace: [<c014b5e3>] [<c014b934>] [<c0130796>] [<c0130b5d>] [<c0130be2>]
Jun 3 04:56:55 system kernel: [<c0130da6>] [<c0130e18>] [<c0130f58>] [<c0105000>] [<c01072be>] [<c0130ec0>]
Jun 3 04:56:55 system kernel: Code: 0f 0b b4 04 66 12 32 c0 e9 93 fd ff ff c7 04 24 31 00 00 00


EIP; c014de40 <iput+290/2b0> <=====

ebx; e28b7c20 <_end+224930d4/3046f514>
ecx; e28b7c30 <_end+224930e4/3046f514>
edx; e28b7c30 <_end+224930e4/3046f514>
esi; ef86fc00 <_end+2f44b0b4/3046f514>
esp; effe3f04 <_end+2fbbf3b8/3046f514>

Trace; c014b5e3 <prune_dcache+e3/150>
Trace; c014b934 <shrink_dcache_memory+24/40>
Trace; c0130796 <shrink_cache+166/390>
Trace; c0130b5d <shrink_caches+3d/60>
Trace; c0130be2 <try_to_free_pages_zone+62/100>
Trace; c0130da6 <kswapd_balance_pgdat+66/b0>
Trace; c0130e18 <kswapd_balance+28/40>
Trace; c0130f58 <kswapd+98/c0>
Trace; c0105000 <_stext+0/0>
Trace; c01072be <arch_kernel_thread+2e/40>
Trace; c0130ec0 <kswapd+0/c0>

Code; c014de40 <iput+290/2b0>
00000000 <_EIP>:
Code; c014de40 <iput+290/2b0> <=====
0: 0f 0b ud2a <=====
Code; c014de42 <iput+292/2b0>
2: b4 04 mov $0x4,%ah
Code; c014de44 <iput+294/2b0>
4: 66 data16
Code; c014de45 <iput+295/2b0>
5: 12 32 adc (%edx),%dh
Code; c014de47 <iput+297/2b0>
7: c0 e9 93 shr $0x93,%cl
Code; c014de4a <iput+29a/2b0>
a: fd std Code; c014de4b <iput+29b/2b0>
b: ff (bad) Code; c014de4c <iput+29c/2b0>
c: ff c7 inc %edi
Code; c014de4e <iput+29e/2b0>
e: 04 24 add $0x24,%al
Code; c014de50 <iput+2a0/2b0>
10: 31 00 xor %eax,(%eax)

Jun 3 04:57:01 system kernel: kernel BUG at inode.c:1204!
Jun 3 04:57:01 system kernel: invalid operand: 0000
Jun 3 04:57:01 system kernel: CPU: 0
Jun 3 04:57:01 system kernel: EIP: 0010:[<c014de40>] Tainted: P Jun 3 04:57:01 system kernel: EFLAGS: 00010246
Jun 3 04:57:01 system kernel: eax: 00000000 ebx: e28b7e00 ecx: e28b7e10 edx: e28b7e10
Jun 3 04:57:01 system kernel: esi: ef86fc00 edi: 00000000 ebp: 00000005 esp: e9a73cd8
Jun 3 04:57:01 system kernel: ds: 0018 es: 0018 ss: 0018
Jun 3 04:57:01 system kernel: Process lftp (pid: 3126, stackpage=e9a73000)
Jun 3 04:57:01 system kernel: Stack: f095a5b0 e9a73d2c 0015249f e88a03f8 e88a03e0 e28b7e00 c014b5e3 e28b7e00 Jun 3 04:57:01 system kernel: 00000005 c7e37ca0 c7e37cb0 c7e37ca0 f0a85abc c014b904 00000005 c7e37ca0 Jun 3 04:57:01 system kernel: c014b3d9 c7e37ca0 f0a8599c f0a859ac f0972349 c7e37ca0 00000001 c013911f Jun 3 04:57:01 system kernel: Call Trace: [<f095a5b0>] [<c014b5e3>] [<c014b904>] [<c014b3d9>] [<f0972349>]
Jun 3 04:57:01 system kernel: [<c013911f>] [<f0975dc7>] [<c0137ded>] [<f0974044>] [<f0964788>] [<f096484c>]
Jun 3 04:57:01 system kernel: [<f096459d>] [<f097f296>] [<f0976c99>] [<c0205c3e>] [<f09c9fe0>] [<f09abaf5>]
Jun 3 04:57:01 system kernel: [<c014b971>] [<c014230d>] [<c014278b>] [<c0142fa9>] [<c0143259>] [<c013f81f>]
Jun 3 04:57:01 system kernel: [<c010a56b>] [<c0108bab>]
Jun 3 04:57:01 system kernel: Code: 0f 0b b4 04 66 12 32 c0 e9 93 fd ff ff c7 04 24 31 00 00 00


EIP; c014de40 <iput+290/2b0> <=====

ebx; e28b7e00 <_end+224932b4/3046f514>
ecx; e28b7e10 <_end+224932c4/3046f514>
edx; e28b7e10 <_end+224932c4/3046f514>
esi; ef86fc00 <_end+2f44b0b4/3046f514>
esp; e9a73cd8 <_end+2964f18c/3046f514>

Trace; f095a5b0 <END_OF_CODE+babd5/????>
Trace; c014b5e3 <prune_dcache+e3/150>
Trace; c014b904 <shrink_dcache_parent+24/30>
Trace; c014b3d9 <d_invalidate+69/70>
Trace; f0972349 <END_OF_CODE+d296e/????>
Trace; c013911f <fput+cf/120>
Trace; f0975dc7 <END_OF_CODE+d63ec/????>
Trace; c0137ded <filp_close+4d/80>
Trace; f0974044 <END_OF_CODE+d4669/????>
Trace; f0964788 <END_OF_CODE+c4dad/????>
Trace; f096484c <END_OF_CODE+c4e71/????>
Trace; f096459d <END_OF_CODE+c4bc2/????>
Trace; f097f296 <END_OF_CODE+df8bb/????>
Trace; f0976c99 <END_OF_CODE+d72be/????>
Trace; c0205c3e <e1000_alloc_rx_buffers+6e/370>
Trace; f09c9fe0 <END_OF_CODE+12a605/????>
Trace; f09abaf5 <END_OF_CODE+10c11a/????>
Trace; c014b971 <d_alloc+21/180>
Trace; c014230d <real_lookup+cd/f0>
Trace; c014278b <link_path_walk+32b/980>
Trace; c0142fa9 <path_lookup+39/40>
Trace; c0143259 <__user_walk+49/60>
Trace; c013f81f <sys_stat64+1f/90>
Trace; c010a56b <do_IRQ+9b/a0>
Trace; c0108bab <system_call+33/38>

Code; c014de40 <iput+290/2b0>
00000000 <_EIP>:
Code; c014de40 <iput+290/2b0> <=====
0: 0f 0b ud2a <=====
Code; c014de42 <iput+292/2b0>
2: b4 04 mov $0x4,%ah
Code; c014de44 <iput+294/2b0>
4: 66 data16
Code; c014de45 <iput+295/2b0>
5: 12 32 adc (%edx),%dh
Code; c014de47 <iput+297/2b0>
7: c0 e9 93 shr $0x93,%cl
Code; c014de4a <iput+29a/2b0>
a: fd std Code; c014de4b <iput+29b/2b0>
b: ff (bad) Code; c014de4c <iput+29c/2b0>
c: ff c7 inc %edi
Code; c014de4e <iput+29e/2b0>
e: 04 24 add $0x24,%al
Code; c014de50 <iput+2a0/2b0>
10: 31 00 xor %eax,(%eax)


The complete log of that can be found at http://tassadar.physics.auth.gr/~dzila/oops2

Any hints are greatly appreciated.

Best regards ,
--
=============================================================================

Dimitris Zilaskos

Department of Physics @ Aristotle University of Thessaloniki , Greece
PGP key : http://tassadar.physics.auth.gr/~dzila/pgp_public_key.asc
http://egnatia.ee.auth.gr/~dzila/pgp_public_key.asc
MD5sum : de2bd8f73d545f0e4caf3096894ad83f pgp_public_key.asc
=============================================================================

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/