daily buffer list corruption in 2.2.15

From: Michael Stiller (ms@neopoly.de)
Date: Tue May 16 2000 - 02:49:04 EST


Hi,

we use 2.2.15 on our company's Samba/NFS server. This is a PIII machine
(UP) with 128 MB RAM and about 17 GB of disk. Every morning I get (mostly
non-fatal) Oops messages which look like buffer list corruption to me.
The location in the code is fs/buffer.c:find_buffer. Something seems to
corrupt the buffer list; the value of

next = tmp->b_next;

is often bogus. I manually checked the assembler code of the function
find_buffer, and the generated code is correct. So I don't think this is
a compiler-related problem, at least not in find_buffer. Maybe there
should be a sanity check besides

if(!next) ...
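
To make the idea concrete, here is a minimal user-space sketch of such a check. It is not the kernel's code: struct buf, pool, buf_ptr_plausible and find_buf_checked are hypothetical names, and the "valid range" is simply a static array. The point is only that each chain link gets validated before it is followed, so a value like 0x800 is reported instead of dereferenced.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define POOL_SIZE 8

/* Hypothetical, cut-down stand-in for struct buffer_head. */
struct buf {
    struct buf *b_next;        /* hash chain link */
    unsigned long b_blocknr;   /* block number */
    unsigned long b_size;      /* block size */
    unsigned short b_dev;      /* device number */
};

/* Assumption: every legitimate buffer lives in this one array. */
static struct buf pool[POOL_SIZE];

/* A link is plausible if it points at an element boundary inside the pool. */
static int buf_ptr_plausible(const struct buf *p)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)&pool[0];
    uintptr_t hi = (uintptr_t)&pool[POOL_SIZE];

    return a >= lo && a < hi && (a - lo) % sizeof(struct buf) == 0;
}

/*
 * Walk a chain the way a hash lookup would, but validate each link
 * before following it. Returns the matching buffer or NULL; *corrupt
 * is set to 1 if a bogus link was encountered.
 */
static struct buf *find_buf_checked(struct buf *head, unsigned short dev,
                                    unsigned long block, unsigned long size,
                                    int *corrupt)
{
    struct buf *tmp = head;

    *corrupt = 0;
    while (tmp) {
        if (!buf_ptr_plausible(tmp)) {
            fprintf(stderr, "bogus chain link %p\n", (void *)tmp);
            *corrupt = 1;
            return NULL;
        }
        if (tmp->b_blocknr == block && tmp->b_size == size &&
            tmp->b_dev == dev)
            return tmp;
        tmp = tmp->b_next;
    }
    return NULL;
}
```

In the kernel the range test would have to be something else, for example a check that the pointer lies in kernel address space. Such a check would only catch the corruption earlier; it would not explain what writes the bad value.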

I attach the decoded oops messages I found this morning. Notice that
kswapd was affected too last night (in fs/buffer.c:remove_from_queues).

Any clues on how to debug this?

-Michael

Unable to handle kernel NULL pointer dereference at virtual address 00000834
current->tss.cr3 = 00101000, %cr3 = 00101000
*pde = 00000000
Oops: 0002
CPU: 0
EIP: 0010:[<c0125201>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010206
eax: 00000800 ebx: c6db75c0 ecx: c6db75c0 edx: c62a8e00
esi: c6db76e0 edi: 00000000 ebp: c0310080 esp: c7fe3f98
ds: 0018 es: 0018 ss: 0018
Process kswapd (pid: 5, process nr: 5, stackpage=c7fe3000)
Stack: c6db76e0 c01265c1 c6db75c0 c0310080 000001ff 00000030 00001000 c011bc16
       c0310080 00000007 00000006 c0120942 00000006 00000030 c7fe2000 c01a262e
       c7fe21c1 c01209f7 00000030 00000f00 c7ffbfcc c0106000 c0107b43 00000000
Call Trace: [<c01265c1>] [<c011bc16>] [<c0120942>] [<c01a262e>] [<c01209f7>] [<c0106000>] [<c0107b43>]
Code: 89 50 34 c7 01 00 00 00 00 89 02 c7 41 34 00 00 00 00 ff 0d

>>EIP; c0125201 <remove_from_queues+a9/148> <=====
Trace; c01265c1 <try_to_free_buffers+45/88>
Trace; c011bc16 <shrink_mmap+d6/12c>
Trace; c0120942 <do_try_to_free_pages+26/78>
Trace; c01a262e <tvecs+1bae/3340>
Trace; c01209f7 <kswapd+63/98>
Trace; c0106000 <get_options+0/74>
Trace; c0107b43 <kernel_thread+23/30>
Code; c0125201 <remove_from_queues+a9/148>
00000000 <_EIP>:
Code; c0125201 <remove_from_queues+a9/148> <=====
   0: 89 50 34 movl %edx,0x34(%eax) <=====
Code; c0125204 <remove_from_queues+ac/148>
   3: c7 01 00 00 00 00 movl $0x0,(%ecx)
Code; c012520a <remove_from_queues+b2/148>
   9: 89 02 movl %eax,(%edx)
Code; c012520c <remove_from_queues+b4/148>
   b: c7 41 34 00 00 00 00 movl $0x0,0x34(%ecx)
Code; c0125213 <remove_from_queues+bb/148>
  12: ff 0d 00 00 00 00 decl 0x0

Unable to handle kernel NULL pointer dereference at virtual address 00000800
current->tss.cr3 = 0783c000, %cr3 = 0783c000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c0125494>]
EFLAGS: 00010206
eax: 00000800 ebx: 00000005 ecx: 00010ae3 edx: 00000800
esi: 0000000b edi: 00000803 ebp: 00006b4f esp: c7855ce8
ds: 0018 es: 0018 ss: 0018
Process dump (pid: 11507, process nr: 76, stackpage=c7855000)
Stack: 00006b4f 00000803 00010ae3 c01254d3 00000803 00006b4f 00000400 c0125806
       00000803 00006b4f 00000400 0000001c 00000000 c534b780 c7855e9c 00000004
       c01288d1 00000803 00006b4f 00000400 c2ec0de0 ffffffea 00000000 00000400
Call Trace: [<c01254d3>] [<c0125806>] [<c01288d1>] [<c014555a>] [<c01455ff>] [<c01534ab>] [<c01572fc>]
       [<c01454bd>] [<c011053b>] [<c01433fe>] [<c0163228>] [<c014350b>] [<c0123cfc>] [<c0123eb8>] [<c0123fc6>]
       [<c0109000>]
Code: 8b 00 39 6a 04 75 15 8b 4c 24 20 39 4a 08 75 0c 66 39 7a 0c

>>EIP; c0125494 <find_buffer+68/90> <=====
Trace; c01254d3 <get_hash_table+17/24>
Trace; c0125806 <getblk+1e/144>
Trace; c01288d1 <block_read+2c1/4f4>
Trace; c014555a <kfree_skbmem+32/40>
Trace; c01455ff <__kfree_skb+97/a0>
Trace; c01534ab <tcp_clean_rtx_queue+103/12c>
Trace; c01572fc <tcp_reset_xmit_timer+7c/9c>
Trace; c01454bd <alloc_skb+71/dc>
Trace; c011053b <schedule+153/280>
Trace; c01433fe <sock_recvmsg+42/b4>
Trace; c0163228 <unix_stream_recvmsg+0/328>
Trace; c014350b <sock_read+8f/98>
Trace; c0123cfc <default_llseek+0/78>
Trace; c0123eb8 <sys_llseek+90/f0>
Trace; c0123fc6 <sys_read+ae/c4>
Trace; c0109000 <system_call+34/38>
Code; c0125494 <find_buffer+68/90>
00000000 <_EIP>:
Code; c0125494 <find_buffer+68/90> <=====
   0: 8b 00 movl (%eax),%eax <=====
Code; c0125496 <find_buffer+6a/90>
   2: 39 6a 04 cmpl %ebp,0x4(%edx)
Code; c0125499 <find_buffer+6d/90>
   5: 75 15 jne 1c <_EIP+0x1c> c01254b0 <find_buffer+84/90>
Code; c012549b <find_buffer+6f/90>
   7: 8b 4c 24 20 movl 0x20(%esp,1),%ecx
Code; c012549f <find_buffer+73/90>
   b: 39 4a 08 cmpl %ecx,0x8(%edx)
Code; c01254a2 <find_buffer+76/90>
   e: 75 0c jne 1c <_EIP+0x1c> c01254b0 <find_buffer+84/90>
Code; c01254a4 <find_buffer+78/90>
  10: 66 39 7a 0c cmpw %di,0xc(%edx)



