Re: 2.4.24 Paging Fault, Source Line located in slab.c, kmem_cache_reap()

From: Ross Dickson
Date: Mon Feb 16 2004 - 08:50:46 EST


Tracked down Oops to source line in kmem_cache_reap.
...............................................................
This Line is number 1784 in my file.........................
full_free = 0;
p = searchp->slabs_free.next;
while (p != &searchp->slabs_free) {
#if DEBUG
slabp = list_entry(p, slab_t, list);

if (slabp->inuse)
BUG();
#endif
full_free++;
p = p->next; <======Oops Here
}
............................
objdump for above code section:
Oops identified offending instruction at kmem_cache_reap+80 which is af0
............................
add: 8b 4c 24 08 mov 0x8(%esp,1),%ecx
ae1: 31 db xor %ebx,%ebx
ae3: 8b 41 10 mov 0x10(%ecx),%eax
ae6: 89 ca mov %ecx,%edx
ae8: 83 c2 10 add $0x10,%edx
aeb: 39 d0 cmp %edx,%eax
aed: 74 08 je af7 <kmem_cache_reap+0x87>
aef: 90 nop
af0: 8b 00 mov (%eax),%eax {<=====Oops}
af2: 43 inc %ebx
af3: 39 d0 cmp %edx,%eax
af5: 75 f9 jne af0 <kmem_cache_reap+0x80>
af7: 8b 44 24 08 mov 0x8(%esp,1),%eax
...............................
I do not know how this part of the kernel works.
Have we walked off the end of a list or something?
Can anyone help with a theory or still better a fix?

Hopefully thanks in advance,
Ross.


On Monday 16 February 2004 20:25, Ross Dickson wrote:
> On Sunday 15 February 2004 23:49, Ross Dickson wrote:
> > On Sunday 15 February 2004 06:33, you wrote:
> > > On Saturday 14 February 2004 19:06, Ross Dickson wrote:
> > > > I have an imaging system writing files to removable hard drives.
> > > > Compact Flash boot with ram drives so I usually have no swap partition or
> > > > file.
> > > >
> > > > Recently I upgraded kernel from 2.4.20 to 2.4.24.
> Is KM18G Pro (nforce2 dual memory mode), AMD 2400XP, Preempt, Low latency,
> 64Bit jiffies 1000Hz patched.
>
> I found some articles about memory overcommitment, checked the source and saw
> strict in use for arm systems - no swap- so this time I thought I would try
>
> echo 1 > /proc/sys/vm/overcommit_memory
>
> I got another oops under equivalent circumstances to earlier (no swap).
> I ran oops through ksymoops on another machine with same kernel , results below.
> At this point I think the trigger may be a slow (bad blocks?) 80Gb hard drive the files
> are being written to. The PCI bus is quite busy with imaging from 3 cameras on two
> capture cards (bttv and meteor II mc).
>
> > > > Can the paging cache be tuned in /proc or somewhere to prevent it being so
> > > > greedy as to want more memory than the machine has?
> > >
> > > Maybe. But you should concentrate on finding where exactly it oopsed.
> ...snip...
> > ......I added a 16mb ram drive swap (see earlier posting)
> > I note memfree stabilises at around 4Mb when running OK, given it only wanted
> > an extra 1Mb cache swap, can I cat something to /proc/sys/vm/????? to force
> > it to stabilise at around 10Mb or 20Mb? Otherwise can I change a constant
> > and recompile kernel to achieve same? It might help give more headroom when
> > the event occurs.
> >
> > >
> > > > Is the quickest fix to give it more ram. I read on another posting that
> > > > with greater than 512Mb the cache won't grab any more?
> > >
> > > Please don't succumb to 'add more RAM' syndrome. 460 megs should be fine
> > > for you. I'd say better find the root of the problem.
> ...snip...
> > > vda
> > >
>
> Thanks for the response
> Regards
> Ross.
>
> Was run "mem=450" with this Oops
>
> ksymoops 2.4.8 on i686 2.4.24-rd. Options used
> -V (default)
> -k /proc/ksyms (default)
> -L (specified)
> -o /lib/modules/2.4.24-rd/ (default)
> -m /boot/System.map (specified)
>
> Unable to handle kernel paging request at virtual address 6a65656a
> c0133f20
> *pde = 00000000
> Oops: 0000
> CPU: 0
> EIP: 0010:[<c0133f20>] Not tainted
> Using defaults from ksymoops -t elf32-i386 -a i386
> EFLAGS: 00010883
> eax: 6a65656a ebx: 000000f5 ecx: c14dfdd4 edx: c14dfde4
> esi: 00000000 edi: 00000008 ebp: c14dfe40 esp: dc16df38
> ds: 0018 es: 0018 ss: 0018
> Process kswapd (pid: 4, stackpage=dc16d000)
> Stack: 000001d0 00000001 c14dfdd4 00000000 00000005 00000005 00000020 000001d0
> c032c4d8 c032c4d8 c0134ebc dc16df84 000001d0 0000003c 00000020 c0134f58
> dc16df84 dc16c000 00000000 00000000 c032c4d8 dc16c000 c032c400 00000000
> Call Trace: [<c0134ebc>] [<c0134f58>] [<c01350f6>] [<c0135169>] [<c013529d>]
> [<c0135210>] [<c0105000>] [<c01057db>] [<c0135210>]
> Code: 8b 00 43 39 d0 75 f9 8b 44 24 08 89 da 8b 70 24 8b 40 44 89
>
>
> >>EIP; c0133f20 <kmem_cache_reap+80/1f0> <=====
>
> >>ecx; c14dfdd4 <_end+1115228/1e4db4b4>
> >>edx; c14dfde4 <_end+1115238/1e4db4b4>
> >>ebp; c14dfe40 <_end+1115294/1e4db4b4>
> >>esp; dc16df38 <_end+1bda338c/1e4db4b4>
>
> Trace; c0134ebc <shrink_caches+1c/60>
> Trace; c0134f58 <try_to_free_pages_zone+58/e0>
> Trace; c01350f6 <kswapd_balance_pgdat+56/b0>
> Trace; c0135169 <kswapd_balance+19/30>
> Trace; c013529d <kswapd+8d/b0>
> Trace; c0135210 <kswapd+0/b0>
> Trace; c0105000 <_stext+0/0>
> Trace; c01057db <arch_kernel_thread+2b/40>
> Trace; c0135210 <kswapd+0/b0>
>
> Code; c0133f20 <kmem_cache_reap+80/1f0>
> 00000000 <_EIP>:
> Code; c0133f20 <kmem_cache_reap+80/1f0> <=====
> 0: 8b 00 mov (%eax),%eax <=====
> Code; c0133f22 <kmem_cache_reap+82/1f0>
> 2: 43 inc %ebx
> Code; c0133f23 <kmem_cache_reap+83/1f0>
> 3: 39 d0 cmp %edx,%eax
> Code; c0133f25 <kmem_cache_reap+85/1f0>
> 5: 75 f9 jne 0 <_EIP>
> Code; c0133f27 <kmem_cache_reap+87/1f0>
> 7: 8b 44 24 08 mov 0x8(%esp,1),%eax
> Code; c0133f2b <kmem_cache_reap+8b/1f0>
> b: 89 da mov %ebx,%edx
> Code; c0133f2d <kmem_cache_reap+8d/1f0>
> d: 8b 70 24 mov 0x24(%eax),%esi
> Code; c0133f30 <kmem_cache_reap+90/1f0>
> 10: 8b 40 44 mov 0x44(%eax),%eax
> Code; c0133f33 <kmem_cache_reap+93/1f0>
> 13: 89 00 mov %eax,(%eax)
>
> Mem starts like this when programs have been started.
> Was run "mem=460" for these mem readings.
> total: used: free: shared: buffers: cached:
> Mem: 473899008 22151168 451747840 0 327680 7737344
> Swap: 0 0 0
> MemTotal: 462792 kB
> MemFree: 441160 kB
> MemShared: 0 kB
> Buffers: 320 kB
> Cached: 7556 kB
> SwapCached: 0 kB
> Active: 2320 kB
> Inactive: 7072 kB
> HighTotal: 0 kB
> HighFree: 0 kB
> LowTotal: 462792 kB
> LowFree: 441160 kB
> SwapTotal: 0 kB
> SwapFree: 0 kB
>
> And is like this near to Oops time
> total: used: free: shared: buffers: cached:
> Mem: 473899008 469348352 4550656 0 831488 420454400
> Swap: 0 0 0
> MemTotal: 462792 kB
> MemFree: 4444 kB
> MemShared: 0 kB
> Buffers: 812 kB
> Cached: 410600 kB
> SwapCached: 0 kB
> Active: 5064 kB
> Inactive: 410968 kB
> HighTotal: 0 kB
> HighFree: 0 kB
> LowTotal: 462792 kB
> LowFree: 4444 kB
> SwapTotal: 0 kB
> SwapFree: 0 kB
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/