Re: oops with dual xeon 2.8ghz 4gb ram +smp, software raid, lvm,and xfs

From: David Greaves
Date: Wed Dec 08 2004 - 04:05:09 EST


Andrew Morton wrote:

David Greaves <david@xxxxxxxxxxxx> wrote:


...
I have a system that's running 2.6.10-rc2.
It has libata sata_promise + sata_sil drives in an md raid5 array that's used by lvm2 and then xfs, then exported via nfs.
I saw this thread, upgraded to 2.6.10-rc2, and I'm posting this in case it's related (it's hard to tell).

This oops happened whilst the box was quiet.

Hopefully relevant config bits:
Single processor
echo 16384 > /proc/sys/vm/min_free_kbytes
CONFIG_4KSTACKS=n
I've done a memtest.
I haven't applied the inode patch. I'm usually writing a single 1-3GB file whilst reading another.

Can I help by providing anything else?

Nov 28 09:05:03 cu kernel: Unable to handle kernel paging request at virtual address 00100104



That's the list_del() poisoning pattern.
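
For readers unfamiliar with the pattern: in 2.6-era kernels, list_del() overwrites the removed entry's next and prev pointers with the constants LIST_POISON1 (0x00100100) and LIST_POISON2 (0x00200200). The sketch below checks whether a faulting address lands inside a struct list_head located at one of those values. It assumes a 32-bit i386 layout (4-byte pointers, prev at offset 4); the helper name poison_match is made up for illustration.

```python
# Poison values stored by list_del() in 2.6-era include/linux/list.h.
LIST_POISON1 = 0x00100100  # written to entry->next
LIST_POISON2 = 0x00200200  # written to entry->prev

POINTER_SIZE = 4            # assumption: 32-bit i386
PREV_OFFSET = POINTER_SIZE  # struct list_head { next; prev; }


def poison_match(fault_addr):
    """Return (poison name, field touched) if fault_addr falls inside a
    list_head sitting at a poison address, otherwise None."""
    for name, base in (("LIST_POISON1", LIST_POISON1),
                       ("LIST_POISON2", LIST_POISON2)):
        offset = fault_addr - base
        if offset == 0:
            return name, "next"
        if offset == PREV_OFFSET:
            return name, "prev"
    return None


# Fault address reported in the oops quoted above.
print(poison_match(0x00100104))  # ('LIST_POISON1', 'prev')
```

A hit on LIST_POISON1 plus the offset of prev is what one would expect if code followed the next pointer of an already-deleted entry and then touched that bogus node's prev field, for example a second list_del() on the same entry.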


<snip old log>

It appears that the dentry cache's slab freelists have become corrupted. Odd, because everyone uses that code a lot. I'd suggest that you enable CONFIG_DEBUG_SLAB and see if that catches anything.


Thanks for the reply Andrew.

I did as you suggested and it had been fine until I got this last night:

Dec 8 06:50:04 cu kernel: slab: Internal list corruption detected in cache 'vm_area_struct'(41), slabp cfedd000(13). Hexdump:
Dec 8 06:50:04 cu kernel:
Dec 8 06:50:04 cu kernel: 000: 00 01 10 00 00 02 20 00 6c 00 00 00 6c d0 ed cf
Dec 8 06:50:04 cu kernel: 010: 0d 00 00 00 11 00 14 08 1a 00 fe ff 0a 00 06 00
Dec 8 06:50:04 cu kernel: 020: fe ff fe ff 02 00 fe ff 22 00 21 00 18 00 27 00
Dec 8 06:50:04 cu kernel: 030: ff ff fe ff fe ff 03 00 00 00 19 00 03 00 fe ff
Dec 8 06:50:04 cu kernel: 040: fe ff 08 00 fe ff fe ff 1c 00 10 00 15 00 fe ff
Dec 8 06:50:04 cu kernel: 050: 25 00 12 00 fe ff
Dec 8 06:50:04 cu kernel: ------------[ cut here ]------------
Dec 8 06:50:04 cu kernel: kernel BUG at mm/slab.c:1947!
Dec 8 06:50:04 cu kernel: invalid operand: 0000 [#1]
Dec 8 06:50:04 cu kernel: Modules linked in: nfs af_packet ipv6 e100 mii usblp uhci_hcd usbcore nfsd exportfs lockd sunrpc sk98lin unix
Dec 8 06:50:04 cu kernel: CPU: 0
Dec 8 06:50:04 cu kernel: EIP: 0060:[check_slabp+180/240] Not tainted VLI
Dec 8 06:50:04 cu kernel: EFLAGS: 00010092 (2.6.10-rc2cu-041128-02)
Dec 8 06:50:04 cu kernel: EIP is at check_slabp+0xb4/0xf0
Dec 8 06:50:04 cu kernel: eax: 00000001 ebx: 00000056 ecx: 00000082 edx: 0000898d
Dec 8 06:50:04 cu kernel: esi: cfedd000 edi: dffe9960 ebp: cfedd018 esp: c1f3bca8
Dec 8 06:50:04 cu kernel: ds: 007b es: 007b ss: 0068
Dec 8 06:50:04 cu kernel: Process munin-node (pid: 6456, threadinfo=c1f3a000 task=c32dea00)
Dec 8 06:50:04 cu kernel: Stack: c0352d03 000000ff 00000029 cfedd000 0000000d cfedd000 0000001b cfedda8c
Dec 8 06:50:04 cu kernel: c013aa19 dffe9960 cfedd000 00000000 dffe996c dffe997c 0000000c 00000010
Dec 8 06:50:04 cu kernel: dffe9960 c094ba2c dffea728 c013ab2b dffe9960 dffe65e8 00000010 dffe65e8
Dec 8 06:50:04 cu kernel: Call Trace:
Dec 8 06:50:04 cu kernel: [free_block+153/336] free_block+0x99/0x150
Dec 8 06:50:04 cu kernel: [cache_flusharray+91/304] cache_flusharray+0x5b/0x130
Dec 8 06:50:04 cu kernel: [kmem_cache_free+122/128] kmem_cache_free+0x7a/0x80
Dec 8 06:50:04 cu kernel: [remove_vm_struct+94/128] remove_vm_struct+0x5e/0x80
Dec 8 06:50:04 cu kernel: [remove_vm_struct+94/128] remove_vm_struct+0x5e/0x80
Dec 8 06:50:04 cu kernel: [exit_mmap+284/320] exit_mmap+0x11c/0x140
Dec 8 06:50:04 cu kernel: [mmput+44/128] mmput+0x2c/0x80
Dec 8 06:50:04 cu kernel: [exec_mmap+121/240] exec_mmap+0x79/0xf0
Dec 8 06:50:04 cu kernel: [flush_old_exec+202/1616] flush_old_exec+0xca/0x650
Dec 8 06:50:04 cu kernel: [kernel_read+80/96] kernel_read+0x50/0x60
Dec 8 06:50:04 cu kernel: [load_elf_binary+827/3184] load_elf_binary+0x33b/0xc70
Dec 8 06:50:04 cu kernel: [get_empty_filp+70/208] get_empty_filp+0x46/0xd0
Dec 8 06:50:04 cu kernel: [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
Dec 8 06:50:04 cu kernel: [kernel_read+80/96] kernel_read+0x50/0x60
Dec 8 06:50:04 cu kernel: [search_binary_handler+93/432] search_binary_handler+0x5d/0x1b0
Dec 8 06:50:04 cu kernel: [load_script+520/576] load_script+0x208/0x240
Dec 8 06:50:04 cu kernel: [__alloc_pages+458/864] __alloc_pages+0x1ca/0x360
Dec 8 06:50:04 cu kernel: [copy_from_user+66/128] copy_from_user+0x42/0x80
Dec 8 06:50:04 cu kernel: [copy_strings+392/512] copy_strings+0x188/0x200
Dec 8 06:50:04 cu kernel: [search_binary_handler+93/432] search_binary_handler+0x5d/0x1b0
Dec 8 06:50:04 cu kernel: [do_execve+409/528] do_execve+0x199/0x210
Dec 8 06:50:04 cu kernel: [sys_execve+66/128] sys_execve+0x42/0x80
Dec 8 06:50:04 cu kernel: [syscall_call+7/11] syscall_call+0x7/0xb
Dec 8 06:50:04 cu kernel: Code: b6 04 33 43 c7 04 24 94 59 34 c0 89 44 24 04 e8 23 d7 fd ff 8b 47 3c 8d 44 00 04 39 c3 72 db c7 04 24 03 2d 35 c0 e8 0c d7 fd ff <0f> 0b 9b 07 1e 58 34 c0 83 c4 14 5b 5e 5f c3 89 5c 24 04 c7 04
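
As a reading aid, the first 20 bytes of the slab hexdump above can be decoded as five little-endian 32-bit words. The field names used here (list.next, list.prev, colouroff, s_mem, inuse) are an assumption about the layout of struct slab in mm/slab.c of that era on i386, not something the log itself states.

```python
import struct

# First 20 bytes of the hexdump printed for slabp cfedd000.
header = bytes.fromhex(
    "00 01 10 00 00 02 20 00 6c 00 00 00 6c d0 ed cf 0d 00 00 00"
)

SLABP = 0xCFEDD000         # slab address from the log line
LIST_POISON1 = 0x00100100  # list_del() poison for ->next
LIST_POISON2 = 0x00200200  # list_del() poison for ->prev

# Assumed layout: struct list_head list; colouroff; s_mem; inuse.
list_next, list_prev, colouroff, s_mem, inuse = struct.unpack("<5I", header)

for name, value in [("list.next", list_next), ("list.prev", list_prev),
                    ("colouroff", colouroff), ("s_mem", s_mem),
                    ("inuse", inuse)]:
    print(f"{name:10s} = {value:#010x}")

# Consistency checks against the rest of the log line.
print("list pointers are poisoned:",
      (list_next, list_prev) == (LIST_POISON1, LIST_POISON2))
print("s_mem == slabp + colouroff:", s_mem == SLABP + colouroff)
print("inuse matches the (13) in the log:", inuse == 13)
```

Under that assumed layout, the slab's own list pointers hold the two list_del() poison values, while the remaining fields agree with the log line. That would be consistent with a slab that was unlinked from its list and then operated on again, though the dump alone does not show who did it.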

Additional info:
When the machine started I got three of these:
swapper: page allocation failure. order:1, mode:0x20
before I could run:
echo 16384 > /proc/sys/vm/min_free_kbytes
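
For context on the allocation-failure message, the order and mode fields can be unpacked as in the sketch below. The flag values are taken from memory of 2.6-era include/linux/gfp.h and should be treated as an assumption, as should the 4 KiB page size; the function names are illustrative only.

```python
PAGE_SIZE = 4096  # assumption: i386 with 4 KiB pages

# Assumed 2.6-era GFP flag bits (include/linux/gfp.h).
GFP_FLAGS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}


def allocation_size(order):
    """Bytes of physically contiguous memory requested: 2**order pages."""
    return PAGE_SIZE << order


def decode_mode(mode):
    """Names of the known flag bits set in mode, lowest bit first."""
    return [name for bit, name in sorted(GFP_FLAGS.items()) if mode & bit]


print(allocation_size(1))  # 8192
print(decode_mode(0x20))   # ['__GFP_HIGH']
```

If those values are right, order:1 with mode:0x20 is an 8 KiB contiguous request with only __GFP_HIGH set and __GFP_WAIT clear, that is, an atomic allocation that cannot sleep to reclaim memory. Raising min_free_kbytes keeps a larger reserve available for such requests.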

Anything else you'd like me to try?

David
