Re: [XFS on bad superblock] BUG: unable to handle kernel NULL pointer dereference at 00000003

From: Fengguang Wu
Date: Wed Oct 09 2013 - 23:33:11 EST


On Thu, Oct 10, 2013 at 11:26:37AM +0800, Fengguang Wu wrote:
> Dave,
>
> > I note that you have CONFIG_SLUB=y, which means that the cache slabs
> > are shared with objects of other types. That means that the memory
> > corruption problem is likely to be caused by one of the other
> > filesystems that is probing the block device(s), not XFS.
>
> Good to know. That would be easy to test, then: just turn off every
> other filesystem. I'll try it right away.

It seems we don't even need to do that. Digging through the oops
database, I found stack dumps from other filesystems.

These come from a kernel built with the same kconfig, at commit 3.12-rc1.

[ 51.205369] block nbd1: Attempted send on closed socket
[ 51.214126] BUG: unable to handle kernel NULL pointer dereference at 00000004
[ 51.215640] IP: [<c10343fb>] pool_mayday_timeout+0x5f/0x9c
[ 51.216262] *pdpt = 000000000ca90001 *pde = 0000000000000000
[ 51.216262] Oops: 0000 [#1]
[ 51.216262] CPU: 0 PID: 644 Comm: mount Not tainted 3.12.0-rc1 #2
[ 51.216262] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 51.216262] task: ccffd7a0 ti: cca54000 task.ti: cca54000
[ 51.216262] EIP: 0060:[<c10343fb>] EFLAGS: 00000046 CPU: 0
[ 51.216262] EIP is at pool_mayday_timeout+0x5f/0x9c
[ 51.216262] EAX: 00000000 EBX: c1a81d50 ECX: 00000000 EDX: 00000000
[ 51.216262] ESI: cd0d303c EDI: cfff7054 EBP: cca55d2c ESP: cca55d18
[ 51.216262] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[ 51.216262] CR0: 8005003b CR2: 00000004 CR3: 0ca0b000 CR4: 000006b0
[ 51.216262] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 51.216262] DR6: 00000000 DR7: 00000000
[ 51.216262] Stack:
[ 51.216262] c1a81d60 cd0d303c 00000100 c103439c cca55d58 cca55d3c c102cd96 c1ba4700
[ 51.216262] cca55d58 cca55d6c c102cf7e c1a81d50 c1ba5110 c1ba4f10 cca55d58 c103439c
[ 51.216262] cca55d58 cca55d58 00000001 c1ba4588 00000100 cca55d90 c1028f61 00000001
[ 51.216262] Call Trace:
[ 51.216262] [<c103439c>] ? need_to_create_worker+0x32/0x32
[ 51.216262] [<c102cd96>] call_timer_fn.isra.39+0x16/0x60
[ 51.216262] [<c102cf7e>] run_timer_softirq+0x144/0x15e
[ 51.216262] [<c103439c>] ? need_to_create_worker+0x32/0x32
[ 51.216262] [<c1028f61>] __do_softirq+0x87/0x12b
[ 51.216262] [<c10290c4>] irq_exit+0x3a/0x48
[ 51.216262] [<c1002918>] do_IRQ+0x64/0x77
[ 51.216262] [<c175fbac>] common_interrupt+0x2c/0x31
[ 51.216262] [<c12188ee>] ? ocfs2_get_sector+0x14/0x1cd
[ 51.216262] [<c1218b72>] ocfs2_sb_probe+0xcb/0x7ca
[ 51.216262] [<c107bb1c>] ? bdi_lock_two+0x8/0x14
[ 51.216262] [<c12cfc11>] ? string.isra.4+0x26/0x89
[ 51.216262] [<c121a7ba>] ocfs2_fill_super+0x39/0xe84
[ 51.216262] [<c12d1000>] ? pointer.isra.15+0x23f/0x25b
[ 51.216262] [<c12c3660>] ? disk_name+0x20/0x65
[ 51.216262] [<c109d8f6>] mount_bdev+0x105/0x14d
[ 51.216262] [<c1092aaa>] ? slab_pre_alloc_hook.isra.66+0x1e/0x25
[ 51.216262] [<c1095353>] ? __kmalloc_track_caller+0xb8/0xe4
[ 51.216262] [<c10ae5da>] ? alloc_vfsmnt+0xdc/0xff
[ 51.216262] [<c1217173>] ocfs2_mount+0x10/0x12
[ 51.216262] [<c121a781>] ? ocfs2_handle_error+0xa2/0xa2
[ 51.216262] [<c109dad1>] mount_fs+0x55/0x123
[ 51.216262] [<c10aef24>] vfs_kern_mount+0x44/0xac
[ 51.216262] [<c10b030a>] do_mount+0x647/0x768
[ 51.216262] [<c107b043>] ? strndup_user+0x2c/0x3d
[ 51.216262] [<c10b049c>] SyS_mount+0x71/0xa0
[ 51.216262] [<c175f074>] syscall_call+0x7/0xb
[ 51.216262] Code: 43 44 e8 7a 8c ff ff 58 5a 5b 5e 5f 5d c3 8b 43 10 8d 78 fc 8d 43 10 89 45 ec 8d 47 04 3b 45 ec 74 ca 89 f8 e8 44 f0 ff ff 89 c1 <8b> 50 04 83 7a 44 00 74 2c 8b 40 68 8d 71 68 39 f0 75 22 8b 72
[ 51.216262] EIP: [<c10343fb>] pool_mayday_timeout+0x5f/0x9c SS:ESP 0068:cca55d18
[ 51.216262] CR2: 0000000000000004
[ 51.216262] ---[ end trace 267272283b2d7610 ]---
[ 51.216262] Kernel panic - not syncing: Fatal exception in interrupt

[ 3.244964] block nbd1: Attempted send on closed socket
[ 3.246243] block nbd1: Attempted send on closed socket
[ 3.247508] (mount,661,0):ocfs2_get_sector:1861 ERROR: status = -5
[ 3.248906] (mount,661,0):ocfs2_sb_probe:770 ERROR: status = -5
[ 3.250269] (mount,661,0):ocfs2_fill_super:1038 ERROR: superblock probe failed!
[ 3.252100] (mount,661,0):ocfs2_fill_super:1229 ERROR: status = -5
[ 3.253569] BUG: unable to handle kernel NULL pointer dereference at 00000004
[ 3.255322] IP: [<c1034850>] process_one_work+0x1a/0x1cc
[ 3.256681] *pdpt = 000000000c950001 *pde = 0000000000000000
[ 3.256833] Oops: 0000 [#1]
[ 3.256833] CPU: 0 PID: 5 Comm: kworker/0:0H Not tainted 3.12.0-rc1 #2
[ 3.256833] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 3.256833] task: cec44d80 ti: cec54000 task.ti: cec54000
[ 3.256833] EIP: 0060:[<c1034850>] EFLAGS: 00010046 CPU: 0
[ 3.256833] EIP is at process_one_work+0x1a/0x1cc
[ 3.256833] EAX: 00000000 EBX: cec1b900 ECX: ccdf0700 EDX: ccdf0700
[ 3.256833] ESI: ccdf0754 EDI: c1a81d50 EBP: cec55f44 ESP: cec55f2c
[ 3.256833] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[ 3.256833] CR0: 8005003b CR2: 0000005c CR3: 0cfc5000 CR4: 000006b0
[ 3.256833] Stack:
[ 3.256833] c1a81d50 00000000 c10345b0 cec1b900 cec1b918 cec1b918 cec55f54 c1034a1d
[ 3.256833] cec1b900 c1a81d50 cec55f70 c1034d3b cec44d80 c1a81d60 cec47eac cec1b900
[ 3.256833] c1034c02 cec55fac c10388f7 cec55f94 00000000 00000000 cec1b900 00000000
[ 3.256833] Call Trace:
[ 3.256833] [<c10345b0>] ? manage_workers.isra.33+0x178/0x182
[ 3.256833] [<c1034a1d>] process_scheduled_works+0x1b/0x21
[ 3.256833] [<c1034d3b>] worker_thread+0x139/0x1bd
[ 3.256833] [<c1034c02>] ? rescuer_thread+0x1df/0x1df
[ 3.256833] [<c10388f7>] kthread+0x6d/0x72
[ 3.256833] [<c175f637>] ret_from_kernel_thread+0x1b/0x28
[ 3.256833] [<c103888a>] ? init_completion+0x1d/0x1d
[ 3.256833] Code: 83 f8 10 74 04 f3 90 b2 f5 89 d0 59 5b 5e 5f 5d c3 55 89 e5 57 56 53 83 ec 0c 89 c3 89 d6 89 d0 e8 f3 eb ff ff 89 45 ec 8b 7b 24 <8b> 40 04 8b 80 80 00 00 00 c1 e8 05 83 e0 01 88 45 e8 f6 43 2c
[ 3.256833] EIP: [<c1034850>] process_one_work+0x1a/0x1cc SS:ESP 0068:cec55f2c
[ 3.256833] CR2: 0000000000000004
[ 3.256833] ---[ end trace a45beaff7f786118 ]---
[ 3.256833] BUG: sleeping function called from invalid context at kernel/rwsem.c:20
[ 3.256833] in_atomic(): 1, irqs_disabled(): 1, pid: 5, name: kworker/0:0H
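
For anyone who wants to repeat this kind of dig through a pile of oops logs, here is a rough sketch of how the triage could be scripted. It pulls the faulting symbol from the "IP:" line and lists the filesystem functions that appear as reliable frames (those without a leading "?") in the call trace. The names summarize_oops, fs_prefixes and SAMPLE are made up for illustration and are not part of any kernel tool; the regular expressions assume the 32-bit x86 dump layout shown above and may need adjusting for other formats.

```python
import re

# Matches "IP: [<c10343fb>] pool_mayday_timeout+0x5f/0x9c".
# The lookbehind skips the later "EIP: [<...>]" line.
IP_RE = re.compile(r"(?<![A-Z])IP: \[<[0-9a-f]+>\] (\w[\w.]*)\+0x")
# Matches one call-trace frame; group 1 is "? " for unreliable frames.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] (\? )?(\w[\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_oops(text, fs_prefixes=("ocfs2_", "xfs_")):
    """Return the faulting symbol and the filesystem frames of one oops.

    fs_prefixes is a hypothetical, hand-picked list of function-name
    prefixes; extend it for other filesystems.
    """
    fault = None
    frames = []          # (symbol, is_reliable) in trace order
    in_trace = False
    for line in text.splitlines():
        if fault is None:
            m = IP_RE.search(line)
            if m:
                fault = m.group(1)
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            m = FRAME_RE.search(line)
            if m is None:        # e.g. the "Code:" line ends the trace
                in_trace = False
                continue
            frames.append((m.group(2), m.group(1) is None))
    fs_frames = [name for name, reliable in frames
                 if reliable and name.startswith(fs_prefixes)]
    return {"fault": fault, "frames": frames, "fs_frames": fs_frames}


# A few lines copied from the first dump above, used as sample input.
SAMPLE = """\
[ 51.215640] IP: [<c10343fb>] pool_mayday_timeout+0x5f/0x9c
[ 51.216262] EIP is at pool_mayday_timeout+0x5f/0x9c
[ 51.216262] Call Trace:
[ 51.216262] [<c102cd96>] call_timer_fn.isra.39+0x16/0x60
[ 51.216262] [<c12188ee>] ? ocfs2_get_sector+0x14/0x1cd
[ 51.216262] [<c1218b72>] ocfs2_sb_probe+0xcb/0x7ca
[ 51.216262] [<c121a7ba>] ocfs2_fill_super+0x39/0xe84
[ 51.216262] [<c109d8f6>] mount_bdev+0x105/0x14d
[ 51.216262] Code: 43 44 e8 7a
[ 51.216262] EIP: [<c10343fb>] pool_mayday_timeout+0x5f/0x9c SS:ESP 0068:cca55d18
"""

if __name__ == "__main__":
    summary = summarize_oops(SAMPLE)
    print(summary["fault"], summary["fs_frames"])
```

Note that a filesystem showing up in the trace only says which mount was in flight when the oops hit; as in the first dump, the fault itself can sit in unrelated code (here the workqueue timer), which fits the memory corruption theory.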
