ext3/jbd oops in journal_start

From: Sage Weil
Date: Sat Oct 31 2009 - 02:15:05 EST


Hi,

I'm consistently seeing an ext3 oops on a fresh ~60 GB fs on 2.6.32-rc3 (and
2.6.31), with either data=writeback or data=ordered. It doesn't look like the
hardware or drive: I have 8 boxes (each with slightly different hardware)
that crash identically.

The oops is at fs/jbd/transaction.c, journal_start():

J_ASSERT(handle->h_transaction->t_journal == journal);

because handle->h_transaction is 0x1bf (or some other value close to
that), which is not a valid pointer. I can trigger it on roughly the 10th
call to journal_start after mounting.
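
For reference, a minimal userspace sketch of the nested-handle path that
contains this assertion. It is a simplified model based on a reading of
fs/jbd/transaction.c, not the kernel code itself: the model_* names and the
journal_info variable (standing in for the per-task current->journal_info)
are hypothetical, and the struct definitions keep only the fields used here.
The point it illustrates is that journal_start trusts whatever non-NULL
pointer it finds cached for the task and dereferences it before doing
anything else.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the jbd types; only the fields used below. */
typedef struct journal { int id; } journal_t;
typedef struct transaction { journal_t *t_journal; } transaction_t;
typedef struct handle {
    transaction_t *h_transaction;
    int h_ref;
} handle_t;

/* Stand-in for current->journal_info (per-task in the kernel). */
static void *journal_info;

/*
 * Model of the entry path of journal_start(). If a handle is already
 * cached for this task, it is assumed to be a valid jbd handle on the
 * same journal and is simply re-referenced. `fresh` is a caller-supplied
 * handle used only when no handle is cached (the kernel allocates one).
 */
static handle_t *model_journal_start(journal_t *journal, handle_t *fresh)
{
    handle_t *handle = journal_info;

    if (handle) {
        /* Corresponds to the J_ASSERT quoted above: two dereferences
         * of a pointer that was never validated. */
        assert(handle->h_transaction->t_journal == journal);
        handle->h_ref++;
        return handle;
    }

    fresh->h_ref = 1;
    journal_info = fresh;
    return fresh;
}

/* Model of journal_stop(): drop a reference, clear the cache at zero. */
static void model_journal_stop(handle_t *handle)
{
    if (--handle->h_ref == 0)
        journal_info = NULL;
}
```

In this model, a second model_journal_start() on the same task returns the
cached handle with h_ref bumped to 2. If the cached value were stale or
belonged to something other than jbd, the first dereference would read
garbage such as 0x1bf and the second would fault.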

Has anyone seen this before? I feel like I must be doing something silly
here, since I can't find any references to this particular crash, but I'm
having no problem triggering it right away, even after a fresh mke2fs
-j...

Any suggestions on where to look, or should I just start testing older
kernel versions and bisect?

sage


[ 83.550657] handle->h_transaction 00000000000001bf
[ 83.555564] BUG: unable to handle kernel NULL pointer dereference at 00000000000001bf
[ 83.559531] IP: [<ffffffff8118793c>] journal_start+0x87/0x184
[ 83.559531] PGD 10e351067 PUD 10e1cb067 PMD 0
[ 83.559531] Oops: 0000 [#1] PREEMPT SMP
[ 83.559531] last sysfs file: /sys/class/net/lo/operstate
[ 83.559531] CPU 1
[ 83.559531] Modules linked in: btrfs zlib_deflate fan ac battery ide_pci_generic shpchp k8temp serio_raw psmouse pcspkr ehci_hcd serverworks processor ohci_hcd pci_hotplug thermal button
[ 83.559531] Pid: 2849, comm: cosd Not tainted 2.6.32-rc5 #7 H8SSL-I2
[ 83.559531] RIP: 0010:[<ffffffff8118793c>] [<ffffffff8118793c>] journal_start+0x87/0x184
[ 83.559531] RSP: 0018:ffff88010e335b28 EFLAGS: 00010292
[ 83.559531] RAX: 00000000000001bf RBX: ffff88010eeee4e0 RCX: 000000000000ad01
[ 83.559531] RDX: ffff88002f400000 RSI: 0000000000000001 RDI: ffffffff81610214
[ 83.559531] RBP: ffff88010e335b58 R08: ffff88010e3359d7 R09: 0000000000000000
[ 83.559531] R10: ffffffff8106314b R11: ffff88010e335908 R12: ffff88010eeee4e0
[ 83.559531] R13: ffff88010e17a200 R14: ffff88010f535800 R15: 000000000000000b
[ 83.559531] FS: 00007fe3bce8b6f0(0000) GS:ffff88002f400000(0000) knlGS:0000000000000000
[ 83.559531] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 83.559531] CR2: 00000000000001bf CR3: 0000000110223000 CR4: 00000000000006e0
[ 83.559531] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 83.559531] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 83.559531] Process cosd (pid: 2849, threadinfo ffff88010e334000, task ffff88010e17a200)
[ 83.559531] Stack:
[ 83.559531] ffff88010e335b58 ffffffff814cbb10 ffffea0006cf6038 ffff88010eeea888
[ 83.559531] <0> 0000000000000000 00000000000005f4 ffff88010e335b68 ffffffff811443b3
[ 83.559531] <0> ffff88010e335c08 ffffffff8113c347 ffff88010e335ca8 ffffffff81070369
[ 83.559531] Call Trace:
[ 83.559531] [<ffffffff811443b3>] ext3_journal_start_sb+0x4a/0x4c
[ 83.559531] [<ffffffff8113c347>] ext3_write_begin+0x9c/0x1e2
[ 83.559531] [<ffffffff81070369>] ? __lock_acquire+0x17d8/0x17ea
[ 83.559531] [<ffffffff810a5021>] generic_file_buffered_write+0x120/0x2a5
[ 83.559531] [<ffffffff810a564d>] __generic_file_aio_write+0x34f/0x383
[ 83.559531] [<ffffffff810a56e4>] generic_file_aio_write+0x63/0xaa
[ 83.559531] [<ffffffff810d98b2>] do_sync_write+0xe7/0x12d
[ 83.559531] [<ffffffff8105f368>] ? autoremove_wake_function+0x0/0x38
[ 83.559531] [<ffffffff8106a7fc>] ? put_lock_stats+0xe/0x27
[ 83.559531] [<ffffffff8125752c>] ? security_file_permission+0x11/0x13
[ 83.559531] [<ffffffff810da240>] vfs_write+0xae/0x14a
[ 83.559531] [<ffffffff810da3a0>] sys_write+0x47/0x6e
[ 83.559531] [<ffffffff8100baab>] system_call_fastpath+0x16/0x1b
[ 83.559531] Code: 89 de 48 c7 c7 e9 01 61 81 31 c0 e8 71 f6 31 00 48 8b 33 48 c7 c7 f7 01 61 81 31 c0 e8 60 f6 31 00 48 8b 03 48 c7 c7 14 02 61 81 <48> 8b 30 31 c0 e8 4c f6 31 00 48 8b 03 48 8b 30 4c 39 f6 74 11
[ 83.559531] RIP [<ffffffff8118793c>] journal_start+0x87/0x184
[ 83.559531] RSP <ffff88010e335b28>
[ 83.559531] CR2: 00000000000001bf
[ 83.847504] ---[ end trace 450f151cbabc2177 ]---
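
The Code: line can be read by hand to confirm which load faulted. The
marked bytes <48> 8b 30 and the earlier 48 8b 03 are both the 64-bit
"mov r/m64 -> r64" form (REX.W prefix 0x48, opcode 0x8b, then a ModRM
byte). Below is a minimal sketch of decoding just that form; the function
name decode_mov_load is hypothetical, and it deliberately rejects every
encoding other than a plain register-indirect load without REX.R/REX.B
extensions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 64-bit register names indexed by the 3-bit ModRM fields (no REX.R/B). */
static const char *const reg64[8] = {
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"
};

/*
 * Decode the 3-byte instruction `48 8b /r` with ModRM.mod == 00, i.e.
 * "mov (%base),%dst" in AT&T syntax. Returns 0 on success and fills in
 * the register names; returns -1 for anything else.
 */
static int decode_mov_load(const uint8_t insn[3],
                           const char **dst, const char **base)
{
    if (insn[0] != 0x48 || insn[1] != 0x8b)
        return -1;

    uint8_t modrm = insn[2];
    uint8_t mod = modrm >> 6;        /* addressing mode */
    uint8_t reg = (modrm >> 3) & 7;  /* destination register */
    uint8_t rm  = modrm & 7;         /* base register */

    /* With mod 00, rm 100 means a SIB byte follows and rm 101 means
     * RIP-relative addressing; neither is handled in this sketch. */
    if (mod != 0 || rm == 4 || rm == 5)
        return -1;

    *dst = reg64[reg];
    *base = reg64[rm];
    return 0;
}
```

Applied to the trace, 48 8b 03 decodes to mov (%rbx),%rax and the faulting
48 8b 30 to mov (%rax),%rsi. That matches the register dump: RBX holds a
plausible kernel pointer, RAX holds 0x1bf, and CR2 reports a fault at 0x1bf.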
