Re: segfault mdadm --write-behind, 2.6.14-mm2 (was: Re: RAID1 ramdisk patch)

From: Sander
Date: Thu Nov 17 2005 - 05:15:36 EST


Sander wrote (ao):
# Sander wrote (ao):
# # Neil Brown wrote (ao):
# # > On Wednesday November 16, akpm@xxxxxxxx wrote:
# # > > Sander <sander@xxxxxxxxxxx> wrote:
# # > > > With 2.6.14-mm2 (x86) and mdadm 2.1 I get a Segmentation fault when I
# # > > > try this:
# # > >
# # > > It oopsed in reiser4. reiserfs-dev added to Cc...
# # > >
# # >
# # > Hmm... It appears that md/bitmap is calling prepare_write and
# # > commit_write with 'file' as NULL - this works for some filesystems,
# # > but not for reiser4.
# # >
# # > Does this patch help?
# #
# # Something changed, but it seems it didn't fix it:
# #
# # # mdadm -C /dev/md1 --bitmap=/storage/raid1.bitmap -l1 -n2 /dev/loop0 --write-behind /dev/loop1
# # mdadm: RUN_ARRAY failed: No such file or directory
#
# FWIW, the following happens when I point --bitmap to /tmp/raid1.bitmap
# which is tmpfs, and also happens when I attach both loop0 and loop1 to
# files on tmpfs.
#
# This would suggest that reiser4 is not solely at fault?
#
# The difference, by the way, is that I can reboot with 'shutdown -r now'
# instead of having to use sysrq, and that mdadm hangs:
#
# # mdadm -C /dev/md1 --bitmap=/tmp/raid1.bitmap -l1 -n2 /dev/loop0 --write-behind /dev/loop1
# mdadm: RUN_ARRAY failed: No such file or directory
#
# # mdadm -C /dev/md1 -f --bitmap=/tmp/raid1.bitmap -l1 -n2 /dev/loop0 --write-behind /dev/loop1
# mdadm: /dev/loop0 appears to be part of a raid array:
# level=raid1 devices=2 ctime=Thu Nov 17 11:04:31 2005
# mdadm: /dev/loop1 appears to be part of a raid array:
# level=raid1 devices=2 ctime=Thu Nov 17 11:04:31 2005
# Continue creating array? yes
# [hang, no prompt, no reaction to ctrl-c, etc]
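
(For context: the call pattern Neil refers to above is md/bitmap driving
the bitmap file's pagecache write hooks directly. A minimal sketch of
that shape, based on the 2.6.14-era address_space_operations interface
rather than quoted from drivers/md/bitmap.c; the function name is just
illustrative:)

    /*
     * Sketch: flush a bitmap page through the filesystem's own
     * prepare_write/commit_write hooks (2.6.14-era aops interface).
     */
    static int bitmap_flush_page(struct page *page, unsigned int size)
    {
            struct address_space *mapping = page->mapping;
            int ret;

            lock_page(page);
            /*
             * NULL where a struct file * is expected: harmless for
             * filesystems that never dereference it, a NULL-pointer
             * oops for one (apparently reiser4) that does.
             */
            ret = mapping->a_ops->prepare_write(NULL, page, 0, size);
            if (ret == 0)
                    ret = mapping->a_ops->commit_write(NULL, page, 0, size);
            unlock_page(page);

            return ret;
    }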

And even more info. It seems mdadm spins:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  749 root      25   0  1696  568  492 R 99.9  0.1   8:32.50 mdadm

Would sysrq-t be useful?


# [42949549.780000] md: bind<loop0>
# [42949549.780000] md: bind<loop1>
# [42949549.780000] md: md1: raid array is not clean -- starting background reconstruction
# [42949549.790000] md1: bitmap file is out of date (0 < 1) -- forcing full recovery
# [42949549.790000] md1: bitmap file is out of date, doing full recovery
# [42949549.790000] md1: bitmap initialized from disk: read 0/4 pages, set 0 bits, status: 524288
# [42949549.790000] Bad page state at free_hot_cold_page (in process 'mdadm', page c10dcc20)
# [42949549.790000] flags:0x80000019 mapping:f5155c84 mapcount:0 count:0
# [42949549.790000] Backtrace:
# [42949549.790000] [<c013b320>] bad_page+0x70/0xb0
# [42949549.790000] [<c013bab1>] free_hot_cold_page+0x51/0xd0
# [42949549.790000] [<c02b0a90>] bitmap_file_put+0x30/0x70
# [42949549.790000] [<c02b1f8e>] bitmap_free+0x1e/0xb0
# [42949549.790000] [<c02b2126>] bitmap_create+0xd6/0x2a0
# [42949549.790000] [<c02ab95a>] do_md_run+0x2ba/0x500
# [42949549.790000] [<c02ac8a7>] add_new_disk+0x157/0x3b0
# [42949549.790000] [<c0179034>] mpage_writepages+0x124/0x3d0
# [42949549.790000] [<c013c23e>] __pagevec_free+0x3e/0x60
# [42949549.790000] [<c013eff9>] release_pages+0x29/0x160
# [42949549.790000] [<c02adb81>] md_ioctl+0x5a1/0x630
# [42949549.790000] [<c0137918>] find_get_pages+0x18/0x40
# [42949549.790000] [<c02ad5e0>] md_ioctl+0x0/0x630
# [42949549.790000] [<c01ede74>] blkdev_driver_ioctl+0x54/0x60
# [42949549.790000] [<c01edfb4>] blkdev_ioctl+0x134/0x180
# [42949549.790000] [<c015e158>] block_ioctl+0x18/0x20
# [42949549.790000] [<c015e140>] block_ioctl+0x0/0x20
# [42949549.790000] [<c01674ff>] do_ioctl+0x1f/0x70
# [42949549.790000] [<c016769c>] vfs_ioctl+0x5c/0x1e0
# [42949549.790000] [<c0156c91>] __fput+0xe1/0x140
# [42949549.790000] [<c016785d>] sys_ioctl+0x3d/0x70
# [42949549.790000] [<c0102f49>] syscall_call+0x7/0xb
# [42949549.790000] Trying to fix it up, but a reboot is needed
# [42949549.790000] md1: failed to create bitmap (524288)
# [42949549.790000] md: pers->run() failed ...
# [42949549.790000] md: md1 stopped.
# [42949549.790000] md: unbind<loop1>
# [42949549.790000] md: export_rdev(loop1)
# [42949549.790000] md: unbind<loop0>
# [42949549.790000] md: export_rdev(loop0)
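
FWIW, flags:0x80000019 decodes against 2.6.14 include/linux/page-flags.h
(the top bit is part of the zone encoding) as PG_locked | PG_uptodate |
PG_dirty, and mapping is non-NULL, so bitmap_file_put appears to hand
the allocator a page that is still locked, dirty, and attached to the
bitmap file's pagecache. Roughly the invariant free_hot_cold_page is
enforcing (a simplified sketch, not the verbatim mm/page_alloc.c check;
the function name is made up):

    /*
     * Sketch of the "Bad page state" condition: a page returned to
     * the buddy allocator must be completely idle.
     */
    static inline int page_bad_on_free(struct page *page)
    {
            return page_mapcount(page)   ||  /* still mapped in page tables */
                   page->mapping != NULL ||  /* still attached to an inode  */
                   page_count(page) != 0 ||  /* references still held       */
                   PageLocked(page)      ||  /* still locked for I/O        */
                   PageDirty(page);          /* unwritten data outstanding  */
    }

Against the oops above that would trip on mapping, PG_locked and
PG_dirty (mapcount and count are already 0), which would fit the bitmap
teardown path freeing its pages without first releasing them from the
pagecache.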

--
Humilis IT Services and Solutions
http://www.humilis.net