AMD64 pdflush Oopses (2.6.8-rc1)

From: James Bromberger
Date: Thu Aug 05 2004 - 06:08:54 EST


Hello World,

I've got a few IBM xSeries 325 AMD Opteron machines, each with 8 GB RAM, a pair of QLogic SAN HBAs, and 275 GB LUNs from the SAN. When I do a large amount of I/O, I get Oopses, generally in pdflush. For example, I set up md multipath on /dev/md0 (from two devices, /dev/sd[ce]1), and then set up LVM2 on top of it (pvcreate /dev/md0; vgcreate <vgname> /dev/md0; lvcreate -L 100G -n <lvname> <vgname>). When I then went to format this with ext3, I got:

OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
13107200 inodes, 26214400 blocks
1310720 blocks (5.00%) reserved for the super user
First data block=0
800 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872

Writing inode tables: 437/800
Message from syslogd@localhost at Thu Aug 5 11:27:59 2004 ...
localhost kernel: Oops: 0000 [1] SMP
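
As a quick sanity check on the mke2fs figures above, the geometry can be verified with a few lines of Python. This is only a sketch: the numbers are copied from the output, and the variable names are made up for illustration. It confirms that the block count matches the 100G logical volume and shows how far the inode table write had got when the Oops hit.

```python
# Figures copied from the mke2fs output above; names are illustrative only.
block_size = 4096
blocks = 26214400
inodes = 13107200
reserved_blocks = 1310720
block_groups = 800
blocks_per_group = 32768
inodes_per_group = 16384
groups_done = 437  # "Writing inode tables: 437/800"

size_gib = blocks * block_size / 2**30
progress = groups_done / block_groups

print(f"volume size: {size_gib:.0f} GiB")
print(f"groups x blocks/group: {block_groups * blocks_per_group}")
print(f"groups x inodes/group: {block_groups * inodes_per_group}")
print(f"reserved fraction: {reserved_blocks / blocks:.2%}")
print(f"inode tables written before the Oops: {progress:.1%}")
```

So the filesystem layout itself is self-consistent, and the failure came a little over halfway through writing the inode tables.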



The Oops from the syslog is included in full below.


I have had the same thing with different filesystems on top (e.g. XFS), except that it didn't panic during XFS creation: it let me get all the way to putting a database on there, and only panicked when doing backups, and again when testing with 'dd' and putting large files on there very quickly.


Happy to provide any more data on request, and/or test stuff. I was about to run this through ksymoops, but the machine has now panicked, and it's in a datacentre... It seems that any large I/O to any partition can cause an Oops or a panic.
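
For anyone triaging a syslog capture like the one that follows, a rough way to see which process hit which RIP in each Oops is sketched here. It is not a substitute for ksymoops; the `summarize` helper and its regular expressions are assumptions written against the line format seen in this particular log, and the sample lines are copied from it.

```python
import re

# Patterns written against the 2.6.8-rc1 x86_64 Oops format seen in this log.
OOPS = re.compile(r"Oops: (\w+) \[(\d+)\]")
PID = re.compile(r"Pid: (\d+), comm: (\S+)")
RIP = re.compile(r"RIP: \w+:\[([^\]]+)\]")

def summarize(lines):
    """Return one dict per Oops with its sequence number, pid, comm and RIP symbol."""
    records = []
    for line in lines:
        if m := OOPS.search(line):
            # A new "Oops:" header starts a new record.
            records.append({"seq": int(m.group(2)), "error_code": m.group(1)})
        elif records and (m := PID.search(line)):
            records[-1].update(pid=int(m.group(1)), comm=m.group(2))
        elif records and (m := RIP.search(line)):
            records[-1]["rip"] = m.group(1)
    return records

# Header lines copied from the first two Oopses in this message.
sample = """\
Aug 5 11:27:59 localhost kernel: Oops: 0000 [1] SMP
Aug 5 11:27:59 localhost kernel: Pid: 44, comm: pdflush Not tainted 2.6.8-rc1
Aug 5 11:27:59 localhost kernel: RIP: 0010:[kmem_getpages+132/448] <ffffffff8015ab54>{kmem_getpages+132}
Aug 5 11:27:59 localhost kernel: Oops: 0000 [2] SMP
Aug 5 11:27:59 localhost kernel: Pid: 32686, comm: mkfs.ext3 Not tainted 2.6.8-rc1
Aug 5 11:27:59 localhost kernel: RIP: 0010:[kmem_getpages+132/448] <ffffffff8015ab54>{kmem_getpages+132}
"""

records = summarize(sample.splitlines())
for rec in records:
    print(rec)
```

Run over the whole log, this kind of summary makes it easy to see that different processes (pdflush and mkfs.ext3) fault at the same instruction in kmem_getpages.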




Aug 5 11:27:59 localhost kernel: Unable to handle kernel paging request at 0000000000001770 RIP:
Aug 5 11:27:59 localhost kernel: <ffffffff8015ab54>{kmem_getpages+132}
Aug 5 11:27:59 localhost kernel: PML4 1c8669067 PGD 1c81b0067 PMD 0
Aug 5 11:27:59 localhost kernel: Oops: 0000 [1] SMP
Aug 5 11:27:59 localhost kernel: CPU 0
Aug 5 11:27:59 localhost kernel: Modules linked in: multipath md ipv6 qla2300 qla2xxx scsi_transport_fc ohci_hcd hw_random amd74xx evdev tg3 rtc bonding dm_snapshot dm_mod ide_generic ide_cd ide_core cdrom isofs ext2 ext3 jbd mbcache unix sd_mod mptscsih mptbase scsi_mod xfs
Aug 5 11:27:59 localhost kernel: Pid: 44, comm: pdflush Not tainted 2.6.8-rc1
Aug 5 11:27:59 localhost kernel: RIP: 0010:[kmem_getpages+132/448] <ffffffff8015ab54>{kmem_getpages+132}
Aug 5 11:27:59 localhost kernel: RSP: 0018:00000100081338d8 EFLAGS: 00010013
Aug 5 11:27:59 localhost kernel: RAX: ffffffff7fffffff RBX: 00000101ffffe680 RCX: 0000000000000000
Aug 5 11:27:59 localhost kernel: RDX: 0000010000011700 RSI: 00000100000119c0 RDI: 0000010000012500
Aug 5 11:27:59 localhost kernel: RBP: 00000101ffffe680 R08: 000001016bc00000 R09: 0000000000000001
Aug 5 11:27:59 localhost kernel: R10: 0000000000000001 R11: 00000101ffffe6e8 R12: 0000000000000200
Aug 5 11:27:59 localhost kernel: R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000000
Aug 5 11:27:59 localhost kernel: FS: 0000000000869c80(0000) GS:ffffffff803f3880(0000) knlGS:0000000000000000
Aug 5 11:27:59 localhost kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Aug 5 11:27:59 localhost kernel: CR2: 0000000000001770 CR3: 0000000000101000 CR4: 00000000000006e0
Aug 5 11:27:59 localhost kernel: Process pdflush (pid: 44, threadinfo 0000010008132000, task 0000010008131170)
Aug 5 11:27:59 localhost kernel: Stack: 000101016c3fd000 0000000000000040 0000000000000200 ffffffff8015b9f6
Aug 5 11:27:59 localhost kernel: 00000200fff15128 00000101ffffe680 00000101ffffe6c8 0000000000000200
Aug 5 11:27:59 localhost kernel: 00000100081339b8 000001016c3f7be0
Aug 5 11:27:59 localhost kernel: Call Trace:<ffffffff8015b9f6>{cache_grow+182} <ffffffff8015bc31>{cache_alloc_refill+401}
Aug 5 11:27:59 localhost kernel: <ffffffff8015bf16>{kmem_cache_alloc+54} <ffffffffa025f01e>{:multipath:mp_pool_alloc+30}
Aug 5 11:27:59 localhost kernel: <ffffffff80155b11>{mempool_alloc+161} <ffffffff80155b11>{mempool_alloc+161}
Aug 5 11:27:59 localhost kernel: <ffffffff80133d70>{autoremove_wake_function+0} <ffffffff80133d70>{autoremove_wake_function+0}
Aug 5 11:27:59 localhost kernel: <ffffffffa025f336>{:multipath:multipath_make_request+38}
Aug 5 11:27:59 localhost kernel: <ffffffff80217aeb>{generic_make_request+347} <ffffffff801792cd>{bio_clone+13}
Aug 5 11:27:59 localhost kernel: <ffffffffa016e4f6>{:dm_mod:clone_bio+54} <ffffffffa016e5d5>{:dm_mod:__clone_and_map+165}
Aug 5 11:27:59 localhost kernel: <ffffffff80133d70>{autoremove_wake_function+0} <ffffffffa016e808>{:dm_mod:__split_bio+168}
Aug 5 11:27:59 localhost kernel: <ffffffff80155b11>{mempool_alloc+161} <ffffffffa016e8f2>{:dm_mod:dm_request+114}
Aug 5 11:27:59 localhost kernel: <ffffffff80133d70>{autoremove_wake_function+0} <ffffffff80217aeb>{generic_make_request+347}
Aug 5 11:27:59 localhost kernel: <ffffffff80133d70>{autoremove_wake_function+0} <ffffffff80217c10>{submit_bio+272}
Aug 5 11:27:59 localhost kernel: <ffffffff8017723b>{__block_write_full_page+459} <ffffffff8017b6d0>{blkdev_get_block+0}
Aug 5 11:27:59 localhost kernel: <ffffffff801968ff>{mpage_writepages+367} <ffffffff8017b820>{blkdev_writepage+0}
Aug 5 11:27:59 localhost kernel: <ffffffff801588fe>{do_writepages+30} <ffffffff8019501f>{__sync_single_inode+111}
Aug 5 11:27:59 localhost kernel: <ffffffff8019546f>{sync_sb_inodes+495} <ffffffff801955d4>{writeback_inodes+132}
Aug 5 11:27:59 localhost kernel: <ffffffff801585c7>{background_writeout+119} <ffffffff801592a0>{pdflush+0}
Aug 5 11:27:59 localhost kernel: <ffffffff801591e9>{__pdflush+297} <ffffffff801592bc>{pdflush+28}
Aug 5 11:27:59 localhost kernel: <ffffffff80158550>{background_writeout+0} <ffffffff8014a002>{kthread+146}
Aug 5 11:27:59 localhost kernel: <ffffffff80111447>{child_rip+8} <ffffffff801592a0>{pdflush+0}
Aug 5 11:27:59 localhost kernel: <ffffffff8014a040>{keventd_create_kthread+0} <ffffffff80149f70>{kthread+0}
Aug 5 11:27:59 localhost kernel: <ffffffff8011143f>{child_rip+0}
Aug 5 11:27:59 localhost kernel:
Aug 5 11:27:59 localhost kernel: Code: 48 8b 91 70 17 00 00 76 07 b8 00 00 00 80 eb 0a 48 b8 00 00
Aug 5 11:27:59 localhost kernel: RIP <ffffffff8015ab54>{kmem_getpages+132} RSP <00000100081338d8>
Aug 5 11:27:59 localhost kernel: CR2: 0000000000001770
Aug 5 11:27:59 localhost kernel: <1>Unable to handle kernel paging request at 0000000000001770 RIP:
Aug 5 11:27:59 localhost kernel: <ffffffff8015ab54>{kmem_getpages+132}
Aug 5 11:27:59 localhost kernel: PML4 1c8669067 PGD 1c81b0067 PMD 0
Aug 5 11:27:59 localhost kernel: Oops: 0000 [2] SMP
Aug 5 11:27:59 localhost kernel: CPU 0
Aug 5 11:27:59 localhost kernel: Modules linked in: multipath md ipv6 qla2300 qla2xxx scsi_transport_fc ohci_hcd hw_random amd74xx evdev tg3 rtc bonding dm_snapshot dm_mod ide_generic ide_cd ide_core cdrom isofs ext2 ext3 jbd mbcache unix sd_mod mptscsih mptbase scsi_mod xfs
Aug 5 11:27:59 localhost kernel: Pid: 32686, comm: mkfs.ext3 Not tainted 2.6.8-rc1
Aug 5 11:27:59 localhost kernel: RIP: 0010:[kmem_getpages+132/448] <ffffffff8015ab54>{kmem_getpages+132}
Aug 5 11:27:59 localhost kernel: RSP: 0018:00000101c8f0db98 EFLAGS: 00010213
Aug 5 11:27:59 localhost kernel: RAX: ffffffff7fffffff RBX: 00000101fffc9680 RCX: 0000000000000000
Aug 5 11:27:59 localhost kernel: RDX: 0000010000011700 RSI: 00000100000119c0 RDI: 0000010000012500
Aug 5 11:27:59 localhost kernel: RBP: 00000101fffc9680 R08: 000001016bc0c000 R09: 000001016ef90cc8
Aug 5 11:27:59 localhost kernel: R10: 00000101fffc96d8 R11: 00000101fffc96e8 R12: 0000000000000050
Aug 5 11:27:59 localhost kernel: R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000010
Aug 5 11:27:59 localhost kernel: FS: 0000000000869c80(0000) GS:ffffffff803f3880(0000) knlGS:0000000000000000
Aug 5 11:27:59 localhost kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 5 11:27:59 localhost kernel: CR2: 0000000000001770 CR3: 0000000000101000 CR4: 00000000000006e0
Aug 5 11:27:59 localhost kernel: Process mkfs.ext3 (pid: 32686, threadinfo 00000101c8f0c000, task 00000101fdb695f0)
Aug 5 11:27:59 localhost kernel: Stack: 0000010000012500 0000000000000012 0000000000000050 ffffffff8015b9f6
Aug 5 11:27:59 localhost kernel: 0000005010001520 00000101fffc9680 00000101fffc96c8 0000000000000050
Aug 5 11:27:59 localhost kernel: 0000010005f92268 0000000000000001
Aug 5 11:27:59 localhost kernel: Call Trace:<ffffffff8015b9f6>{cache_grow+182} <ffffffff8015bc31>{cache_alloc_refill+401}
Aug 5 11:27:59 localhost kernel: <ffffffff8015bf16>{kmem_cache_alloc+54} <ffffffff80178e41>{alloc_buffer_head+17}
Aug 5 11:27:59 localhost kernel: <ffffffff801765ea>{create_buffers+42} <ffffffff80176fa6>{create_empty_buffers+22}
Aug 5 11:27:59 localhost kernel: <ffffffff8017740f>{__block_prepare_write+175} <ffffffff8017b6d0>{blkdev_get_block+0}
Aug 5 11:27:59 localhost kernel: <ffffffff801572a2>{__alloc_pages+818} <ffffffff80177e9a>{block_prepare_write+26}
Aug 5 11:27:59 localhost kernel: <ffffffff80154db3>{generic_file_aio_write_nolock+1315}
Aug 5 11:27:59 localhost kernel: <ffffffff801f80f2>{write_chan+402} <ffffffff801f818c>{write_chan+556}
Aug 5 11:27:59 localhost kernel: <ffffffff80155257>{generic_file_write_nolock+103} <ffffffff8017c71a>{blkdev_file_write+26}
Aug 5 11:27:59 localhost kernel: <ffffffff801740f4>{vfs_write+228} <ffffffff80174209>{sys_write+73}
Aug 5 11:27:59 localhost kernel: <ffffffff8011091a>{system_call+126}
Aug 5 11:27:59 localhost kernel:
Aug 5 11:27:59 localhost kernel: Code: 48 8b 91 70 17 00 00 76 07 b8 00 00 00 80 eb 0a 48 b8 00 00
Aug 5 11:27:59 localhost kernel: RIP <ffffffff8015ab54>{kmem_getpages+132} RSP <00000101c8f0db98>
Aug 5 11:27:59 localhost kernel: CR2: 0000000000001770
Aug 5 11:27:59 localhost kernel: <1>Unable to handle kernel paging request at 0000000000001770 RIP:
Aug 5 11:27:59 localhost kernel: <ffffffff8015ab54>{kmem_getpages+132}
Aug 5 11:27:59 localhost kernel: PML4 1febf4067 PGD 1fe9e3067 PMD 0
Aug 5 11:27:59 localhost kernel: Oops: 0000 [3] SMP
Aug 5 11:27:59 localhost kernel: CPU 0
Aug 5 11:27:59 localhost kernel: Modules linked in: multipath md ipv6 qla2300 qla2xxx scsi_transport_fc ohci_hcd hw_random amd74xx evdev tg3 rtc bonding dm_snapshot dm_mod ide_generic ide_cd ide_core cdrom isofs ext2 ext3 jbd mbcache unix sd_mod mptscsih mptbase scsi_mod xfs
Aug 5 11:27:59 localhost kernel: Pid: 32686, comm: mkfs.ext3 Not tainted 2.6.8-rc1
Aug 5 11:27:59 localhost kernel: RIP: 0010:[kmem_getpages+132/448] <ffffffff8015ab54>{kmem_getpages+132}
Aug 5 11:27:59 localhost kernel: RSP: 0018:00000101c8f0d3c8 EFLAGS: 00010013
Aug 5 11:27:59 localhost kernel: RAX: ffffffff7fffffff RBX: 00000101ffffe680 RCX: 0000000000000000
Aug 5 11:27:59 localhost kernel: RDX: 0000010000011700 RSI: 00000100000119c0 RDI: 0000010000012500
Aug 5 11:27:59 localhost kernel: RBP: 00000101ffffe680 R08: 000001016bc0d000 R09: 00000100fbe67010
Aug 5 11:27:59 localhost kernel: R10: 00000101ffffe6d8 R11: 00000101ffffe6e8 R12: 0000000000000200
Aug 5 11:27:59 localhost kernel: R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000000
Aug 5 11:27:59 localhost kernel: FS: 0000002aa39e7a40(0000) GS:ffffffff803f3880(0000) knlGS:0000000000000000
Aug 5 11:27:59 localhost kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 5 11:27:59 localhost kernel: CR2: 0000000000001770 CR3: 0000000000101000 CR4: 00000000000006e0
Aug 5 11:27:59 localhost kernel: Process mkfs.ext3 (pid: 32686, threadinfo 00000101c8f0c000, task 00000101fdb695f0)
Aug 5 11:27:59 localhost kernel: Stack: 0000000000000000 ffffffff801572d0 0000000000000200 ffffffff8015b9f6
Aug 5 11:27:59 localhost kernel: 000002006e416000 00000101ffffe680 00000101ffffe6c8 0000000000000200
Aug 5 11:27:59 localhost kernel: 00000101c8f0d4a8 00000101d1729fe8
Aug 5 11:27:59 localhost kernel: Call Trace:<ffffffff801572d0>{__get_free_pages+16} <ffffffff8015b9f6>{cache_grow+182}
Aug 5 11:27:59 localhost kernel: <ffffffff8015bc31>{cache_alloc_refill+401} <ffffffff8015bf16>{kmem_cache_alloc+54}
Aug 5 11:27:59 localhost kernel: <ffffffffa025f01e>{:multipath:mp_pool_alloc+30} <ffffffff80155b11>{mempool_alloc+161}
Aug 5 11:27:59 localhost kernel: <ffffffff80155b11>{mempool_alloc+161} <ffffffff80133d70>{autoremove_wake_function+0}
Aug 5 11:27:59 localhost kernel: <ffffffff80133d70>{autoremove_wake_function+0} <ffffffffa025f336>{:multipath:multipath_make_request+38}
Aug 5 11:27:59 localhost kernel: <ffffffff80217aeb>{generic_make_request+347} <ffffffff801792cd>{bio_clone+13}
Aug 5 11:27:59 localhost kernel: <ffffffffa016e4f6>{:dm_mod:clone_bio+54} <ffffffffa016e5d5>{:dm_mod:__clone_and_map+165}
Aug 5 11:27:59 localhost kernel: <ffffffff80133d70>{autoremove_wake_function+0} <ffffffffa016e808>{:dm_mod:__split_bio+168}
Aug 5 11:27:59 localhost kernel: <ffffffff80155b11>{mempool_alloc+161} <ffffffffa016e8f2>{:dm_mod:dm_request+114}
Aug 5 11:27:59 localhost kernel: <ffffffff80133d70>{autoremove_wake_function+0} <ffffffff80217aeb>{generic_make_request+347}
Aug 5 11:27:59 localhost kernel: <ffffffff80133d70>{autoremove_wake_function+0} <ffffffff80217c10>{submit_bio+272}
Aug 5 11:27:59 localhost kernel: <ffffffff8017723b>{__block_write_full_page+459} <ffffffff8017b6d0>{blkdev_get_block+0}
Aug 5 11:27:59 localhost kernel: <ffffffff801968ff>{mpage_writepages+367} <ffffffff8017b820>{blkdev_writepage+0}
Aug 5 11:27:59 localhost kernel: <ffffffff801588fe>{do_writepages+30} <ffffffff801529b4>{__filemap_fdatawrite+132}
Aug 5 11:27:59 localhost kernel: <ffffffff80175668>{sync_blockdev+40} <ffffffff8017c587>{blkdev_put+103}
Aug 5 11:27:59 localhost kernel: <ffffffff80174ef2>{__fput+82} <ffffffff801737ee>{filp_close+126}
Aug 5 11:27:59 localhost kernel: <ffffffff80137c13>{put_files_struct+115} <ffffffff801389e6>{do_exit+534}
Aug 5 11:27:59 localhost kernel: <ffffffff80120eeb>{do_page_fault+1035} <ffffffffa01ce677>{:qla2xxx:qla2x00_next+583}
Aug 5 11:27:59 localhost kernel: <ffffffff80156984>{__rmqueue+228} <ffffffff80111291>{error_exit+0}
Aug 5 11:27:59 localhost kernel: <ffffffff8015ab54>{kmem_getpages+132} <ffffffff8015aaf6>{kmem_getpages+38}
Aug 5 11:27:59 localhost kernel: <ffffffff8015b9f6>{cache_grow+182} <ffffffff8015bc31>{cache_alloc_refill+401}
Aug 5 11:27:59 localhost kernel: <ffffffff8015bf16>{kmem_cache_alloc+54} <ffffffff80178e41>{alloc_buffer_head+17}
Aug 5 11:27:59 localhost kernel: <ffffffff801765ea>{create_buffers+42} <ffffffff80176fa6>{create_empty_buffers+22}
Aug 5 11:27:59 localhost kernel: <ffffffff8017740f>{__block_prepare_write+175} <ffffffff8017b6d0>{blkdev_get_block+0}
Aug 5 11:27:59 localhost kernel: <ffffffff801572a2>{__alloc_pages+818} <ffffffff80177e9a>{block_prepare_write+26}
Aug 5 11:27:59 localhost kernel: <ffffffff80154db3>{generic_file_aio_write_nolock+1315}
Aug 5 11:27:59 localhost kernel: <ffffffff801f80f2>{write_chan+402} <ffffffff801f818c>{write_chan+556}
Aug 5 11:27:59 localhost kernel: <ffffffff80155257>{generic_file_write_nolock+103} <ffffffff8017c71a>{blkdev_file_write+26}
Aug 5 11:27:59 localhost kernel: <ffffffff801740f4>{vfs_write+228} <ffffffff80174209>{sys_write+73}
Aug 5 11:27:59 localhost kernel: <ffffffff8011091a>{system_call+126}
Aug 5 11:27:59 localhost kernel: <ffffffff80155b11>{mempool_alloc+161} <ffffffffa016e8f2>{:dm_mod:dm_request+114}
Aug 5 11:27:59 localhost kernel: <ffffffff80133d70>{autoremove_wake_function+0} <ffffffff80217aeb>{generic_make_request+347}
Aug 5 11:27:59 localhost kernel: <ffffffff80133d70>{autoremove_wake_function+0} <ffffffff80217c10>{submit_bio+272}
Aug 5 11:27:59 localhost kernel: <ffffffff8017723b>{__block_write_full_page+459} <ffffffff8017b6d0>{blkdev_get_block+0}
Aug 5 11:27:59 localhost kernel: <ffffffff801968ff>{mpage_writepages+367} <ffffffff8017b820>{blkdev_writepage+0}
Aug 5 11:27:59 localhost kernel: <ffffffff801588fe>{do_writepages+30} <ffffffff801529b4>{__filemap_fdatawrite+132}
Aug 5 11:27:59 localhost kernel: <ffffffff80175668>{sync_blockdev+40} <ffffffff8017c587>{blkdev_put+103}
Aug 5 11:27:59 localhost kernel: <ffffffff80174ef2>{__fput+82} <ffffffff801737ee>{filp_close+126}
Aug 5 11:27:59 localhost kernel: <ffffffff80137c13>{put_files_struct+115} <ffffffff801389e6>{do_exit+534}
Aug 5 11:27:59 localhost kernel: <ffffffff80120eeb>{do_page_fault+1035} <ffffffffa01ce677>{:qla2xxx:qla2x00_next+583}
Aug 5 11:27:59 localhost kernel: <ffffffff80156984>{__rmqueue+228} <ffffffff80111291>{error_exit+0}
Aug 5 11:27:59 localhost kernel: <ffffffff8015ab54>{kmem_getpages+132} <ffffffff8015aaf6>{kmem_getpages+38}
Aug 5 11:27:59 localhost kernel: <ffffffff8015b9f6>{cache_grow+182} <ffffffff8015bc31>{cache_alloc_refill+401}
Aug 5 11:27:59 localhost kernel: <ffffffff8015bf16>{kmem_cache_alloc+54} <ffffffff80178e41>{alloc_buffer_head+17}
Aug 5 11:27:59 localhost kernel: <ffffffff801765ea>{create_buffers+42} <ffffffff80176fa6>{create_empty_buffers+22}
Aug 5 11:27:59 localhost kernel: <ffffffff8017740f>{__block_prepare_write+175} <ffffffff8017b6d0>{blkdev_get_block+0}
Aug 5 11:27:59 localhost kernel: <ffffffff801572a2>{__alloc_pages+818} <ffffffff80177e9a>{block_prepare_write+26}
Aug 5 11:27:59 localhost kernel: <ffffffff80154db3>{generic_file_aio_write_nolock+1315}
Aug 5 11:27:59 localhost kernel: <ffffffff801f80f2>{write_chan+402} <ffffffff801f818c>{write_chan+556}
Aug 5 11:27:59 localhost kernel: <ffffffff80155257>{generic_file_write_nolock+103} <ffffffff8017c71a>{blkdev_file_write+26}
Aug 5 11:27:59 localhost kernel: <ffffffff801740f4>{vfs_write+228} <ffffffff80174209>{sys_write+73}
Aug 5 11:27:59 localhost kernel: <ffffffff8011091a>{system_call+126}
Aug 5 11:27:59 localhost kernel:
Aug 5 11:27:59 localhost kernel: Code: 48 8b 91 70 17 00 00 76 07 b8 00 00 00 80 eb 0a 48 b8 00 00
Aug 5 11:27:59 localhost kernel: RIP <ffffffff8015ab54>{kmem_getpages+132} RSP <00000101c8f0d3c8>
Aug 5 11:27:59 localhost kernel: CR2: 0000000000001770
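
All three Oopses above share the same "Code:" bytes, RCX value and CR2, so the faulting instruction can be decoded by hand. The sketch below does that in Python for the first seven bytes only. It assumes the "Code:" dump starts at the faulting RIP, which the match with CR2 supports; the register table and variable names are illustrative, and this is not a general x86-64 decoder.

```python
# First bytes of the "Code:" line, identical in all three Oopses.
code = bytes.fromhex("48 8b 91 70 17 00 00")

# Low eight 64-bit registers in x86-64 encoding order.
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

rex, opcode, modrm = code[0], code[1], code[2]
assert rex == 0x48      # REX.W: 64-bit operand size, no register extensions
assert opcode == 0x8B   # MOV r64, r/m64

mod = modrm >> 6            # 0b10: memory operand with a 32-bit displacement
reg = (modrm >> 3) & 0b111  # destination register
rm = modrm & 0b111          # base register of the memory operand
assert mod == 0b10 and rm != 0b100  # rm == 4 would mean a SIB byte follows

disp = int.from_bytes(code[3:7], "little", signed=True)
insn = f"mov {REG64[reg]}, [{REG64[rm]}+{disp:#x}]"

rcx = 0x0       # RCX as reported in each register dump
cr2 = 0x1770    # faulting address as reported in CR2
fault_addr = rcx + disp

print(insn)
print(hex(fault_addr), fault_addr == cr2)
```

In other words, the faulting instruction is a load from offset 0x1770 off a base pointer in RCX that is zero, which looks like a NULL pointer dereference inside kmem_getpages. Which structure that offset belongs to cannot be determined from the log alone.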

Attachment: config-2.6.8-rc1
Description: Binary data



Thanks,

James