Re: Crashes during boot on 2.6.30 / 2.6.31-rc, random programs

From: Joao Correia
Date: Wed Jul 01 2009 - 16:30:56 EST


Hello again

Looks like this just opened the lid on some other limits. I just hit
another thing, which hadn't shown up before i made the changes, but
looks like its just another limit thats too low. This one is harder to
reproduce tho.

Jun 30 21:35:03 hightech kernel: BUG: MAX_LOCKDEP_CHAINS too low!
Jun 30 21:35:03 hightech kernel: turning off the locking correctness validator.
Jun 30 21:35:03 hightech kernel: Pid: 9379, comm: qemu-kvm Not tainted
2.6.30-wl #3
Jun 30 21:35:03 hightech kernel: Call Trace:
Jun 30 21:35:03 hightech kernel: [<c07da76f>] ? printk+0x22/0x3b
Jun 30 21:35:03 hightech kernel: [<c04638f5>] __lock_acquire+0x583/0xb05
Jun 30 21:35:03 hightech kernel: [<c046a107>] ? __module_text_address+0x20/0x6d
Jun 30 21:35:03 hightech kernel: [<c0463f2e>] lock_acquire+0xb7/0xeb
Jun 30 21:35:03 hightech kernel: [<c05cf959>] ? get_hash_bucket+0x3a/0x55
Jun 30 21:35:03 hightech kernel: [<c05cf959>] ? get_hash_bucket+0x3a/0x55
Jun 30 21:35:03 hightech kernel: [<c07dd75f>] _spin_lock_irqsave+0x45/0x89
Jun 30 21:35:03 hightech kernel: [<c05cf959>] ? get_hash_bucket+0x3a/0x55
Jun 30 21:35:03 hightech kernel: [<c05cf959>] get_hash_bucket+0x3a/0x55
Jun 30 21:35:03 hightech kernel: [<c05d09e8>] add_dma_entry+0x1f/0x4f
Jun 30 21:35:03 hightech kernel: [<c05d0c66>] debug_dma_map_sg+0xe1/0x147
Jun 30 21:35:03 hightech kernel: [<c0695a19>] ata_qc_issue+0x1c0/0x254
Jun 30 21:35:03 hightech kernel: [<c069bf44>] __ata_scsi_queuecmd+0x16a/0x1c8
Jun 30 21:35:03 hightech kernel: [<c067a37e>] ? scsi_done+0x0/0x32
Jun 30 21:35:03 hightech kernel: [<c069b9b1>] ? ata_scsi_rw_xlat+0x0/0x207
Jun 30 21:35:03 hightech kernel: [<c069c05b>] ata_scsi_queuecmd+0x55/0x95
Jun 30 21:35:03 hightech kernel: [<c067a37e>] ? scsi_done+0x0/0x32
Jun 30 21:35:03 hightech kernel: [<c067a627>] scsi_dispatch_cmd+0x187/0x207
Jun 30 21:35:03 hightech kernel: [<c068044c>] scsi_request_fn+0x33d/0x480
Jun 30 21:35:03 hightech kernel: [<c0445a1e>] ? del_timer+0x54/0x6e
Jun 30 21:35:03 hightech kernel: [<c05aefbe>] __generic_unplug_device+0x3e/0x53
Jun 30 21:35:03 hightech kernel: [<c05af00a>] blk_start_queueing+0x37/0x4a
Jun 30 21:35:03 hightech kernel: [<c05bcc7e>] cfq_insert_request+0x46f/0x498
Jun 30 21:35:03 hightech kernel: [<c05ad43b>] ? __rcu_read_unlock+0x0/0x3d
Jun 30 21:35:03 hightech kernel: [<c05ac60d>] elv_insert+0x12f/0x1d0
Jun 30 21:35:03 hightech kernel: [<c05ac763>] __elv_add_request+0xb5/0xce
Jun 30 21:35:03 hightech kernel: [<c05ad465>] ? __rcu_read_unlock+0x2a/0x3d
Jun 30 21:35:03 hightech kernel: [<c05ad533>] ? drive_stat_acct+0xbb/0xd3
Jun 30 21:35:03 hightech kernel: [<c05af6b3>] __make_request+0x33b/0x3c1
Jun 30 21:35:03 hightech kernel: [<c04cd85c>] ? check_valid_pointer+0x2c/0x6c
Jun 30 21:35:03 hightech kernel: [<c046249e>] ? mark_lock+0x29/0x1f3
Jun 30 21:35:03 hightech kernel: [<c04cf83a>] ? kmem_cache_alloc+0xc2/0x150
Jun 30 21:35:03 hightech kernel: [<c05ade19>] generic_make_request+0x2fd/0x351
Jun 30 21:35:03 hightech kernel: [<c05ade19>] ? generic_make_request+0x2fd/0x351
Jun 30 21:35:03 hightech kernel: [<c04a8b4e>] ? mempool_alloc+0x5b/0x105
Jun 30 21:35:03 hightech kernel: [<c04626ba>] ? mark_held_locks+0x52/0x7c
Jun 30 21:35:03 hightech kernel: [<c05adf2d>] submit_bio+0xc0/0xd9
Jun 30 21:35:03 hightech kernel: [<c04fd28d>] ? bio_alloc_bioset+0x35/0xd5
Jun 30 21:35:03 hightech kernel: [<c04f845c>] submit_bh+0xea/0x11b
Jun 30 21:35:03 hightech kernel: [<c04fae00>]
__block_write_full_page+0x218/0x2f0
Jun 30 21:35:03 hightech kernel: [<c054427d>] ?
ext4_normal_get_block_write+0x0/0x76
Jun 30 21:35:03 hightech kernel: [<c04fb5ea>] ? end_buffer_async_write+0x0/0x140
Jun 30 21:35:03 hightech kernel: [<c04fb5ea>] ? end_buffer_async_write+0x0/0x140
Jun 30 21:35:03 hightech kernel: [<c04faf4c>]
block_write_full_page_endio+0x74/0xda
Jun 30 21:35:03 hightech kernel: [<c04fb5ea>] ? end_buffer_async_write+0x0/0x140
Jun 30 21:35:03 hightech kernel: [<c054427d>] ?
ext4_normal_get_block_write+0x0/0x76
Jun 30 21:35:03 hightech kernel: [<c04fafd4>] block_write_full_page+0x22/0x38
Jun 30 21:35:03 hightech kernel: [<c04fb5ea>] ? end_buffer_async_write+0x0/0x140
Jun 30 21:35:03 hightech kernel: [<c0543390>] ext4_da_writepage+0x1aa/0x1da
Jun 30 21:35:03 hightech kernel: [<c054315b>] mpage_da_submit_io+0x9b/0xf0
Jun 30 21:35:03 hightech kernel: [<c0545f40>] ext4_da_writepages+0x2c2/0x445
Jun 30 21:35:03 hightech kernel: [<c04ad194>] do_writepages+0x32/0x58
Jun 30 21:35:03 hightech kernel: [<c04a739b>]
__filemap_fdatawrite_range+0x73/0x8c
Jun 30 21:35:03 hightech kernel: [<c04a7c81>] filemap_fdatawrite_range+0x2b/0x44
Jun 30 21:35:03 hightech kernel: [<c04a7e9e>] sync_page_range+0x7f/0xf5
Jun 30 21:35:03 hightech kernel: [<c04629d3>] ? trace_hardirqs_on+0x19/0x2c
Jun 30 21:35:03 hightech kernel: [<c04a7fd1>] generic_file_aio_write+0xbd/0xe1
Jun 30 21:35:03 hightech kernel: [<c053da09>] ext4_file_write+0xc2/0x149
Jun 30 21:35:03 hightech kernel: [<c04d7cf2>] do_sync_write+0xbe/0x10a
Jun 30 21:35:03 hightech kernel: [<c046249e>] ? mark_lock+0x29/0x1f3
Jun 30 21:35:03 hightech kernel: [<c0451586>] ?
autoremove_wake_function+0x0/0x55
Jun 30 21:35:03 hightech kernel: [<c04d920e>] ? __rcu_read_unlock+0x0/0x3d
Jun 30 21:35:03 hightech kernel: [<c058377a>] ?
security_file_permission+0x23/0x36
Jun 30 21:35:03 hightech kernel: [<c04d7eed>] ? rw_verify_area+0xa5/0xd8
Jun 30 21:35:03 hightech kernel: [<c04d7c34>] ? do_sync_write+0x0/0x10a
Jun 30 21:35:03 hightech kernel: [<c04d837f>] vfs_write+0x9f/0x10f
Jun 30 21:35:03 hightech kernel: [<c04d8459>] sys_pwrite64+0x6a/0x93
Jun 30 21:35:03 hightech kernel: [<c04037f3>] sysenter_do_call+0x12/0x38


Now the whole call trace is showing up (so the other limits were
raised correctly).

Im going to go through the source and raise this limit as well, but as
i said above, this one is harder to trigger, so it may take a couple
of days to see it again.

The first problem was indeed just the tip of the iceberg. Perphaps a
more comprehensive change of all the related limits is in order? (I do
not know if such change has already happened or not, if so, please
point me in the right direction.)

Again,
Thank you very much for your time,
Joao Correia
Centro de Informatica
Universidade da Beira Interior
Portugal

On Tue, Jun 30, 2009 at 10:16 AM, Amerigo Wang<xiyou.wangcong@xxxxxxxxx> wrote:
> On Mon, Jun 29, 2009 at 11:30:41AM +0100, Joao Correia wrote:
>>Hello
>>
>>The system seemed to happily ignore all the sysctl.conf changes and
>>all echo VALUE >  /proc/sys/kernel/max_lock_depth
>
>
> Ouch, mistake! max_lock_depth is for rt_mutex... :)
>
>>
>>So i dug a little on the source and changed
>>
>>include/linux/sched.h
>>
>># define MAX_LOCK_DEPTH 48UL
>>
>>to
>>
>># define MAX_LOCK_DEPTH 96UL
>>
>>and im getting no more errors. Of course, now its probably radioactive
>>and about to blow up, but at least it's not complaining anymore :).
>
> Yeah, this is right.
>
> Let's Cc: Peter to see if he would like to change this number...
>
> Peter?
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/