Re: [2.6.26-rc7] shrink_icache from pagefault locking (nee: nfsd hangs for a few sec)...

From: Dave Chinner
Date: Sun Jun 22 2008 - 18:30:11 EST


On Sun, Jun 22, 2008 at 09:56:17PM +0100, Daniel J Blueman wrote:
> On Sun, Jun 22, 2008 at 7:21 PM, Arjan van de Ven <arjan@xxxxxxxxxxxxx> wrote:
> > this sort of thing can easily be exposed with the latencytop tool...
> > it will at least tell you WHAT the system is blocking on.
> > (not so much the why, the tool isn't smart enough to automatically spit
> > out kernel patches yet)
>
> Good plan. I reproduced this without NFS mounted, so pure XFS. I
> wasn't able to capture the same process's (ie 5480) latencytop trace,
> but it may have contributed to the list.
>
> A fair amount of debugging was turned on, hurting the latency some
> (despite it being a 3.6GHz Penryn).
>
> Daniel
>
> --- [1]
>
> $ vmstat 1
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> <snip>
> 1 1 156 14424 12 2329672 0 0 0 110755 177 3820 57 6 36 0
> 2 1 156 14736 12 2348152 0 0 24 25172 204 26018 35 21 43 1
> 5 0 156 24252 12 2363800 0 0 59656 31 545 25292 35 14 28 23
> 4 0 156 14696 12 2317784 0 0 3824 0 38 23083 95 6 0 0
> 4 0 156 14440 12 2319304 0 0 4672 0 72 3372 93 3 3 2
> 2 0 156 14428 12 2318484 0 0 0 4 27 731 52 0 49 0
> 2 0 156 14480 12 2308512 0 0 0 12 32 36629 39 13 49 0
> 2 0 156 14572 12 2301220 0 0 3904 12316 117 10760 58 7 26 11
>
> --- [2]
>
> Cause Maximum Percentage
> down xfs_buf_lock _xfs_buf_find xfs_buf_get_flags 271.1 msec 0.4 %

Waiting on I/O to complete. Your disk is busy.

> down xfs_buf_iowait xfs_buf_iostart xfs_buf_read_f 206.1 msec 1.3 %

Waiting on I/O to complete. Your disk is busy.

> down xfs_buf_lock xfs_getsb xfs_trans_getsb xfs_tr 160.4 msec 0.5 %

Waiting on a superblock I/O or a transaction to complete. Your
disk is busy. (Note, this one can be avoided with lazy
superblock counters).
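
As a sketch of what enabling that feature looks like (the device path is a
placeholder, and this assumes an xfsprogs recent enough to know about
lazy-count):

```shell
# Lazy superblock counters are a mkfs-time log (-l) option:
mkfs.xfs -l lazy-count=1 /dev/sdXN

# On an existing, unmounted filesystem, xfs_admin can toggle it:
xfs_admin -c 1 /dev/sdXN
```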

[snip rest of "busy disk trace"]

But really, all latencytop is telling you here is that it takes
time to wait for I/O to complete. It's mostly useless for tracking
down locking issues when you've got I/O in progress...

> 2.6.26-rc7-211c #2
> -------------------------------------------------------
> AutopanoPro/5480 is trying to acquire lock:
> (iprune_mutex){--..}, at: [<ffffffff802d2a5d>] shrink_icache_memory+0x7d/0x280
>
> but task is already holding lock:
> (&(&ip->i_iolock)->mr_lock){----}, at: [<ffffffff803a459f>]
> xfs_ilock+0xbf/0x110
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (&(&ip->i_iolock)->mr_lock){----}:
> [<ffffffff802769ad>] __lock_acquire+0xbdd/0x1020
> [<ffffffff80276e55>] lock_acquire+0x65/0x90
> [<ffffffff8026b41b>] down_write_nested+0x4b/0x90
> [<ffffffff803a45df>] xfs_ilock+0xff/0x110
> [<ffffffff803a47af>] xfs_ireclaim+0x3f/0x90
> [<ffffffff803c9169>] xfs_finish_reclaim+0x59/0x220
> [<ffffffff803cc3b5>] xfs_reclaim+0x185/0x190
> [<ffffffff803d9211>] xfs_fs_clear_inode+0xe1/0x130
> [<ffffffff802d25c7>] clear_inode+0x87/0x120
> [<ffffffff802d290a>] dispose_list+0x2a/0x100
> [<ffffffff802d2c06>] shrink_icache_memory+0x226/0x280
> [<ffffffff8029d9c5>] shrink_slab+0x125/0x180
> [<ffffffff8029dc52>] try_to_free_pages+0x232/0x360
> [<ffffffff80297f0d>] __alloc_pages_internal+0x1ed/0x4a0
> [<ffffffff802981db>] __alloc_pages+0xb/0x10
> [<ffffffff802a312a>] handle_mm_fault+0x46a/0x6d0
> [<ffffffff8060510a>] do_page_fault+0x3ca/0x830
> [<ffffffff80602add>] error_exit+0x0/0xa9
> [<ffffffffffffffff>] 0xffffffffffffffff

mmap_sem -> iprune_mutex -> xfs_ilock

> -> #0 (iprune_mutex){--..}:
> [<ffffffff80276817>] __lock_acquire+0xa47/0x1020
> [<ffffffff80276e55>] lock_acquire+0x65/0x90
> [<ffffffff8060059a>] mutex_lock_nested+0xba/0x2b0
> [<ffffffff802d2a5d>] shrink_icache_memory+0x7d/0x280
> [<ffffffff8029d9c5>] shrink_slab+0x125/0x180
> [<ffffffff8029dc52>] try_to_free_pages+0x232/0x360
> [<ffffffff80297f0d>] __alloc_pages_internal+0x1ed/0x4a0
> [<ffffffff802981db>] __alloc_pages+0xb/0x10
> [<ffffffff8029a6b6>] __do_page_cache_readahead+0x136/0x230
> [<ffffffff8029aa08>] ondemand_readahead+0x128/0x1f0
> [<ffffffff8029ab45>] page_cache_async_readahead+0x75/0xa0
> [<ffffffff80293a8a>] generic_file_aio_read+0x28a/0x610
> [<ffffffff803d78c4>] xfs_read+0x124/0x270
> [<ffffffff803d4416>] __xfs_file_read+0x46/0x50
> [<ffffffff803d4451>] xfs_file_aio_read+0x11/0x20
> [<ffffffff802bc1b1>] do_sync_read+0xf1/0x130
> [<ffffffff802bca74>] vfs_read+0xc4/0x160
> [<ffffffff802bcf10>] sys_read+0x50/0x90
> [<ffffffff8022639b>] system_call_after_swapgs+0x7b/0x80
> [<ffffffffffffffff>] 0xffffffffffffffff

xfs_ilock -> iprune_mutex

This is exactly the situation I mentioned in the previous email.
There is no potential deadlock here between xfs_ilock and
iprune_mutex: the xfs_ilock held before iprune_mutex and the one
taken after it are guaranteed to belong to different inodes (the
first inode is in use, so it will never be found by
shrink_icache_memory())...

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/