Re: 3.10-rc4 stalls during mmap writes

From: Dave Chinner
Date: Sat Jun 08 2013 - 23:37:59 EST

On Fri, Jun 07, 2013 at 02:37:12PM -0500, Shawn Bohrer wrote:
> I've started testing the 3.10 kernel, previously I was on 3.4, and I'm
> encountering some fairly large stalls in my memory mapped writes in the
> range of .01 to 1s. I've managed to capture two of these stalls so
> far and both looked like the following:
>
> 1) Writing process writes to a new page and blocks on xfs_ilock:
>
> <...>-21567 [009] 9435.453069: sched_switch: prev_comm=tick_receiver_m prev_pid=21567 prev_prio=79 prev_state=D ==> next_comm=swapper/9 next_pid=0 next_prio=120
> <...>-21567 [009] 9435.453072: kernel_stack: <stack trace>
> => schedule (ffffffff814ca379)
> => rwsem_down_write_failed (ffffffff814cb095)
> => call_rwsem_down_write_failed (ffffffff81275053)
> => xfs_ilock (ffffffff8120b25c)
> => xfs_vn_update_time (ffffffff811cf3d3)
> => update_time (ffffffff81158dd3)
> => file_update_time (ffffffff81158f0c)
> => block_page_mkwrite (ffffffff81171d23)
> => xfs_vm_page_mkwrite (ffffffff811c5375)
> => do_wp_page (ffffffff8110c27f)
> => handle_pte_fault (ffffffff8110dd24)
> => handle_mm_fault (ffffffff8110f430)
> => __do_page_fault (ffffffff814cef72)
> => do_page_fault (ffffffff814cf2e7)
> => page_fault (ffffffff814cbab2)

That's changing the c/mtime on the inode. It needs the inode lock
because the update is transactional.

>
> 2) kworker calls xfs_iunlock and wakes up my process:
>
> kworker/u50:1-403 [013] 9436.027354: sched_wakeup: comm=tick_receiver_m pid=21567 prio=79 success=1 target_cpu=009
> kworker/u50:1-403 [013] 9436.027359: kernel_stack: <stack trace>
> => ttwu_do_activate.constprop.34 (ffffffff8106c556)
> => try_to_wake_up (ffffffff8106e996)
> => wake_up_process (ffffffff8106ea87)
> => __rwsem_do_wake (ffffffff8126e531)
> => rwsem_wake (ffffffff8126e62a)
> => call_rwsem_wake (ffffffff81275077)
> => xfs_iunlock (ffffffff8120b55c)
> => xfs_iomap_write_allocate (ffffffff811ce4e7)
> => xfs_map_blocks (ffffffff811bf145)
> => xfs_vm_writepage (ffffffff811bfbc2)

And allocation during writeback is holding the lock on that inode
as it's already in a transaction.

> So I guess my question is does anyone know why I'm now seeing these
> stalls with 3.10?

Because we made all metadata updates in XFS fully transactional in
3.4:

commit 8a9c9980f24f6d86e0ec0150ed35fba45d0c9f88
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Wed Feb 29 09:53:52 2012 +0000

xfs: log timestamp updates

Timestamps on regular files are the last metadata that XFS does not update
transactionally. Now that we use the delaylog mode exclusively and made
the log code scale extremely well there is no need to bypass that code for
timestamp updates. Logging all updates allows to drop a lot of code, and
will allow for further performance improvements later on.

Note that this patch drops optimized handling of fdatasync - it will be
added back in a separate commit.

Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
Signed-off-by: Ben Myers <bpm@xxxxxxx>

$ git describe --contains 8a9c998
v3.4-rc1~55^2~23

IOWs, you're just lucky you haven't noticed it on 3.4....

> Are there any suggestions for how to eliminate them?

Nope. You're stuck with it - there are far more places in the page
fault path where you can get stuck on the same lock for the same
reason - e.g. during block mapping for the newly added pagecache
page...

Hint: mmap() does not provide -deterministic- low latency access to
mapped pages - it is only "mostly low latency". mmap() has exactly
the same worst case page fault latencies as the equivalent write()
syscall. e.g., dirty too many pages and mmap() write page faults
can be throttled, just like a write() syscall....
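
For anyone who wants to see those worst-case fault latencies on their
own setup, here is a minimal userspace sketch. It is not from the
original report: the helper name measure_mmap_write_faults(), its
parameters and the test file path are all made up for illustration. It
dirties one byte in each page of a MAP_SHARED file mapping and records
the slowest single store, which is roughly the window the traces above
show a task sitting in the write fault path.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Elapsed microseconds between two CLOCK_MONOTONIC samples. */
static double elapsed_us(const struct timespec *a, const struct timespec *b)
{
    return (b->tv_sec - a->tv_sec) * 1e6 + (b->tv_nsec - a->tv_nsec) / 1e3;
}

/*
 * Hypothetical helper (name and signature are assumptions): dirty the
 * first byte of each of `npages` pages of a shared file mapping and
 * store the slowest single store, in microseconds, in *worst_us.
 * The file at `path` is created or truncated and left in place.
 * Returns 0 on success, -1 on error.
 */
int measure_mmap_write_faults(const char *path, size_t npages, double *worst_us)
{
    assert(path != NULL && worst_us != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || npages == 0)
        return -1;
    size_t len = npages * (size_t)page;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (addr == MAP_FAILED)
        return -1;

    volatile char *map = addr;
    *worst_us = 0.0;
    for (size_t i = 0; i < npages; i++) {
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        map[i * (size_t)page] = 1; /* first store to the page takes a write fault */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double us = elapsed_us(&t0, &t1);
        if (us > *worst_us)
            *worst_us = us;
    }

    munmap(addr, len);
    return 0;
}
```

Call it with a path on the filesystem under test and a page count large
enough to overlap with writeback, then compare the reported maximum
across kernels. A single run proves very little; the stalls described
above only show up when a fault happens to collide with writeback
holding the same inode lock.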

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx