Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

From: Ye Xiaolong
Date: Fri Aug 12 2016 - 04:54:25 EST


On 08/12, Ye Xiaolong wrote:
>On 08/12, Dave Chinner wrote:

[snip]

>>lkp-folk: the patch I've just tested is attached below - can you
>>feed that through your test and see if it fixes the regression?
>>
>
>Hi, Dave
>
>I am verifying your fix patch in the lkp environment now and will send
>the result once I get it.
>

Here is the test result.

commit 636b594f38278080db93f2d67d11d31700924f5d
Author: Dave Chinner <dchinner@xxxxxxxxxx>
AuthorDate: Fri Aug 12 14:23:44 2016 +0800
Commit: Xiaolong Ye <xiaolong.ye@xxxxxxxxx>
CommitDate: Fri Aug 12 14:23:44 2016 +0800

When a write occurs that extends the file, we check to see if we
need to preallocate more delalloc space. When we do sub-page
writes, the new iomap write path passes a sub-block write length to
the block mapping code. xfs_iomap_write_delay does not expect to be
passed byte counts smaller than one filesystem block, so it ends up
checking the BMBT for blocks beyond EOF on every write,
regardless of whether we need to or not. This causes a regression in
aim7 benchmarks as it is full of sub-page writes.

To fix this, clamp the minimum length of a mapping request coming
through xfs_file_iomap_begin() to one filesystem block. This ensures
we are passing the same length to xfs_iomap_write_delay() as we did
when calling through the get_blocks path. This substantially reduces
the amount of lookup load being placed on the BMBT during sub-block
write loads.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
---
fs/xfs/xfs_iomap.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 620fc91..5eaace0 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1015,10 +1015,15 @@ xfs_file_iomap_begin(
 		 * number pulled out of thin air as a best guess for initial
 		 * testing.
 		 *
+		 * xfs_iomap_write_delay() only works if the length passed in is
+		 * >= one filesystem block. Hence we need to clamp the minimum
+		 * length we map, too.
+		 *
 		 * Note that the values needs to be less than 32-bits wide until
 		 * the lower level functions are updated.
 		 */
 		length = min_t(loff_t, length, 1024 * PAGE_SIZE);
+		length = max_t(loff_t, length, (1 << inode->i_blkbits));
 		if (xfs_get_extsz_hint(ip)) {
 			/*
 			 * xfs_iomap_write_direct() expects the shared lock. It
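
For anyone following the thread, the effect of the two clamping lines in
the hunk above can be sketched in userspace C. This is only an
illustration under stated assumptions: clamp_map_length() and
SKETCH_PAGE_SIZE are hypothetical names, a 4096-byte page is assumed, and
plain comparisons stand in for the kernel's min_t()/max_t() macros.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4096-byte pages; the kernel uses PAGE_SIZE here. */
#define SKETCH_PAGE_SIZE 4096LL

/*
 * Hypothetical helper mirroring the two clamps in the patch:
 *   - cap the request at 1024 pages (already present before the patch)
 *   - raise it to at least one filesystem block (added by the patch)
 * blkbits plays the role of inode->i_blkbits (log2 of the block size).
 */
static int64_t clamp_map_length(int64_t length, unsigned int blkbits)
{
	int64_t max_len = 1024 * SKETCH_PAGE_SIZE;
	int64_t min_len;

	assert(blkbits < 32);	/* sanity check for this sketch only */
	min_len = (int64_t)1 << blkbits;

	if (length > max_len)
		length = max_len;
	if (length < min_len)
		length = min_len;
	return length;
}
```

With 4096-byte blocks (blkbits = 12), a 100-byte request is widened to
4096 bytes, and an 8 MiB request is capped at 4194304 bytes (1024 pages).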

f0c6bcba74ac51cb 68a9f5e7007c1afa2cf6830b69 636b594f38278080db93f2d67d
---------------- -------------------------- --------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
484435 ± 0% -13.3% 420004 ± 0% -14.0% 416777 ± 0% aim7.jobs-per-min
6491 ± 3% +30.8% 8491 ± 0% +35.7% 8806 ± 1% aim7.time.involuntary_context_switches
376 ± 0% +28.4% 484 ± 0% +29.6% 488 ± 0% aim7.time.system_time
430512 ± 0% -20.1% 343838 ± 0% -19.7% 345708 ± 0% aim7.time.voluntary_context_switches
37.37 ± 0% +15.3% 43.09 ± 0% +16.1% 43.41 ± 0% aim7.time.elapsed_time
37.37 ± 0% +15.3% 43.09 ± 0% +16.1% 43.41 ± 0% aim7.time.elapsed_time.max
155184 ± 1% -2.1% 151864 ± 1% -2.7% 150937 ± 1% aim7.time.minor_page_faults
0 ± 0% +Inf% 215412 ±141% +Inf% 334416 ± 75% latency_stats.sum.wait_on_page_bit.__migration_entry_wait.migration_entry_wait.handle_pte_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
24772 ± 0% -28.6% 17675 ± 0% -26.7% 18149 ± 2% vmstat.system.cs
26816 ± 8% +10.2% 29542 ± 1% +13.3% 30370 ± 1% interrupts.CAL:Function_call_interrupts
125122 ± 10% -10.7% 111758 ± 12% -11.1% 111223 ± 11% softirqs.SCHED
3906 ± 0% +28.8% 5032 ± 2% +29.1% 5045 ± 1% proc-vmstat.nr_active_file
3444 ± 5% +41.8% 4884 ± 0% +25.0% 4304 ± 11% proc-vmstat.nr_shmem
4092 ± 14% +61.2% 6595 ± 1% +40.0% 5728 ± 15% proc-vmstat.pgactivate
15627 ± 0% +27.7% 19956 ± 1% +27.4% 19902 ± 0% meminfo.Active(file)
16103 ± 3% +14.3% 18405 ± 8% +11.2% 17900 ± 1% meminfo.AnonHugePages
13777 ± 5% +43.1% 19709 ± 0% +25.0% 17220 ± 11% meminfo.Shmem
1724300 ± 27% -40.5% 1025538 ± 1% -41.3% 1012868 ± 0% sched_debug.cfs_rq:/.load.max
1724300 ± 27% -40.5% 1025538 ± 1% -41.3% 1012868 ± 0% sched_debug.cpu.load.max
37.37 ± 0% +15.3% 43.09 ± 0% +16.1% 43.41 ± 0% time.elapsed_time
37.37 ± 0% +15.3% 43.09 ± 0% +16.1% 43.41 ± 0% time.elapsed_time.max
6491 ± 3% +30.8% 8491 ± 0% +35.7% 8806 ± 1% time.involuntary_context_switches
1037 ± 0% +10.8% 1148 ± 0% +10.9% 1149 ± 0% time.percent_of_cpu_this_job_got
376 ± 0% +28.4% 484 ± 0% +29.6% 488 ± 0% time.system_time
430512 ± 0% -20.1% 343838 ± 0% -19.7% 345708 ± 0% time.voluntary_context_switches
319584 ± 1% -26.5% 234868 ± 1% -23.9% 243331 ± 3% cpuidle.C1-IVT.usage
52991525 ± 1% -19.4% 42687208 ± 0% -20.0% 42368754 ± 0% cpuidle.C1-IVT.time
46760 ± 0% -22.4% 36298 ± 0% -21.6% 36681 ± 1% cpuidle.C1E-IVT.usage
3468808 ± 2% -19.8% 2783341 ± 3% -16.9% 2881608 ± 5% cpuidle.C1E-IVT.time
12590471 ± 0% -22.3% 9788585 ± 1% -21.6% 9866515 ± 1% cpuidle.C3-IVT.time
79965 ± 0% -19.0% 64749 ± 0% -19.1% 64654 ± 0% cpuidle.C3-IVT.usage
1.3e+09 ± 0% +13.3% 1.473e+09 ± 0% +13.9% 1.481e+09 ± 0% cpuidle.C6-IVT.time
24.18 ± 0% +9.0% 26.35 ± 0% +9.6% 26.49 ± 0% turbostat.%Busy
686 ± 0% +9.5% 751 ± 0% +9.2% 749 ± 1% turbostat.Avg_MHz
0.28 ± 0% -25.0% 0.21 ± 0% -23.8% 0.21 ± 4% turbostat.CPU%c3
79 ± 1% -0.4% 78 ± 3% -21.5% 62 ± 2% turbostat.CoreTmp
78 ± 0% +0.4% 79 ± 3% -21.2% 62 ± 1% turbostat.PkgTmp
4.74 ± 0% -2.7% 4.61 ± 1% -13.1% 4.12 ± 0% turbostat.RAMWatt
51 ± 0% +0.0% 51 ± 0% +333.3% 221 ± 10% slabinfo.dio.active_objs
51 ± 0% +0.0% 51 ± 0% +333.3% 221 ± 10% slabinfo.dio.num_objs
876 ± 6% +2.8% 900 ± 3% +16.7% 1022 ± 0% slabinfo.nsproxy.active_objs
876 ± 6% +2.8% 900 ± 3% +16.7% 1022 ± 0% slabinfo.nsproxy.num_objs
1975 ± 15% +63.2% 3224 ± 17% +45.5% 2874 ± 15% slabinfo.scsi_data_buffer.active_objs
1975 ± 15% +63.2% 3224 ± 17% +45.5% 2874 ± 15% slabinfo.scsi_data_buffer.num_objs
464 ± 15% +63.3% 758 ± 17% +46.6% 680 ± 15% slabinfo.xfs_efd_item.active_objs
464 ± 15% +63.3% 758 ± 17% +46.6% 680 ± 15% slabinfo.xfs_efd_item.num_objs
1930 ± 0% +33.9% 2585 ± 3% +24.7% 2407 ± 5% numa-vmstat.node0.nr_active_file
466 ± 4% +29.3% 603 ± 14% +28.9% 601 ± 18% numa-vmstat.node0.nr_dirty
1977 ± 1% +23.6% 2444 ± 1% +33.6% 2641 ± 7% numa-vmstat.node1.nr_active_file
11671 ± 3% +55.9% 18197 ± 24% +43.3% 16730 ± 25% numa-vmstat.node1.nr_anon_pages
3809 ± 6% +16.1% 4422 ± 4% +21.6% 4633 ± 4% numa-vmstat.node1.nr_alloc_batch
12026 ± 4% +64.1% 19734 ± 20% +43.7% 17276 ± 22% numa-vmstat.node1.nr_active_anon
7723 ± 0% +32.6% 10238 ± 5% +19.5% 9228 ± 4% numa-meminfo.node0.Active(file)
8774 ± 29% +5.3% 9238 ± 28% +22.5% 10749 ± 24% numa-meminfo.node1.Mapped
7908 ± 1% +22.9% 9722 ± 3% +35.8% 10736 ± 3% numa-meminfo.node1.Active(file)
46721 ± 3% +55.9% 72837 ± 24% +42.8% 66711 ± 26% numa-meminfo.node1.AnonPages
56052 ± 3% +58.2% 88666 ± 17% +42.2% 79696 ± 19% numa-meminfo.node1.Active
48142 ± 4% +64.0% 78943 ± 19% +43.2% 68960 ± 22% numa-meminfo.node1.Active(anon)
2.658e+11 ± 4% +24.7% 3.316e+11 ± 2% +25.9% 3.346e+11 ± 3% perf-stat.branch-instructions
0.41 ± 1% -9.1% 0.37 ± 1% -9.4% 0.37 ± 1% perf-stat.branch-miss-rate
1.09e+09 ± 3% +13.4% 1.237e+09 ± 1% +14.1% 1.244e+09 ± 2% perf-stat.branch-misses
981138 ± 0% -18.1% 803696 ± 0% -16.0% 823913 ± 1% perf-stat.context-switches
1.511e+12 ± 5% +23.4% 1.864e+12 ± 3% +24.4% 1.88e+12 ± 4% perf-stat.cpu-cycles
102600 ± 1% -7.3% 95075 ± 1% -5.2% 97261 ± 1% perf-stat.cpu-migrations
0.26 ± 12% -30.8% 0.18 ± 10% -28.1% 0.19 ± 27% perf-stat.dTLB-load-miss-rate
3.164e+11 ± 1% +39.9% 4.426e+11 ± 4% +40.0% 4.43e+11 ± 1% perf-stat.dTLB-loads
0.03 ± 26% -41.3% 0.02 ± 13% -41.8% 0.02 ± 5% perf-stat.dTLB-store-miss-rate
2.247e+11 ± 6% +26.4% 2.839e+11 ± 2% +29.2% 2.903e+11 ± 5% perf-stat.dTLB-stores
34415974 ± 6% -1.7% 33840719 ± 12% -6.7% 32119462 ± 2% perf-stat.iTLB-load-misses
17863352 ± 4% +2.1% 18245848 ± 2% -7.9% 16460161 ± 2% perf-stat.iTLB-loads
1.49e+12 ± 4% +30.1% 1.939e+12 ± 2% +31.5% 1.959e+12 ± 3% perf-stat.instructions
43348 ± 2% +34.2% 58161 ± 12% +40.9% 61065 ± 5% perf-stat.instructions-per-iTLB-miss
0.99 ± 0% +5.5% 1.04 ± 0% +5.7% 1.04 ± 0% perf-stat.ipc
262799 ± 0% +4.4% 274251 ± 1% +4.3% 274149 ± 0% perf-stat.minor-faults
34.12 ± 1% +2.1% 34.83 ± 0% +3.5% 35.30 ± 1% perf-stat.node-load-miss-rate
46476754 ± 2% +4.6% 48601269 ± 1% +6.6% 49534267 ± 0% perf-stat.node-load-misses
9.96 ± 0% +13.4% 11.30 ± 0% +13.6% 11.31 ± 2% perf-stat.node-store-miss-rate
24460859 ± 1% +14.4% 27971097 ± 1% +13.8% 27844903 ± 0% perf-stat.node-store-misses
262780 ± 0% +4.4% 274227 ± 1% +4.3% 274117 ± 0% perf-stat.page-faults
0.00 ± 0% +Inf% 52.94 ± 0% +Inf% 52.69 ± 0% perf-profile.cycles-pp.iomap_file_buffered_write.xfs_file_buffered_aio_write.xfs_file_write_iter.__vfs_write.vfs_write
0.00 ± 0% +Inf% 52.29 ± 0% +Inf% 52.11 ± 0% perf-profile.cycles-pp.iomap_apply.iomap_file_buffered_write.xfs_file_buffered_aio_write.xfs_file_write_iter.__vfs_write
0.00 ± 0% +Inf% 34.35 ± 0% +Inf% 34.05 ± 0% perf-profile.cycles-pp.iomap_write_actor.iomap_apply.iomap_file_buffered_write.xfs_file_buffered_aio_write.xfs_file_write_iter
0.00 ± 0% +Inf% 16.48 ± 0% +Inf% 16.35 ± 1% perf-profile.cycles-pp.iomap_write_begin.iomap_write_actor.iomap_apply.iomap_file_buffered_write.xfs_file_buffered_aio_write
0.00 ± 0% +Inf% 16.05 ± 0% +Inf% 16.21 ± 1% perf-profile.cycles-pp.xfs_file_iomap_begin.iomap_apply.iomap_file_buffered_write.xfs_file_buffered_aio_write.xfs_file_write_iter
0.00 ± 0% +Inf% 9.85 ± 0% +Inf% 9.75 ± 1% perf-profile.cycles-pp.grab_cache_page_write_begin.iomap_write_begin.iomap_write_actor.iomap_apply.iomap_file_buffered_write
0.00 ± 0% +Inf% 9.25 ± 0% +Inf% 9.18 ± 1% perf-profile.cycles-pp.pagecache_get_page.grab_cache_page_write_begin.iomap_write_begin.iomap_write_actor.iomap_apply
0.00 ± 0% +Inf% 9.08 ± 0% +Inf% 9.08 ± 1% perf-profile.cycles-pp.xfs_iomap_write_delay.xfs_file_iomap_begin.iomap_apply.iomap_file_buffered_write.xfs_file_buffered_aio_write
0.00 ± 0% +Inf% 7.91 ± 1% +Inf% 7.90 ± 0% perf-profile.cycles-pp.generic_write_end.iomap_write_actor.iomap_apply.iomap_file_buffered_write.xfs_file_buffered_aio_write
0.00 ± 0% +Inf% 4.69 ± 0% +Inf% 4.66 ± 0% perf-profile.cycles-pp.block_write_end.generic_write_end.iomap_write_actor.iomap_apply.iomap_file_buffered_write
0.00 ± 0% +Inf% 4.45 ± 1% +Inf% 4.45 ± 0% perf-profile.cycles-pp.__block_commit_write.isra.24.block_write_end.generic_write_end.iomap_write_actor.iomap_apply
0.00 ± 0% +Inf% 4.14 ± 0% +Inf% 4.12 ± 1% perf-profile.cycles-pp.xfs_iomap_eof_want_preallocate.constprop.8.xfs_iomap_write_delay.xfs_file_iomap_begin.iomap_apply.iomap_file_buffered_write
0.00 ± 0% +Inf% 3.69 ± 1% +Inf% 3.69 ± 2% perf-profile.cycles-pp.add_to_page_cache_lru.pagecache_get_page.grab_cache_page_write_begin.iomap_write_begin.iomap_write_actor
0.00 ± 0% +Inf% 3.64 ± 0% +Inf% 3.62 ± 0% perf-profile.cycles-pp.__block_write_begin_int.iomap_write_begin.iomap_write_actor.iomap_apply.iomap_file_buffered_write
0.00 ± 0% +Inf% 3.44 ± 1% +Inf% 3.35 ± 2% perf-profile.cycles-pp.mark_page_accessed.iomap_write_actor.iomap_apply.iomap_file_buffered_write.xfs_file_buffered_aio_write
0.00 ± 0% +Inf% 3.04 ± 1% +Inf% 3.00 ± 3% perf-profile.cycles-pp.xfs_bmapi_read.xfs_file_iomap_begin.iomap_apply.iomap_file_buffered_write.xfs_file_buffered_aio_write
0.00 ± 0% +Inf% 3.22 ± 0% +Inf% 3.15 ± 1% perf-profile.cycles-pp.copy_user_enhanced_fast_string.iomap_write_actor.iomap_apply.iomap_file_buffered_write.xfs_file_buffered_aio_write
0.00 ± 0% +Inf% 3.06 ± 1% +Inf% 3.09 ± 0% perf-profile.cycles-pp.xfs_bmapi_delay.xfs_iomap_write_delay.xfs_file_iomap_begin.iomap_apply.iomap_file_buffered_write
0.00 ± 0% +Inf% 3.05 ± 1% +Inf% 3.05 ± 2% perf-profile.cycles-pp.xfs_bmapi_read.xfs_iomap_eof_want_preallocate.constprop.8.xfs_iomap_write_delay.xfs_file_iomap_begin.iomap_apply
0.00 ± 0% +Inf% 2.78 ± 0% +Inf% 2.83 ± 1% perf-profile.cycles-pp.mark_buffer_dirty.__block_commit_write.isra.24.block_write_end.generic_write_end.iomap_write_actor
0.00 ± 0% +Inf% 2.68 ± 2% +Inf% 2.60 ± 1% perf-profile.cycles-pp.__page_cache_alloc.pagecache_get_page.grab_cache_page_write_begin.iomap_write_begin.iomap_write_actor
0.00 ± 0% +Inf% 2.56 ± 2% +Inf% 2.46 ± 0% perf-profile.cycles-pp.alloc_pages_current.__page_cache_alloc.pagecache_get_page.grab_cache_page_write_begin.iomap_write_begin
0.00 ± 0% +Inf% 2.43 ± 0% +Inf% 2.42 ± 0% perf-profile.cycles-pp.memset_erms.iomap_write_begin.iomap_write_actor.iomap_apply.iomap_file_buffered_write
0.00 ± 0% +Inf% 1.97 ± 2% +Inf% 1.90 ± 4% perf-profile.cycles-pp.xfs_bmap_search_extents.xfs_bmapi_read.xfs_file_iomap_begin.iomap_apply.iomap_file_buffered_write
0.00 ± 0% +Inf% 1.55 ± 3% +Inf% 1.62 ± 2% perf-profile.cycles-pp.find_get_entry.pagecache_get_page.grab_cache_page_write_begin.iomap_write_begin.iomap_write_actor
0.00 ± 0% +Inf% 1.68 ± 1% +Inf% 1.66 ± 2% perf-profile.cycles-pp.__add_to_page_cache_locked.add_to_page_cache_lru.pagecache_get_page.grab_cache_page_write_begin.iomap_write_begin
0.00 ± 0% +Inf% 1.73 ± 1% +Inf% 1.71 ± 2% perf-profile.cycles-pp.xfs_bmap_search_extents.xfs_bmapi_delay.xfs_iomap_write_delay.xfs_file_iomap_begin.iomap_apply
0.00 ± 0% +Inf% 1.61 ± 2% +Inf% 1.64 ± 3% perf-profile.cycles-pp.xfs_bmap_search_extents.xfs_bmapi_read.xfs_iomap_eof_want_preallocate.constprop.8.xfs_iomap_write_delay.xfs_file_iomap_begin
0.00 ± 0% +Inf% 1.52 ± 2% +Inf% 1.51 ± 4% perf-profile.cycles-pp.workingset_activation.mark_page_accessed.iomap_write_actor.iomap_apply.iomap_file_buffered_write
0.00 ± 0% +Inf% 1.55 ± 1% +Inf% 1.55 ± 1% perf-profile.cycles-pp.lru_cache_add.add_to_page_cache_lru.pagecache_get_page.grab_cache_page_write_begin.iomap_write_begin
0.00 ± 0% +Inf% 1.53 ± 1% +Inf% 1.52 ± 1% perf-profile.cycles-pp.create_page_buffers.__block_write_begin_int.iomap_write_begin.iomap_write_actor.iomap_apply
0.00 ± 0% +Inf% 1.46 ± 1% +Inf% 1.45 ± 3% perf-profile.cycles-pp.xfs_bmap_search_multi_extents.xfs_bmap_search_extents.xfs_bmapi_read.xfs_file_iomap_begin.iomap_apply
0.00 ± 0% +Inf% 1.36 ± 1% +Inf% 1.39 ± 1% perf-profile.cycles-pp.unlock_page.generic_write_end.iomap_write_actor.iomap_apply.iomap_file_buffered_write
0.00 ± 0% +Inf% 1.18 ± 1% +Inf% 1.19 ± 1% perf-profile.cycles-pp.create_empty_buffers.create_page_buffers.__block_write_begin_int.iomap_write_begin.iomap_write_actor
0.00 ± 0% +Inf% 1.21 ± 2% +Inf% 1.23 ± 2% perf-profile.cycles-pp.xfs_bmap_search_multi_extents.xfs_bmap_search_extents.xfs_bmapi_read.xfs_iomap_eof_want_preallocate.constprop.8.xfs_iomap_write_delay
0.00 ± 0% +Inf% 1.24 ± 2% +Inf% 1.21 ± 2% perf-profile.cycles-pp.xfs_bmap_search_multi_extents.xfs_bmap_search_extents.xfs_bmapi_delay.xfs_iomap_write_delay.xfs_file_iomap_begin
0.00 ± 0% +Inf% 1.14 ± 3% +Inf% 1.16 ± 3% perf-profile.cycles-pp.xfs_ilock.xfs_file_iomap_begin.iomap_apply.iomap_file_buffered_write.xfs_file_buffered_aio_write
0.00 ± 0% +Inf% 1.09 ± 2% +Inf% 1.08 ± 1% perf-profile.cycles-pp.__mark_inode_dirty.generic_write_end.iomap_write_actor.iomap_apply.iomap_file_buffered_write
0.00 ± 0% +Inf% 0.95 ± 0% +Inf% 1.01 ± 3% perf-profile.cycles-pp.radix_tree_lookup_slot.find_get_entry.pagecache_get_page.grab_cache_page_write_begin.iomap_write_begin
43.95 ± 1% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.generic_perform_write.xfs_file_buffered_aio_write.xfs_file_write_iter.__vfs_write.vfs_write
25.10 ± 1% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.xfs_vm_write_begin.generic_perform_write.xfs_file_buffered_aio_write.xfs_file_write_iter.__vfs_write
13.71 ± 1% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.__block_write_begin.xfs_vm_write_begin.generic_perform_write.xfs_file_buffered_aio_write.xfs_file_write_iter
11.03 ± 1% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.xfs_vm_write_end.generic_perform_write.xfs_file_buffered_aio_write.xfs_file_write_iter.__vfs_write
10.68 ± 1% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.generic_write_end.xfs_vm_write_end.generic_perform_write.xfs_file_buffered_aio_write.xfs_file_write_iter
10.96 ± 1% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.grab_cache_page_write_begin.xfs_vm_write_begin.generic_perform_write.xfs_file_buffered_aio_write.xfs_file_write_iter
10.36 ± 1% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.__block_write_begin_int.__block_write_begin.xfs_vm_write_begin.generic_perform_write.xfs_file_buffered_aio_write
10.37 ± 2% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.pagecache_get_page.grab_cache_page_write_begin.xfs_vm_write_begin.generic_perform_write.xfs_file_buffered_aio_write
6.46 ± 1% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.xfs_get_blocks.__block_write_begin_int.__block_write_begin.xfs_vm_write_begin.generic_perform_write
6.34 ± 1% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.__xfs_get_blocks.xfs_get_blocks.__block_write_begin_int.__block_write_begin.xfs_vm_write_begin
6.24 ± 0% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.block_write_end.generic_write_end.xfs_vm_write_end.generic_perform_write.xfs_file_buffered_aio_write
5.93 ± 0% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.__block_commit_write.isra.24.block_write_end.generic_write_end.xfs_vm_write_end.generic_perform_write
3.95 ± 2% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.copy_user_enhanced_fast_string.generic_perform_write.xfs_file_buffered_aio_write.xfs_file_write_iter.__vfs_write
4.02 ± 1% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.add_to_page_cache_lru.pagecache_get_page.grab_cache_page_write_begin.xfs_vm_write_begin.generic_perform_write
3.39 ± 1% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.mark_buffer_dirty.__block_commit_write.isra.24.block_write_end.generic_write_end.xfs_vm_write_end
3.28 ± 2% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.xfs_iomap_write_delay.__xfs_get_blocks.xfs_get_blocks.__block_write_begin_int.__block_write_begin
3.03 ± 0% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.memset_erms.__block_write_begin.xfs_vm_write_begin.generic_perform_write.xfs_file_buffered_aio_write
3.04 ± 3% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.__page_cache_alloc.pagecache_get_page.grab_cache_page_write_begin.xfs_vm_write_begin.generic_perform_write
2.91 ± 3% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.alloc_pages_current.__page_cache_alloc.pagecache_get_page.grab_cache_page_write_begin.xfs_vm_write_begin
1.86 ± 2% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.create_page_buffers.__block_write_begin_int.__block_write_begin.xfs_vm_write_begin.generic_perform_write
1.72 ± 4% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.unlock_page.generic_write_end.xfs_vm_write_end.generic_perform_write.xfs_file_buffered_aio_write
1.80 ± 1% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.__add_to_page_cache_locked.add_to_page_cache_lru.pagecache_get_page.grab_cache_page_write_begin.xfs_vm_write_begin
1.83 ± 2% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.find_get_entry.pagecache_get_page.grab_cache_page_write_begin.xfs_vm_write_begin.generic_perform_write
1.72 ± 2% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.lru_cache_add.add_to_page_cache_lru.pagecache_get_page.grab_cache_page_write_begin.xfs_vm_write_begin
1.44 ± 3% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.create_empty_buffers.create_page_buffers.__block_write_begin_int.__block_write_begin.xfs_vm_write_begin
1.32 ± 4% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.__mark_inode_dirty.generic_write_end.xfs_vm_write_end.generic_perform_write.xfs_file_buffered_aio_write
1.25 ± 0% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.xfs_bmapi_delay.xfs_iomap_write_delay.__xfs_get_blocks.xfs_get_blocks.__block_write_begin_int
1.23 ± 4% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.xfs_iomap_eof_want_preallocate.constprop.6.xfs_iomap_write_delay.__xfs_get_blocks.xfs_get_blocks.__block_write_begin_int
1.17 ± 3% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.radix_tree_lookup_slot.find_get_entry.pagecache_get_page.grab_cache_page_write_begin.xfs_vm_write_begin
1.04 ± 0% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.xfs_bmapi_read.__xfs_get_blocks.xfs_get_blocks.__block_write_begin_int.__block_write_begin
0.98 ± 5% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.cycles-pp.alloc_page_buffers.create_empty_buffers.create_page_buffers.__block_write_begin_int.__block_write_begin
1.79 ± 2% -28.2% 1.28 ± 3% -27.8% 1.29 ± 4% perf-profile.cycles-pp.do_unlinkat.sys_unlink.entry_SYSCALL_64_fastpath
1.79 ± 3% -27.9% 1.29 ± 3% -27.7% 1.30 ± 4% perf-profile.cycles-pp.sys_unlink.entry_SYSCALL_64_fastpath
1.27 ± 0% -22.5% 0.99 ± 4% -24.6% 0.96 ± 4% perf-profile.cycles-pp.destroy_inode.evict.iput.__dentry_kill.dput
2.61 ± 1% -24.3% 1.98 ± 1% -24.1% 1.98 ± 1% perf-profile.cycles-pp.do_filp_open.do_sys_open.sys_creat.entry_SYSCALL_64_fastpath
2.58 ± 1% -24.1% 1.96 ± 0% -24.1% 1.96 ± 1% perf-profile.cycles-pp.path_openat.do_filp_open.do_sys_open.sys_creat.entry_SYSCALL_64_fastpath
1.07 ± 3% -23.3% 0.82 ± 3% -21.1% 0.85 ± 2% perf-profile.cycles-pp.down_write.xfs_file_buffered_aio_write.xfs_file_write_iter.__vfs_write.vfs_write
2.66 ± 1% -24.3% 2.01 ± 1% -23.7% 2.03 ± 1% perf-profile.cycles-pp.do_sys_open.sys_creat.entry_SYSCALL_64_fastpath
2.67 ± 1% -24.2% 2.02 ± 1% -23.8% 2.03 ± 1% perf-profile.cycles-pp.sys_creat.entry_SYSCALL_64_fastpath
1.24 ± 1% -23.1% 0.95 ± 4% -24.2% 0.94 ± 4% perf-profile.cycles-pp.xfs_fs_destroy_inode.destroy_inode.evict.iput.__dentry_kill
1.21 ± 1% -23.4% 0.93 ± 4% -24.7% 0.91 ± 4% perf-profile.cycles-pp.xfs_inactive.xfs_fs_destroy_inode.destroy_inode.evict.iput
0.94 ± 4% -19.8% 0.76 ± 0% -21.6% 0.74 ± 4% perf-profile.cycles-pp.cancel_dirty_page.try_to_free_buffers.xfs_vm_releasepage.try_to_release_page.block_invalidatepage
1.32 ± 2% -21.5% 1.04 ± 1% -22.2% 1.03 ± 3% perf-profile.cycles-pp.xfs_create.xfs_generic_create.xfs_vn_mknod.xfs_vn_create.path_openat
1.42 ± 2% -20.7% 1.13 ± 1% -22.1% 1.11 ± 3% perf-profile.cycles-pp.xfs_vn_create.path_openat.do_filp_open.do_sys_open.sys_creat
2.35 ± 1% -21.0% 1.86 ± 1% -20.7% 1.86 ± 1% perf-profile.cycles-pp.xfs_vm_releasepage.try_to_release_page.block_invalidatepage.xfs_vm_invalidatepage.truncate_inode_page
1.91 ± 3% -16.4% 1.59 ± 1% -19.9% 1.53 ± 1% perf-profile.cycles-pp.get_page_from_freelist.__alloc_pages_nodemask.alloc_pages_current.__page_cache_alloc.pagecache_get_page
2.07 ± 1% -20.4% 1.65 ± 2% -19.9% 1.66 ± 1% perf-profile.cycles-pp.try_to_free_buffers.xfs_vm_releasepage.try_to_release_page.block_invalidatepage.xfs_vm_invalidatepage
1.42 ± 2% -20.5% 1.13 ± 1% -22.1% 1.10 ± 2% perf-profile.cycles-pp.xfs_vn_mknod.xfs_vn_create.path_openat.do_filp_open.do_sys_open
1.42 ± 2% -21.2% 1.12 ± 1% -22.4% 1.10 ± 3% perf-profile.cycles-pp.xfs_generic_create.xfs_vn_mknod.xfs_vn_create.path_openat.do_filp_open
1.12 ± 2% -17.6% 0.92 ± 4% -22.3% 0.87 ± 4% perf-profile.cycles-pp.__sb_start_write.vfs_write.sys_write.entry_SYSCALL_64_fastpath
2.40 ± 1% -21.0% 1.89 ± 2% -20.6% 1.90 ± 1% perf-profile.cycles-pp.try_to_release_page.block_invalidatepage.xfs_vm_invalidatepage.truncate_inode_page.truncate_inode_pages_range
1.29 ± 3% -18.9% 1.04 ± 1% -17.9% 1.06 ± 1% perf-profile.cycles-pp.xfs_ilock.xfs_file_buffered_aio_write.xfs_file_write_iter.__vfs_write.vfs_write
3.42 ± 0% -20.9% 2.71 ± 2% -20.3% 2.73 ± 2% perf-profile.cycles-pp.block_invalidatepage.xfs_vm_invalidatepage.truncate_inode_page.truncate_inode_pages_range.truncate_inode_pages_final
5.96 ± 1% -20.0% 4.77 ± 0% -19.4% 4.81 ± 1% perf-profile.cycles-pp.truncate_inode_page.truncate_inode_pages_range.truncate_inode_pages_final.evict.iput
3.54 ± 0% -20.8% 2.81 ± 1% -20.0% 2.83 ± 2% perf-profile.cycles-pp.xfs_vm_invalidatepage.truncate_inode_page.truncate_inode_pages_range.truncate_inode_pages_final.evict
2.55 ± 3% -14.2% 2.19 ± 2% -17.5% 2.10 ± 1% perf-profile.cycles-pp.__alloc_pages_nodemask.alloc_pages_current.__page_cache_alloc.pagecache_get_page.grab_cache_page_write_begin
1.04 ± 2% -18.9% 0.84 ± 1% -19.6% 0.84 ± 0% perf-profile.cycles-pp.__delete_from_page_cache.delete_from_page_cache.truncate_inode_page.truncate_inode_pages_range.truncate_inode_pages_final
1.74 ± 2% -19.9% 1.40 ± 3% -19.3% 1.41 ± 1% perf-profile.cycles-pp.delete_from_page_cache.truncate_inode_page.truncate_inode_pages_range.truncate_inode_pages_final.evict
1.01 ± 3% -17.9% 0.83 ± 2% -18.2% 0.82 ± 1% perf-profile.cycles-pp.down_write.xfs_ilock.xfs_file_buffered_aio_write.xfs_file_write_iter.__vfs_write
11.21 ± 2% -18.1% 9.18 ± 0% -18.4% 9.14 ± 1% perf-profile.cycles-pp.evict.iput.__dentry_kill.dput.__fput
11.24 ± 2% -18.1% 9.21 ± 0% -18.4% 9.18 ± 1% perf-profile.cycles-pp.__dentry_kill.dput.__fput.____fput.task_work_run
11.22 ± 2% -18.1% 9.19 ± 0% -18.4% 9.16 ± 1% perf-profile.cycles-pp.iput.__dentry_kill.dput.__fput.____fput
1.79 ± 3% -22.2% 1.39 ± 0% -18.2% 1.46 ± 0% perf-profile.cycles-pp.security_file_permission.rw_verify_area.vfs_write.sys_write.entry_SYSCALL_64_fastpath
11.26 ± 2% -18.1% 9.23 ± 0% -18.3% 9.20 ± 1% perf-profile.cycles-pp.dput.__fput.____fput.task_work_run.exit_to_usermode_loop
11.31 ± 1% -18.1% 9.27 ± 0% -18.2% 9.25 ± 1% perf-profile.cycles-pp.____fput.task_work_run.exit_to_usermode_loop.syscall_return_slowpath.entry_SYSCALL_64_fastpath
11.34 ± 2% -18.1% 9.29 ± 0% -18.3% 9.27 ± 1% perf-profile.cycles-pp.exit_to_usermode_loop.syscall_return_slowpath.entry_SYSCALL_64_fastpath
11.31 ± 2% -18.1% 9.26 ± 0% -18.3% 9.24 ± 1% perf-profile.cycles-pp.__fput.____fput.task_work_run.exit_to_usermode_loop.syscall_return_slowpath
11.32 ± 1% -18.0% 9.28 ± 0% -18.2% 9.26 ± 1% perf-profile.cycles-pp.task_work_run.exit_to_usermode_loop.syscall_return_slowpath.entry_SYSCALL_64_fastpath
11.34 ± 1% -18.1% 9.29 ± 0% -18.2% 9.27 ± 1% perf-profile.cycles-pp.syscall_return_slowpath.entry_SYSCALL_64_fastpath
2.06 ± 3% -22.5% 1.60 ± 2% -18.1% 1.69 ± 0% perf-profile.cycles-pp.rw_verify_area.vfs_write.sys_write.entry_SYSCALL_64_fastpath
9.87 ± 2% -17.5% 8.15 ± 0% -17.6% 8.14 ± 1% perf-profile.cycles-pp.truncate_inode_pages_range.truncate_inode_pages_final.evict.iput.__dentry_kill
9.89 ± 2% -17.4% 8.17 ± 0% -17.5% 8.16 ± 1% perf-profile.cycles-pp.truncate_inode_pages_final.evict.iput.__dentry_kill.dput
1.00 ± 1% -18.0% 0.82 ± 1% -14.3% 0.86 ± 3% perf-profile.cycles-pp.__radix_tree_lookup.radix_tree_lookup_slot.find_get_entry.pagecache_get_page.grab_cache_page_write_begin
51.83 ± 1% +14.3% 59.25 ± 0% +13.8% 58.97 ± 0% perf-profile.cycles-pp.xfs_file_buffered_aio_write.xfs_file_write_iter.__vfs_write.vfs_write.sys_write
1.38 ± 2% -13.3% 1.19 ± 1% -9.9% 1.24 ± 2% perf-profile.cycles-pp.__set_page_dirty.mark_buffer_dirty.__block_commit_write.isra.24.block_write_end.generic_write_end
53.16 ± 1% +13.6% 60.40 ± 0% +13.0% 60.10 ± 0% perf-profile.cycles-pp.xfs_file_write_iter.__vfs_write.vfs_write.sys_write.entry_SYSCALL_64_fastpath
54.10 ± 1% +13.1% 61.20 ± 0% +12.5% 60.86 ± 0% perf-profile.cycles-pp.__vfs_write.vfs_write.sys_write.entry_SYSCALL_64_fastpath
1.32 ± 4% -21.4% 1.04 ± 0% -14.9% 1.13 ± 1% perf-profile.cycles-pp.selinux_file_permission.security_file_permission.rw_verify_area.vfs_write.sys_write
19.79 ± 5% -9.9% 17.84 ± 0% -7.5% 18.31 ± 3% perf-profile.cycles-pp.start_secondary
19.75 ± 5% -9.8% 17.81 ± 0% -7.4% 18.28 ± 3% perf-profile.cycles-pp.cpu_startup_entry.start_secondary
2.50 ± 3% -11.5% 2.21 ± 0% -13.1% 2.17 ± 0% perf-profile.cycles-pp.__pagevec_release.truncate_inode_pages_range.truncate_inode_pages_final.evict.iput
2.39 ± 3% -11.2% 2.12 ± 0% -13.0% 2.08 ± 0% perf-profile.cycles-pp.release_pages.__pagevec_release.truncate_inode_pages_range.truncate_inode_pages_final.evict
59.63 ± 1% +10.2% 65.72 ± 0% +9.7% 65.43 ± 0% perf-profile.cycles-pp.vfs_write.sys_write.entry_SYSCALL_64_fastpath
0.00 ± 0% +Inf% 1.91 ± 1% +Inf% 1.83 ± 1% perf-profile.func.cycles-pp.mark_page_accessed
0.00 ± 0% +Inf% 1.12 ± 1% +Inf% 1.12 ± 0% perf-profile.func.cycles-pp.iomap_write_actor
0.00 ± 0% +Inf% 1.10 ± 3% +Inf% 1.10 ± 2% perf-profile.func.cycles-pp.xfs_iomap_eof_want_preallocate.constprop.8
1.30 ± 2% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.func.cycles-pp.generic_perform_write
1.08 ± 2% -100.0% 0.00 ± 0% -100.0% 0.00 ± 0% perf-profile.func.cycles-pp.__xfs_get_blocks
0.37 ± 2% +243.6% 1.26 ± 2% +236.4% 1.23 ± 0% perf-profile.func.cycles-pp.xfs_bmap_search_extents
0.70 ± 5% +219.5% 2.24 ± 0% +213.8% 2.20 ± 2% perf-profile.func.cycles-pp.xfs_bmapi_read
0.41 ± 1% +198.4% 1.22 ± 2% +190.2% 1.19 ± 1% perf-profile.func.cycles-pp.xfs_bmap_search_multi_extents
0.64 ± 1% +182.8% 1.81 ± 4% +181.3% 1.80 ± 0% perf-profile.func.cycles-pp.xfs_iext_bno_to_ext
0.46 ± 4% +161.6% 1.20 ± 1% +163.0% 1.21 ± 1% perf-profile.func.cycles-pp.xfs_iomap_write_delay
1.31 ± 2% -46.7% 0.70 ± 0% -46.9% 0.69 ± 2% perf-profile.func.cycles-pp.generic_write_end
2.49 ± 0% -34.5% 1.63 ± 1% -36.0% 1.59 ± 1% perf-profile.func.cycles-pp.__block_commit_write.isra.24
1.50 ± 1% -20.9% 1.19 ± 1% -21.3% 1.18 ± 1% perf-profile.func.cycles-pp.mark_buffer_dirty
3.24 ± 0% -19.8% 2.60 ± 0% -20.0% 2.59 ± 0% perf-profile.func.cycles-pp.memset_erms
3.96 ± 2% -18.4% 3.23 ± 0% -20.3% 3.16 ± 1% perf-profile.func.cycles-pp.copy_user_enhanced_fast_string
1.79 ± 4% -16.8% 1.49 ± 1% -19.6% 1.44 ± 0% perf-profile.func.cycles-pp.__mark_inode_dirty
1.41 ± 3% -20.6% 1.12 ± 3% -21.3% 1.11 ± 3% perf-profile.func.cycles-pp.entry_SYSCALL_64_fastpath
1.16 ± 0% -18.1% 0.95 ± 1% -18.1% 0.95 ± 3% perf-profile.func.cycles-pp._raw_spin_lock
1.16 ± 1% -21.6% 0.91 ± 1% -20.1% 0.93 ± 2% perf-profile.func.cycles-pp.vfs_write
1.75 ± 2% -18.9% 1.42 ± 1% -17.7% 1.44 ± 2% perf-profile.func.cycles-pp.unlock_page
1.32 ± 0% -16.4% 1.10 ± 1% -14.1% 1.13 ± 3% perf-profile.func.cycles-pp.__radix_tree_lookup
1.51 ± 2% +15.4% 1.75 ± 1% +15.9% 1.75 ± 0% perf-profile.func.cycles-pp.__block_write_begin_int
1.02 ± 4% -7.5% 0.94 ± 2% -12.4% 0.89 ± 2% perf-profile.func.cycles-pp.pagecache_get_page
1.05 ± 2% -15.6% 0.88 ± 3% -15.6% 0.88 ± 5% perf-profile.func.cycles-pp.xfs_file_write_iter
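
As a reading aid for the table above: the %change columns appear to be
relative to the first (base) commit column. That is an assumption on my
side, but it reproduces the aim7.jobs-per-min row. A minimal sketch, with
hypothetical helper names:

```c
#include <assert.h>

/* Percent change of 'value' relative to 'base'. */
static double pct_change(double base, double value)
{
	assert(base != 0.0);
	return (value - base) / base * 100.0;
}

/*
 * Round a percentage to tenths of a percent, returned as an integer
 * (e.g. -13.97 becomes -140, meaning -14.0%). Rounds half away from zero.
 */
static long pct_tenths(double pct)
{
	return (long)(pct * 10.0 + (pct < 0.0 ? -0.5 : 0.5));
}
```

For aim7.jobs-per-min, pct_change(484435, 420004) rounds to -13.3% and
pct_change(484435, 416777) rounds to -14.0%, matching the two %change
columns. Rows whose printed values are themselves rounded (elapsed_time,
for example) may differ in the last digit when recomputed this way.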

raw perf profile data:

"perf-profile.func.cycles-pp.intel_idle": 17.0,
"perf-profile.func.cycles-pp.copy_user_enhanced_fast_string": 3.15,
"perf-profile.func.cycles-pp.memset_erms": 2.59,
"perf-profile.func.cycles-pp.xfs_bmapi_read": 2.25,
"perf-profile.func.cycles-pp.___might_sleep": 2.13,
"perf-profile.func.cycles-pp.mark_page_accessed": 1.8,
"perf-profile.func.cycles-pp.xfs_iext_bno_to_ext": 1.8,
"perf-profile.func.cycles-pp.__block_write_begin_int": 1.74,
"perf-profile.func.cycles-pp.__block_commit_write.isra.24": 1.62,
"perf-profile.func.cycles-pp.up_write": 1.62,
"perf-profile.func.cycles-pp.down_write": 1.48,
"perf-profile.func.cycles-pp.unlock_page": 1.48,
"perf-profile.func.cycles-pp.__mark_inode_dirty": 1.43,
"perf-profile.func.cycles-pp.xfs_iomap_write_delay": 1.23,
"perf-profile.func.cycles-pp.xfs_bmap_search_extents": 1.23,
"perf-profile.func.cycles-pp.__radix_tree_lookup": 1.19,
"perf-profile.func.cycles-pp.xfs_bmap_search_multi_extents": 1.18,
"perf-profile.func.cycles-pp.__might_sleep": 1.17,
"perf-profile.func.cycles-pp.mark_buffer_dirty": 1.16,
"perf-profile.func.cycles-pp.entry_SYSCALL_64_fastpath": 1.12,
"perf-profile.func.cycles-pp.iomap_write_actor": 1.11,
"perf-profile.func.cycles-pp.xfs_iomap_eof_want_preallocate.constprop.8": 1.07,
"perf-profile.func.cycles-pp._raw_spin_lock": 1.0,
"perf-profile.func.cycles-pp.vfs_write": 0.94,
"perf-profile.func.cycles-pp.pagecache_get_page": 0.92,
"perf-profile.func.cycles-pp.xfs_bmapi_delay": 0.92,
"perf-profile.func.cycles-pp.xfs_file_iomap_begin": 0.91,
"perf-profile.func.cycles-pp.xfs_file_write_iter": 0.9,
"perf-profile.func.cycles-pp.workingset_activation": 0.82,
"perf-profile.func.cycles-pp.iomap_apply": 0.77,
"perf-profile.func.cycles-pp.xfs_bmapi_trim_map.isra.14": 0.75,
"perf-profile.func.cycles-pp.xfs_file_buffered_aio_write": 0.74,
"perf-profile.func.cycles-pp.mem_cgroup_zone_lruvec": 0.73,
"perf-profile.func.cycles-pp.native_queued_spin_lock_slowpath": 0.73,
"perf-profile.func.cycles-pp.get_page_from_freelist": 0.72,
"perf-profile.func.cycles-pp.generic_write_end": 0.71,
"perf-profile.func.cycles-pp.__vfs_write": 0.66,
"perf-profile.func.cycles-pp.rwsem_spin_on_owner": 0.66,
"perf-profile.func.cycles-pp.iov_iter_copy_from_user_atomic": 0.66,
"perf-profile.func.cycles-pp.release_pages": 0.65,
"perf-profile.func.cycles-pp.find_get_entry": 0.65,

Thanks,
Xiaolong