Re: [LKP] [f2fs] 089842de57: aim7.jobs-per-min 15.4% improvement

From: Chao Yu
Date: Tue Dec 11 2018 - 05:12:39 EST


Hi all,

The commit only cleans up code that is currently unused, so why would it
improve performance? Could you retest to make sure? One way a pure cleanup
could still shift the numbers is sketched below.
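
(Speculating: even a dead-code removal can change data layout. If the
removed wio_mutex array sat inside a structure whose other fields are hot
on the write path, deleting it shrinks the structure and can move those
fields onto fewer cache lines. Below is a minimal userspace sketch of that
effect only; the field names and array dimensions are hypothetical, and
pthread_mutex_t stands in for the kernel's struct mutex:

    #include <stdio.h>
    #include <pthread.h>

    /* Hypothetical stand-ins for an sb-info structure before/after the
     * cleanup; only the idea (an unused mutex array padding out a hot
     * structure) is taken from the commit title, not the real layout. */
    struct sbi_before {
            long hot_a;                       /* updated on every write */
            pthread_mutex_t wio_mutex[3][3];  /* initialized, never locked */
            long hot_b;
    };

    struct sbi_after {
            long hot_a;
            long hot_b;                       /* now adjacent to hot_a */
    };

    int main(void)
    {
            printf("before: %zu bytes\n", sizeof(struct sbi_before));
            printf("after:  %zu bytes\n", sizeof(struct sbi_after));
            return 0;
    }

That said, layout effects alone seem unlikely to explain a 15% swing, so a
retest would still help.)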

Thanks,

On 2018/12/11 17:59, kernel test robot wrote:
> Greetings,
>
> FYI, we noticed a 15.4% improvement of aim7.jobs-per-min due to commit:
>
>
> commit: 089842de5750f434aa016eb23f3d3a3a151083bd ("f2fs: remove codes of unused wio_mutex")
> https://git.kernel.org/cgit/linux/kernel/git/jaegeuk/f2fs.git dev-test
>
> in testcase: aim7
> on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory
> with following parameters:
>
> disk: 4BRD_12G
> md: RAID1
> fs: f2fs
> test: disk_rw
> load: 3000
> cpufreq_governor: performance
>
> test-description: AIM7 is a traditional UNIX system-level benchmark suite which is used to test and measure the performance of multiuser systems.
> test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/
>
> In addition to that, the commit also has a significant impact on the following tests:
>
> +------------------+-----------------------------------------------------------------------+
> | testcase: change | aim7: aim7.jobs-per-min 8.8% improvement |
> | test machine | 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory |
> | test parameters | cpufreq_governor=performance |
> | | disk=4BRD_12G |
> | | fs=f2fs |
> | | load=3000 |
> | | md=RAID1 |
> | | test=disk_rr |
> +------------------+-----------------------------------------------------------------------+
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> To reproduce:
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> bin/lkp install job.yaml # job file is attached in this email
> bin/lkp run job.yaml
>
> =========================================================================================
> compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/tbox_group/test/testcase:
> gcc-7/performance/4BRD_12G/f2fs/x86_64-rhel-7.2/3000/RAID1/debian-x86_64-2018-04-03.cgz/lkp-ivb-ep01/disk_rw/aim7
>
> commit:
> d6c66cd19e ("f2fs: fix count of seg_freed to make sec_freed correct")
> 089842de57 ("f2fs: remove codes of unused wio_mutex")
>
> d6c66cd19ef322fe 089842de5750f434aa016eb23f
> ---------------- --------------------------
> %stddev %change %stddev
> \ | \
> 96213 +15.4% 110996 aim7.jobs-per-min
> 191.50 ± 3% -15.1% 162.52 aim7.time.elapsed_time
> 191.50 ± 3% -15.1% 162.52 aim7.time.elapsed_time.max
> 1090253 ± 2% -17.5% 899165 aim7.time.involuntary_context_switches
> 176713 -7.5% 163478 aim7.time.minor_page_faults
> 6882 -14.6% 5875 aim7.time.system_time
> 127.97 +4.7% 134.00 aim7.time.user_time
> 760923 +7.1% 814632 aim7.time.voluntary_context_switches
> 78499 ± 2% -11.2% 69691 interrupts.CAL:Function_call_interrupts
> 3183861 ± 4% -16.7% 2651390 ± 4% softirqs.TIMER
> 191.54 ± 13% +45.4% 278.59 ± 12% iostat.md0.w/s
> 6118 ± 3% +16.5% 7126 ± 2% iostat.md0.wkB/s
> 151257 ± 2% -10.1% 135958 ± 2% meminfo.AnonHugePages
> 46754 ± 3% +14.0% 53307 ± 3% meminfo.max_used_kB
> 0.03 ± 62% -0.0 0.01 ± 78% mpstat.cpu.soft%
> 1.73 ± 3% +0.4 2.13 ± 3% mpstat.cpu.usr%
> 16062961 ± 2% -12.1% 14124403 ± 2% turbostat.IRQ
> 0.76 ± 37% -71.8% 0.22 ± 83% turbostat.Pkg%pc6
> 9435 ± 7% -18.1% 7730 ± 4% turbostat.SMI
> 6113 ± 3% +16.5% 7120 ± 2% vmstat.io.bo
> 11293 ± 2% +12.3% 12688 ± 2% vmstat.system.cs
> 81879 ± 2% +2.5% 83951 vmstat.system.in
> 2584 -4.4% 2469 ± 2% proc-vmstat.nr_active_file
> 2584 -4.4% 2469 ± 2% proc-vmstat.nr_zone_active_file
> 28564 ± 4% -23.6% 21817 ± 12% proc-vmstat.numa_hint_faults
> 10958 ± 5% -43.9% 6147 ± 26% proc-vmstat.numa_hint_faults_local
> 660531 ± 3% -10.7% 590059 ± 2% proc-vmstat.pgfault
> 1191 ± 7% -16.5% 995.25 ± 12% slabinfo.UNIX.active_objs
> 1191 ± 7% -16.5% 995.25 ± 12% slabinfo.UNIX.num_objs
> 10552 ± 4% -7.8% 9729 slabinfo.ext4_io_end.active_objs
> 10552 ± 4% -7.8% 9729 slabinfo.ext4_io_end.num_objs
> 18395 +12.3% 20656 ± 8% slabinfo.kmalloc-32.active_objs
> 18502 ± 2% +12.3% 20787 ± 8% slabinfo.kmalloc-32.num_objs
> 1.291e+12 -12.3% 1.131e+12 perf-stat.branch-instructions
> 0.66 +0.1 0.76 ± 3% perf-stat.branch-miss-rate%
> 1.118e+10 ± 4% -7.5% 1.034e+10 perf-stat.cache-misses
> 2.772e+10 ± 8% -6.6% 2.589e+10 perf-stat.cache-references
> 2214958 -3.6% 2136237 perf-stat.context-switches
> 3.95 ± 2% -5.8% 3.72 perf-stat.cpi
> 2.24e+13 -16.4% 1.873e+13 perf-stat.cpu-cycles
> 1.542e+12 -10.4% 1.382e+12 perf-stat.dTLB-loads
> 0.18 ± 6% +0.0 0.19 ± 4% perf-stat.dTLB-store-miss-rate%
> 5.667e+12 -11.3% 5.029e+12 perf-stat.instructions
> 5534 -13.1% 4809 ± 6% perf-stat.instructions-per-iTLB-miss
> 0.25 ± 2% +6.1% 0.27 perf-stat.ipc
> 647970 ± 2% -10.7% 578955 ± 2% perf-stat.minor-faults
> 2.783e+09 ± 18% -17.8% 2.288e+09 ± 4% perf-stat.node-loads
> 5.706e+09 ± 2% -5.2% 5.407e+09 perf-stat.node-store-misses
> 7.693e+09 -4.4% 7.352e+09 perf-stat.node-stores
> 647979 ± 2% -10.7% 578955 ± 2% perf-stat.page-faults
> 70960 ± 16% -26.6% 52062 sched_debug.cfs_rq:/.exec_clock.avg
> 70628 ± 16% -26.7% 51787 sched_debug.cfs_rq:/.exec_clock.min
> 22499 ± 3% -10.5% 20133 ± 3% sched_debug.cfs_rq:/.load.avg
> 7838 ± 23% -67.6% 2536 ± 81% sched_debug.cfs_rq:/.load.min
> 362.19 ± 12% +58.3% 573.50 ± 25% sched_debug.cfs_rq:/.load_avg.max
> 3092960 ± 16% -28.5% 2211400 sched_debug.cfs_rq:/.min_vruntime.avg
> 3244162 ± 15% -27.0% 2367437 ± 2% sched_debug.cfs_rq:/.min_vruntime.max
> 2984299 ± 16% -28.9% 2121271 sched_debug.cfs_rq:/.min_vruntime.min
> 0.73 ± 4% -65.7% 0.25 ± 57% sched_debug.cfs_rq:/.nr_running.min
> 0.12 ± 13% +114.6% 0.26 ± 9% sched_debug.cfs_rq:/.nr_running.stddev
> 8.44 ± 23% -36.8% 5.33 ± 15% sched_debug.cfs_rq:/.nr_spread_over.max
> 1.49 ± 21% -29.6% 1.05 ± 7% sched_debug.cfs_rq:/.nr_spread_over.stddev
> 16.53 ± 20% -38.8% 10.12 ± 23% sched_debug.cfs_rq:/.runnable_load_avg.avg
> 15259 ± 7% -33.3% 10176 ± 22% sched_debug.cfs_rq:/.runnable_weight.avg
> 796.65 ± 93% -74.8% 200.68 ± 17% sched_debug.cfs_rq:/.util_est_enqueued.avg
> 669258 ± 3% -13.3% 580068 sched_debug.cpu.avg_idle.avg
> 116020 ± 12% -21.4% 91239 sched_debug.cpu.clock.avg
> 116076 ± 12% -21.4% 91261 sched_debug.cpu.clock.max
> 115967 ± 12% -21.3% 91215 sched_debug.cpu.clock.min
> 116020 ± 12% -21.4% 91239 sched_debug.cpu.clock_task.avg
> 116076 ± 12% -21.4% 91261 sched_debug.cpu.clock_task.max
> 115967 ± 12% -21.3% 91215 sched_debug.cpu.clock_task.min
> 15.41 ± 4% -32.0% 10.48 ± 24% sched_debug.cpu.cpu_load[0].avg
> 15.71 ± 6% -26.6% 11.53 ± 22% sched_debug.cpu.cpu_load[1].avg
> 16.20 ± 8% -22.9% 12.49 ± 21% sched_debug.cpu.cpu_load[2].avg
> 16.92 ± 7% -21.2% 13.33 ± 21% sched_debug.cpu.cpu_load[3].avg
> 2650 ± 6% -15.6% 2238 ± 3% sched_debug.cpu.curr->pid.avg
> 1422 ± 8% -68.5% 447.42 ± 57% sched_debug.cpu.curr->pid.min
> 7838 ± 23% -67.6% 2536 ± 81% sched_debug.cpu.load.min
> 86066 ± 14% -26.3% 63437 sched_debug.cpu.nr_load_updates.min
> 3.97 ± 88% -70.9% 1.15 ± 10% sched_debug.cpu.nr_running.avg
> 0.73 ± 4% -65.7% 0.25 ± 57% sched_debug.cpu.nr_running.min
> 1126 ± 16% -27.6% 816.02 ± 9% sched_debug.cpu.sched_count.stddev
> 1468 ± 16% +31.1% 1925 ± 5% sched_debug.cpu.sched_goidle.avg
> 1115 ± 16% +37.8% 1538 ± 4% sched_debug.cpu.sched_goidle.min
> 3979 ± 13% -27.4% 2888 ± 5% sched_debug.cpu.ttwu_local.max
> 348.96 ± 8% -26.3% 257.16 ± 13% sched_debug.cpu.ttwu_local.stddev
> 115966 ± 12% -21.3% 91214 sched_debug.cpu_clk
> 113505 ± 12% -21.8% 88773 sched_debug.ktime
> 116416 ± 12% -21.3% 91663 sched_debug.sched_clk
> 0.26 ±100% +0.3 0.57 ± 6% perf-profile.calltrace.cycles-pp.security_file_permission.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.29 ±100% +0.4 0.66 ± 5% perf-profile.calltrace.cycles-pp.find_get_entry.pagecache_get_page.f2fs_write_begin.generic_perform_write.__generic_file_write_iter
> 0.67 ± 65% +0.4 1.11 perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyin.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter
> 0.69 ± 65% +0.5 1.14 perf-profile.calltrace.cycles-pp.copyin.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter.f2fs_file_write_iter
> 1.07 ± 57% +0.5 1.61 ± 5% perf-profile.calltrace.cycles-pp.pagecache_get_page.f2fs_write_begin.generic_perform_write.__generic_file_write_iter.f2fs_file_write_iter
> 0.79 ± 64% +0.5 1.33 perf-profile.calltrace.cycles-pp.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter.f2fs_file_write_iter.__vfs_write
> 0.73 ± 63% +0.6 1.32 ± 3% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret
> 0.81 ± 63% +0.6 1.43 ± 3% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64
> 0.06 ± 58% +0.0 0.09 ± 4% perf-profile.children.cycles-pp.__pagevec_lru_add_fn
> 0.05 ± 58% +0.0 0.09 ± 13% perf-profile.children.cycles-pp.down_write_trylock
> 0.06 ± 58% +0.0 0.10 ± 4% perf-profile.children.cycles-pp.__x64_sys_write
> 0.07 ± 58% +0.0 0.11 ± 3% perf-profile.children.cycles-pp.account_page_dirtied
> 0.04 ± 57% +0.0 0.09 ± 5% perf-profile.children.cycles-pp.account_page_cleaned
> 0.06 ± 58% +0.0 0.10 ± 7% perf-profile.children.cycles-pp.free_pcppages_bulk
> 0.10 ± 58% +0.1 0.15 ± 6% perf-profile.children.cycles-pp.page_mapping
> 0.09 ± 57% +0.1 0.14 ± 7% perf-profile.children.cycles-pp.__lru_cache_add
> 0.10 ± 57% +0.1 0.15 ± 9% perf-profile.children.cycles-pp.__might_sleep
> 0.12 ± 58% +0.1 0.19 ± 3% perf-profile.children.cycles-pp.set_page_dirty
> 0.08 ± 64% +0.1 0.15 ± 10% perf-profile.children.cycles-pp.dquot_claim_space_nodirty
> 0.06 ± 61% +0.1 0.13 ± 5% perf-profile.children.cycles-pp.percpu_counter_add_batch
> 0.18 ± 57% +0.1 0.27 ± 2% perf-profile.children.cycles-pp.iov_iter_fault_in_readable
> 0.17 ± 57% +0.1 0.26 ± 2% perf-profile.children.cycles-pp.__set_page_dirty_nobuffers
> 0.09 ± 57% +0.1 0.18 ± 27% perf-profile.children.cycles-pp.free_unref_page_list
> 0.16 ± 58% +0.1 0.30 ± 18% perf-profile.children.cycles-pp.__pagevec_release
> 0.30 ± 57% +0.1 0.43 ± 5% perf-profile.children.cycles-pp.add_to_page_cache_lru
> 0.17 ± 58% +0.1 0.31 ± 16% perf-profile.children.cycles-pp.release_pages
> 0.29 ± 58% +0.2 0.45 ± 7% perf-profile.children.cycles-pp.selinux_file_permission
> 0.38 ± 57% +0.2 0.58 ± 6% perf-profile.children.cycles-pp.security_file_permission
> 0.78 ± 57% +0.3 1.12 perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
> 0.80 ± 57% +0.3 1.15 perf-profile.children.cycles-pp.copyin
> 0.92 ± 57% +0.4 1.34 perf-profile.children.cycles-pp.iov_iter_copy_from_user_atomic
> 0.98 ± 54% +0.5 1.43 ± 3% perf-profile.children.cycles-pp.entry_SYSCALL_64
> 0.98 ± 53% +0.5 1.50 ± 3% perf-profile.children.cycles-pp.syscall_return_via_sysret
> 1.64 ± 57% +0.8 2.45 ± 5% perf-profile.children.cycles-pp.pagecache_get_page
> 0.04 ± 57% +0.0 0.06 perf-profile.self.cycles-pp.__pagevec_lru_add_fn
> 0.04 ± 58% +0.0 0.07 ± 7% perf-profile.self.cycles-pp.release_pages
> 0.05 ± 58% +0.0 0.08 ± 15% perf-profile.self.cycles-pp._cond_resched
> 0.04 ± 58% +0.0 0.08 ± 6% perf-profile.self.cycles-pp.ksys_write
> 0.05 ± 58% +0.0 0.09 ± 13% perf-profile.self.cycles-pp.down_write_trylock
> 0.09 ± 58% +0.1 0.14 ± 9% perf-profile.self.cycles-pp.page_mapping
> 0.01 ±173% +0.1 0.07 ± 7% perf-profile.self.cycles-pp.__fdget_pos
> 0.11 ± 57% +0.1 0.17 ± 7% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
> 0.05 ± 59% +0.1 0.12 ± 5% perf-profile.self.cycles-pp.percpu_counter_add_batch
> 0.12 ± 58% +0.1 0.19 ± 4% perf-profile.self.cycles-pp.iov_iter_copy_from_user_atomic
> 0.17 ± 57% +0.1 0.24 ± 4% perf-profile.self.cycles-pp.generic_perform_write
> 0.17 ± 58% +0.1 0.26 ± 2% perf-profile.self.cycles-pp.iov_iter_fault_in_readable
> 0.19 ± 57% +0.1 0.30 ± 2% perf-profile.self.cycles-pp.f2fs_set_data_page_dirty
> 0.18 ± 58% +0.1 0.30 ± 4% perf-profile.self.cycles-pp.pagecache_get_page
> 0.27 ± 57% +0.1 0.41 ± 4% perf-profile.self.cycles-pp.do_syscall_64
> 0.40 ± 57% +0.2 0.62 ± 5% perf-profile.self.cycles-pp.find_get_entry
> 0.77 ± 57% +0.3 1.11 perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
> 0.96 ± 54% +0.5 1.43 ± 3% perf-profile.self.cycles-pp.entry_SYSCALL_64
> 0.98 ± 53% +0.5 1.50 ± 2% perf-profile.self.cycles-pp.syscall_return_via_sysret
> 0.72 ± 59% +0.5 1.26 ± 10% perf-profile.self.cycles-pp.f2fs_lookup_extent_cache
>
>
>
> aim7.jobs-per-min
>
> 114000 +-+----------------------------------------------------------------+
> 112000 +-+ O |
> O O O O O O O O O |
> 110000 +-+ O O O O O O O |
> 108000 +-+ |
> | O O O O O |
> 106000 +-+O |
> 104000 +-+ |
> 102000 +-+ |
> | |
> 100000 +-+ |
> 98000 +-+ |
> |.. .+..+.+.. .+.. .+.. .+..+..+.+.. .+..+.+..+..+.+.. +.. |
> 96000 +-++ .+. + + + + +.+..|
> 94000 +-+----------------------------------------------------------------+
>
>
> aim7.time.system_time
>
> 7200 +-+------------------------------------------------------------------+
> | |
> 7000 +-+ .+.. +.. .+.. |
> | .+. .+ +.. + .+. .+. .+. .+. .+ .+.+..|
> 6800 +-+ +..+. + +. +..+. +. +..+. +..+. +. |
> | |
> 6600 +-+ |
> | |
> 6400 +-+ |
> | O |
> 6200 +-+ |
> | O O O O O |
> 6000 +-+ O O O |
> O O O O O O O O O O O |
> 5800 +-+-----O---------------O-------------------------O------------------+
>
>
> aim7.time.elapsed_time
>
> 205 +-+-------------------------------------------------------------------+
> | :: |
> 200 +-+ : : |
> 195 +-+ : :|
> | .+.. +.. : :|
> 190 +-++. .+ +.. .+. .+.. .+.. .+.. .+.. + .+ |
> 185 +-+ +..+. +. +. +.+. + +..+ +..+..+ +. |
> | |
> 180 +-+ |
> 175 +-+ |
> | O |
> 170 +-+ O O O |
> 165 +-+ O O O |
> O O O O O O O O O O O O O O O |
> 160 +-+-----O-------------------------------------------------------------+
>
>
> aim7.time.elapsed_time.max
>
> 205 +-+-------------------------------------------------------------------+
> | :: |
> 200 +-+ : : |
> 195 +-+ : :|
> | .+.. +.. : :|
> 190 +-++. .+ +.. .+. .+.. .+.. .+.. .+.. + .+ |
> 185 +-+ +..+. +. +. +.+. + +..+ +..+..+ +. |
> | |
> 180 +-+ |
> 175 +-+ |
> | O |
> 170 +-+ O O O |
> 165 +-+ O O O |
> O O O O O O O O O O O O O O O |
> 160 +-+-----O-------------------------------------------------------------+
>
>
> aim7.time.involuntary_context_switches
>
> 1.15e+06 +-+--------------------------------------------------------------+
> | +.. + |
> 1.1e+06 +-++ .+.. .+.. + .+.. .+. .+ .+.. .+. : +|
> |. + .+ + + + .+. +. + .+ +.+. +..+ : |
> | +. + +. + : |
> 1.05e+06 +-+ + |
> | |
> 1e+06 +-+ |
> | |
> 950000 +-+ |
> | O |
> O O O O O O O O O |
> 900000 +-+ O O O O O O O O O O O |
> | O O |
> 850000 +-+--------------------------------------------------------------+
>
>
> [*] bisect-good sample
> [O] bisect-bad sample
>
> ***************************************************************************************************
> lkp-ivb-ep01: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory
> =========================================================================================
> compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/tbox_group/test/testcase:
> gcc-7/performance/4BRD_12G/f2fs/x86_64-rhel-7.2/3000/RAID1/debian-x86_64-2018-04-03.cgz/lkp-ivb-ep01/disk_rr/aim7
>
> commit:
> d6c66cd19e ("f2fs: fix count of seg_freed to make sec_freed correct")
> 089842de57 ("f2fs: remove codes of unused wio_mutex")
>
> d6c66cd19ef322fe 089842de5750f434aa016eb23f
> ---------------- --------------------------
> fail:runs %reproduction fail:runs
> | | |
> :4 50% 2:4 dmesg.WARNING:at#for_ip_interrupt_entry/0x
> :4 25% 1:4 kmsg.DHCP/BOOTP:Reply_not_for_us_on_eth#,op[#]xid[#]
> :4 25% 1:4 kmsg.IP-Config:Reopening_network_devices
> %stddev %change %stddev
> \ | \
> 102582 +8.8% 111626 aim7.jobs-per-min
> 176.57 -8.5% 161.64 aim7.time.elapsed_time
> 176.57 -8.5% 161.64 aim7.time.elapsed_time.max
> 1060618 -12.5% 927723 aim7.time.involuntary_context_switches
> 6408 -8.9% 5839 aim7.time.system_time
> 785554 +4.5% 820987 aim7.time.voluntary_context_switches
> 1077477 -9.5% 975130 ± 2% softirqs.RCU
> 184.77 ± 6% +41.2% 260.90 ± 11% iostat.md0.w/s
> 6609 ± 2% +9.6% 7246 iostat.md0.wkB/s
> 0.00 ± 94% +0.0 0.02 ± 28% mpstat.cpu.soft%
> 1.89 ± 4% +0.3 2.15 ± 3% mpstat.cpu.usr%
> 6546 ± 19% -49.1% 3328 ± 63% numa-numastat.node0.other_node
> 1470 ± 86% +222.9% 4749 ± 45% numa-numastat.node1.other_node
> 959.75 ± 8% +16.8% 1120 ± 7% slabinfo.UNIX.active_objs
> 959.75 ± 8% +16.8% 1120 ± 7% slabinfo.UNIX.num_objs
> 38.35 +3.2% 39.57 ± 2% turbostat.RAMWatt
> 8800 ± 2% -10.7% 7855 ± 3% turbostat.SMI
> 103925 ± 27% -59.5% 42134 ± 61% numa-meminfo.node0.AnonHugePages
> 14267 ± 61% -54.9% 6430 ± 76% numa-meminfo.node0.Inactive(anon)
> 52220 ± 18% +104.0% 106522 ± 40% numa-meminfo.node1.AnonHugePages
> 6614 ± 2% +9.6% 7248 vmstat.io.bo
> 316.00 ± 2% -15.4% 267.25 ± 8% vmstat.procs.r
> 12256 ± 2% +6.9% 13098 vmstat.system.cs
> 2852 ± 3% +12.5% 3208 ± 3% numa-vmstat.node0.nr_active_file
> 3566 ± 61% -54.9% 1607 ± 76% numa-vmstat.node0.nr_inactive_anon
> 2852 ± 3% +12.4% 3207 ± 3% numa-vmstat.node0.nr_zone_active_file
> 3566 ± 61% -54.9% 1607 ± 76% numa-vmstat.node0.nr_zone_inactive_anon
> 95337 +2.3% 97499 proc-vmstat.nr_active_anon
> 5746 ± 2% +4.3% 5990 proc-vmstat.nr_active_file
> 89732 +2.0% 91532 proc-vmstat.nr_anon_pages
> 95337 +2.3% 97499 proc-vmstat.nr_zone_active_anon
> 5746 ± 2% +4.3% 5990 proc-vmstat.nr_zone_active_file
> 10407 ± 4% -49.3% 5274 ± 52% proc-vmstat.numa_hint_faults_local
> 615058 -6.0% 578344 ± 2% proc-vmstat.pgfault
> 1.187e+12 -8.7% 1.084e+12 perf-stat.branch-instructions
> 0.65 ± 3% +0.0 0.70 ± 2% perf-stat.branch-miss-rate%
> 2219706 -2.5% 2164425 perf-stat.context-switches
> 2.071e+13 -10.0% 1.864e+13 perf-stat.cpu-cycles
> 641874 -2.7% 624703 perf-stat.cpu-migrations
> 1.408e+12 -7.3% 1.305e+12 perf-stat.dTLB-loads
> 39182891 ± 4% +796.4% 3.512e+08 ±150% perf-stat.iTLB-loads
> 5.184e+12 -8.0% 4.77e+12 perf-stat.instructions
> 5035 ± 2% -14.1% 4325 ± 13% perf-stat.instructions-per-iTLB-miss
> 604219 -6.2% 566725 perf-stat.minor-faults
> 4.962e+09 -2.7% 4.827e+09 perf-stat.node-stores
> 604097 -6.2% 566730 perf-stat.page-faults
> 110.81 ± 13% +25.7% 139.25 ± 8% sched_debug.cfs_rq:/.load_avg.stddev
> 12.76 ± 74% +114.6% 27.39 ± 38% sched_debug.cfs_rq:/.removed.load_avg.avg
> 54.23 ± 62% +66.2% 90.10 ± 17% sched_debug.cfs_rq:/.removed.load_avg.stddev
> 585.18 ± 74% +115.8% 1262 ± 38% sched_debug.cfs_rq:/.removed.runnable_sum.avg
> 2489 ± 62% +66.9% 4153 ± 17% sched_debug.cfs_rq:/.removed.runnable_sum.stddev
> 11909 ± 10% +44.7% 17229 ± 18% sched_debug.cfs_rq:/.runnable_weight.avg
> 1401 ± 2% +36.5% 1913 ± 5% sched_debug.cpu.sched_goidle.avg
> 2350 ± 2% +21.9% 2863 ± 5% sched_debug.cpu.sched_goidle.max
> 1082 ± 5% +39.2% 1506 ± 4% sched_debug.cpu.sched_goidle.min
> 7327 +14.7% 8401 ± 2% sched_debug.cpu.ttwu_count.avg
> 5719 ± 3% +18.3% 6767 ± 2% sched_debug.cpu.ttwu_count.min
> 1518 ± 3% +15.6% 1755 ± 3% sched_debug.cpu.ttwu_local.min
> 88.70 -1.0 87.65 perf-profile.calltrace.cycles-pp.generic_perform_write.__generic_file_write_iter.f2fs_file_write_iter.__vfs_write.vfs_write
> 54.51 -1.0 53.48 perf-profile.calltrace.cycles-pp._raw_spin_lock.f2fs_inode_dirtied.f2fs_mark_inode_dirty_sync.f2fs_write_end.generic_perform_write
> 54.55 -1.0 53.53 perf-profile.calltrace.cycles-pp.f2fs_mark_inode_dirty_sync.f2fs_write_end.generic_perform_write.__generic_file_write_iter.f2fs_file_write_iter
> 56.32 -1.0 55.30 perf-profile.calltrace.cycles-pp.f2fs_write_end.generic_perform_write.__generic_file_write_iter.f2fs_file_write_iter.__vfs_write
> 54.54 -1.0 53.53 perf-profile.calltrace.cycles-pp.f2fs_inode_dirtied.f2fs_mark_inode_dirty_sync.f2fs_write_end.generic_perform_write.__generic_file_write_iter
> 88.93 -1.0 87.96 perf-profile.calltrace.cycles-pp.__generic_file_write_iter.f2fs_file_write_iter.__vfs_write.vfs_write.ksys_write
> 89.94 -0.8 89.14 perf-profile.calltrace.cycles-pp.f2fs_file_write_iter.__vfs_write.vfs_write.ksys_write.do_syscall_64
> 90.01 -0.8 89.26 perf-profile.calltrace.cycles-pp.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 90.72 -0.7 90.00 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 90.59 -0.7 89.87 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 13.32 -0.3 13.01 perf-profile.calltrace.cycles-pp._raw_spin_lock.f2fs_inode_dirtied.f2fs_mark_inode_dirty_sync.f2fs_reserve_new_blocks.f2fs_reserve_block
> 13.33 -0.3 13.01 perf-profile.calltrace.cycles-pp.f2fs_inode_dirtied.f2fs_mark_inode_dirty_sync.f2fs_reserve_new_blocks.f2fs_reserve_block.f2fs_get_block
> 13.33 -0.3 13.01 perf-profile.calltrace.cycles-pp.f2fs_mark_inode_dirty_sync.f2fs_reserve_new_blocks.f2fs_reserve_block.f2fs_get_block.f2fs_write_begin
> 13.26 -0.3 12.94 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.f2fs_inode_dirtied.f2fs_mark_inode_dirty_sync.f2fs_reserve_new_blocks
> 1.30 ± 2% +0.1 1.40 ± 2% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64
> 2.20 ± 6% +0.2 2.40 ± 3% perf-profile.calltrace.cycles-pp.generic_file_read_iter.__vfs_read.vfs_read.ksys_read.do_syscall_64
> 2.28 ± 5% +0.2 2.52 ± 5% perf-profile.calltrace.cycles-pp.__vfs_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 2.85 ± 4% +0.3 3.16 ± 5% perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 2.97 ± 4% +0.3 3.31 ± 5% perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 88.74 -1.0 87.70 perf-profile.children.cycles-pp.generic_perform_write
> 56.33 -1.0 55.31 perf-profile.children.cycles-pp.f2fs_write_end
> 88.95 -1.0 87.98 perf-profile.children.cycles-pp.__generic_file_write_iter
> 89.95 -0.8 89.15 perf-profile.children.cycles-pp.f2fs_file_write_iter
> 90.03 -0.8 89.28 perf-profile.children.cycles-pp.__vfs_write
> 90.73 -0.7 90.02 perf-profile.children.cycles-pp.ksys_write
> 90.60 -0.7 89.89 perf-profile.children.cycles-pp.vfs_write
> 0.22 ± 5% -0.1 0.17 ± 19% perf-profile.children.cycles-pp.f2fs_invalidate_page
> 0.08 ± 10% +0.0 0.10 ± 5% perf-profile.children.cycles-pp.page_mapping
> 0.09 +0.0 0.11 ± 7% perf-profile.children.cycles-pp.__cancel_dirty_page
> 0.06 ± 6% +0.0 0.09 ± 28% perf-profile.children.cycles-pp.read_node_page
> 0.10 ± 4% +0.0 0.14 ± 14% perf-profile.children.cycles-pp.current_time
> 0.07 ± 12% +0.0 0.11 ± 9% perf-profile.children.cycles-pp.percpu_counter_add_batch
> 0.00 +0.1 0.05 perf-profile.children.cycles-pp.__x64_sys_write
> 0.38 ± 3% +0.1 0.43 ± 5% perf-profile.children.cycles-pp.selinux_file_permission
> 0.55 ± 4% +0.1 0.61 ± 4% perf-profile.children.cycles-pp.security_file_permission
> 1.30 +0.1 1.40 ± 2% perf-profile.children.cycles-pp.entry_SYSCALL_64
> 2.21 ± 6% +0.2 2.41 ± 3% perf-profile.children.cycles-pp.generic_file_read_iter
> 2.29 ± 6% +0.2 2.53 ± 5% perf-profile.children.cycles-pp.__vfs_read
> 2.86 ± 4% +0.3 3.18 ± 5% perf-profile.children.cycles-pp.vfs_read
> 2.99 ± 4% +0.3 3.32 ± 5% perf-profile.children.cycles-pp.ksys_read
> 0.37 -0.1 0.24 ± 23% perf-profile.self.cycles-pp.__get_node_page
> 0.21 ± 3% -0.1 0.15 ± 16% perf-profile.self.cycles-pp.f2fs_invalidate_page
> 0.07 ± 5% +0.0 0.09 ± 11% perf-profile.self.cycles-pp.page_mapping
> 0.06 ± 11% +0.0 0.08 ± 8% perf-profile.self.cycles-pp.vfs_read
> 0.07 ± 7% +0.0 0.10 ± 21% perf-profile.self.cycles-pp.__generic_file_write_iter
> 0.06 ± 14% +0.0 0.10 ± 10% perf-profile.self.cycles-pp.percpu_counter_add_batch
> 0.20 ± 11% +0.0 0.25 ± 12% perf-profile.self.cycles-pp.selinux_file_permission
> 0.05 ± 8% +0.1 0.11 ± 52% perf-profile.self.cycles-pp.__vfs_read
> 0.33 ± 9% +0.1 0.41 ± 9% perf-profile.self.cycles-pp.f2fs_lookup_extent_cache
> 1.30 +0.1 1.40 ± 2% perf-profile.self.cycles-pp.entry_SYSCALL_64
>
>
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> Thanks,
> Rong Chen
>