Re: [LKP] [f2fs] 089842de57: aim7.jobs-per-min 15.4% improvement

From: Chao Yu
Date: Tue Dec 11 2018 - 22:23:38 EST


On 2018/12/12 11:01, Rong Chen wrote:
>
>
> On 12/11/2018 06:12 PM, Chao Yu wrote:
>> Hi all,
>>
>> The commit only cleans up code that is currently unused, so why would it
>> improve performance? Could you retest to make sure?
>
> Hi Chao,
>
> the improvement does exist in the 0day environment.

Hi Rong,

Logically, the deleted code is dead, and removing it shouldn't impact any
code path.
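
For reference, the removed code was roughly of the following shape (a
paraphrase from memory of the f2fs tree around that commit, not the exact
diff): the wio_mutex array was still declared and initialized, but nothing
locked it any more, so dropping it should be behaviour-neutral.

	/*
	 * Illustrative paraphrase only, not the actual patch: the field was
	 * still declared in struct f2fs_sb_info and initialized in
	 * init_sb_info(), but no remaining code path ever took the lock.
	 */
	struct f2fs_sb_info {
		/* ... */
		struct mutex wio_mutex[NR_PAGE_TYPE - 1][NR_TEMP_TYPE];
		/* ... */
	};

	static void init_sb_info(struct f2fs_sb_info *sbi)
	{
		int i, j;

		/* ... */
		/* the only remaining users: plain initialization, never a lock */
		for (i = 0; i < NR_PAGE_TYPE - 1; i++)
			for (j = 0; j < NR_TEMP_TYPE; j++)
				mutex_init(&sbi->wio_mutex[i][j]);
		/* ... */
	}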

Instead, I expect the patch below to improve performance in some cases:

https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev-test&id=1e771e83ce26d0ba2ce6c7df7effb7822f032c4a

So I suspect there is a problem with the current test result; I'm not sure
whether the problem is in the test suite or the test environment.
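
As a quick arithmetic cross-check (a throwaway sketch, not part of the lkp
tooling), averaging the five aim7.jobs-per-min samples you pasted below for
each commit gives about +15%, so the per-run numbers are at least internally
consistent with the reported summary:

	/* Throwaway check: average the five samples quoted below for each
	 * commit and print the relative change. */
	#include <stdio.h>

	int main(void)
	{
		double after[]  = { 111406.82, 110851.09, 111399.93, 110327.92, 110321.16 };
		double before[] = {  97082.14,  95959.06,  95959.06,  95851.75,  96946.19 };
		double sa = 0, sb = 0;
		int i;

		for (i = 0; i < 5; i++) {
			sa += after[i];
			sb += before[i];
		}
		/* prints: 96359.64 -> 110861.38 (+15.0%) */
		printf("%.2f -> %.2f (%+.1f%%)\n", sb / 5, sa / 5, (sa / sb - 1) * 100);
		return 0;
	}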

Any thoughts?

Thanks,

>
> $ job cat
> /result/aim7/4BRD_12G-RAID1-f2fs-disk_rw-3000-performance/lkp-ivb-ep01/debian-x86_64-2018-04-03.cgz/x86_64-rhel-7.2/gcc-7/089842de5750f434aa016eb23f3d3a3a151083bd/*/aim7.json
> | grep -A1 min\"
>   "aim7.jobs-per-min": [
>     111406.82
> --
>   "aim7.jobs-per-min": [
>     110851.09
> --
>   "aim7.jobs-per-min": [
>     111399.93
> --
>   "aim7.jobs-per-min": [
>     110327.92
> --
>   "aim7.jobs-per-min": [
>     110321.16
>
> $ job cat
> /result/aim7/4BRD_12G-RAID1-f2fs-disk_rw-3000-performance/lkp-ivb-ep01/debian-x86_64-2018-04-03.cgz/x86_64-rhel-7.2/gcc-7/d6c66cd19ef322fe0d51ba09ce1b7f386acab04a/*/aim7.json
> | grep -A1 min\"
>   "aim7.jobs-per-min": [
>     97082.14
> --
>   "aim7.jobs-per-min": [
>     95959.06
> --
>   "aim7.jobs-per-min": [
>     95959.06
> --
>   "aim7.jobs-per-min": [
>     95851.75
> --
>   "aim7.jobs-per-min": [
>     96946.19
>
> Best Regards,
> Rong Chen
>
>>
>> Thanks,
>>
>> On 2018/12/11 17:59, kernel test robot wrote:
>>> Greeting,
>>>
>>> FYI, we noticed a 15.4% improvement of aim7.jobs-per-min due to commit:
>>>
>>>
>>> commit: 089842de5750f434aa016eb23f3d3a3a151083bd ("f2fs: remove codes of unused wio_mutex")
>>> https://git.kernel.org/cgit/linux/kernel/git/jaegeuk/f2fs.git dev-test
>>>
>>> in testcase: aim7
>>> on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory
>>> with following parameters:
>>>
>>> disk: 4BRD_12G
>>> md: RAID1
>>> fs: f2fs
>>> test: disk_rw
>>> load: 3000
>>> cpufreq_governor: performance
>>>
>>> test-description: AIM7 is a traditional UNIX system-level benchmark suite used to test and measure the performance of multiuser systems.
>>> test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/
>>>
>>> In addition, the commit also has a significant impact on the following tests:
>>>
>>> +------------------+-----------------------------------------------------------------------+
>>> | testcase: change | aim7: aim7.jobs-per-min 8.8% improvement |
>>> | test machine | 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory |
>>> | test parameters | cpufreq_governor=performance |
>>> | | disk=4BRD_12G |
>>> | | fs=f2fs |
>>> | | load=3000 |
>>> | | md=RAID1 |
>>> | | test=disk_rr |
>>> +------------------+-----------------------------------------------------------------------+
>>>
>>>
>>> Details are as below:
>>> -------------------------------------------------------------------------------------------------->
>>>
>>>
>>> To reproduce:
>>>
>>> git clone https://github.com/intel/lkp-tests.git
>>> cd lkp-tests
>>> bin/lkp install job.yaml # job file is attached in this email
>>> bin/lkp run job.yaml
>>>
>>> =========================================================================================
>>> compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/tbox_group/test/testcase:
>>> gcc-7/performance/4BRD_12G/f2fs/x86_64-rhel-7.2/3000/RAID1/debian-x86_64-2018-04-03.cgz/lkp-ivb-ep01/disk_rw/aim7
>>>
>>> commit:
>>> d6c66cd19e ("f2fs: fix count of seg_freed to make sec_freed correct")
>>> 089842de57 ("f2fs: remove codes of unused wio_mutex")
>>>
>>> d6c66cd19ef322fe 089842de5750f434aa016eb23f
>>> ---------------- --------------------------
>>> %stddev %change %stddev
>>> \ | \
>>> 96213 +15.4% 110996 aim7.jobs-per-min
>>> 191.50 ± 3% -15.1% 162.52 aim7.time.elapsed_time
>>> 191.50 ± 3% -15.1% 162.52 aim7.time.elapsed_time.max
>>> 1090253 ± 2% -17.5% 899165 aim7.time.involuntary_context_switches
>>> 176713 -7.5% 163478 aim7.time.minor_page_faults
>>> 6882 -14.6% 5875 aim7.time.system_time
>>> 127.97 +4.7% 134.00 aim7.time.user_time
>>> 760923 +7.1% 814632 aim7.time.voluntary_context_switches
>>> 78499 ± 2% -11.2% 69691 interrupts.CAL:Function_call_interrupts
>>> 3183861 ± 4% -16.7% 2651390 ± 4% softirqs.TIMER
>>> 191.54 ± 13% +45.4% 278.59 ± 12% iostat.md0.w/s
>>> 6118 ± 3% +16.5% 7126 ± 2% iostat.md0.wkB/s
>>> 151257 ± 2% -10.1% 135958 ± 2% meminfo.AnonHugePages
>>> 46754 ± 3% +14.0% 53307 ± 3% meminfo.max_used_kB
>>> 0.03 ± 62% -0.0 0.01 ± 78% mpstat.cpu.soft%
>>> 1.73 ± 3% +0.4 2.13 ± 3% mpstat.cpu.usr%
>>> 16062961 ± 2% -12.1% 14124403 ± 2% turbostat.IRQ
>>> 0.76 ± 37% -71.8% 0.22 ± 83% turbostat.Pkg%pc6
>>> 9435 ± 7% -18.1% 7730 ± 4% turbostat.SMI
>>> 6113 ± 3% +16.5% 7120 ± 2% vmstat.io.bo
>>> 11293 ± 2% +12.3% 12688 ± 2% vmstat.system.cs
>>> 81879 ± 2% +2.5% 83951 vmstat.system.in
>>> 2584 -4.4% 2469 ± 2% proc-vmstat.nr_active_file
>>> 2584 -4.4% 2469 ± 2% proc-vmstat.nr_zone_active_file
>>> 28564 ± 4% -23.6% 21817 ± 12% proc-vmstat.numa_hint_faults
>>> 10958 ± 5% -43.9% 6147 ± 26% proc-vmstat.numa_hint_faults_local
>>> 660531 ± 3% -10.7% 590059 ± 2% proc-vmstat.pgfault
>>> 1191 ± 7% -16.5% 995.25 ± 12% slabinfo.UNIX.active_objs
>>> 1191 ± 7% -16.5% 995.25 ± 12% slabinfo.UNIX.num_objs
>>> 10552 ± 4% -7.8% 9729 slabinfo.ext4_io_end.active_objs
>>> 10552 ± 4% -7.8% 9729 slabinfo.ext4_io_end.num_objs
>>> 18395 +12.3% 20656 ± 8% slabinfo.kmalloc-32.active_objs
>>> 18502 ± 2% +12.3% 20787 ± 8% slabinfo.kmalloc-32.num_objs
>>> 1.291e+12 -12.3% 1.131e+12 perf-stat.branch-instructions
>>> 0.66 +0.1 0.76 ± 3% perf-stat.branch-miss-rate%
>>> 1.118e+10 ± 4% -7.5% 1.034e+10 perf-stat.cache-misses
>>> 2.772e+10 ± 8% -6.6% 2.589e+10 perf-stat.cache-references
>>> 2214958 -3.6% 2136237 perf-stat.context-switches
>>> 3.95 ± 2% -5.8% 3.72 perf-stat.cpi
>>> 2.24e+13 -16.4% 1.873e+13 perf-stat.cpu-cycles
>>> 1.542e+12 -10.4% 1.382e+12 perf-stat.dTLB-loads
>>> 0.18 ± 6% +0.0 0.19 ± 4% perf-stat.dTLB-store-miss-rate%
>>> 5.667e+12 -11.3% 5.029e+12 perf-stat.instructions
>>> 5534 -13.1% 4809 ± 6% perf-stat.instructions-per-iTLB-miss
>>> 0.25 ± 2% +6.1% 0.27 perf-stat.ipc
>>> 647970 ± 2% -10.7% 578955 ± 2% perf-stat.minor-faults
>>> 2.783e+09 ± 18% -17.8% 2.288e+09 ± 4% perf-stat.node-loads
>>> 5.706e+09 ± 2% -5.2% 5.407e+09 perf-stat.node-store-misses
>>> 7.693e+09 -4.4% 7.352e+09 perf-stat.node-stores
>>> 647979 ± 2% -10.7% 578955 ± 2% perf-stat.page-faults
>>> 70960 ± 16% -26.6% 52062 sched_debug.cfs_rq:/.exec_clock.avg
>>> 70628 ± 16% -26.7% 51787 sched_debug.cfs_rq:/.exec_clock.min
>>> 22499 ± 3% -10.5% 20133 ± 3% sched_debug.cfs_rq:/.load.avg
>>> 7838 ± 23% -67.6% 2536 ± 81% sched_debug.cfs_rq:/.load.min
>>> 362.19 ± 12% +58.3% 573.50 ± 25% sched_debug.cfs_rq:/.load_avg.max
>>> 3092960 ± 16% -28.5% 2211400 sched_debug.cfs_rq:/.min_vruntime.avg
>>> 3244162 ± 15% -27.0% 2367437 ± 2% sched_debug.cfs_rq:/.min_vruntime.max
>>> 2984299 ± 16% -28.9% 2121271 sched_debug.cfs_rq:/.min_vruntime.min
>>> 0.73 ± 4% -65.7% 0.25 ± 57% sched_debug.cfs_rq:/.nr_running.min
>>> 0.12 ± 13% +114.6% 0.26 ± 9% sched_debug.cfs_rq:/.nr_running.stddev
>>> 8.44 ± 23% -36.8% 5.33 ± 15% sched_debug.cfs_rq:/.nr_spread_over.max
>>> 1.49 ± 21% -29.6% 1.05 ± 7% sched_debug.cfs_rq:/.nr_spread_over.stddev
>>> 16.53 ± 20% -38.8% 10.12 ± 23% sched_debug.cfs_rq:/.runnable_load_avg.avg
>>> 15259 ± 7% -33.3% 10176 ± 22% sched_debug.cfs_rq:/.runnable_weight.avg
>>> 796.65 ± 93% -74.8% 200.68 ± 17% sched_debug.cfs_rq:/.util_est_enqueued.avg
>>> 669258 ± 3% -13.3% 580068 sched_debug.cpu.avg_idle.avg
>>> 116020 ± 12% -21.4% 91239 sched_debug.cpu.clock.avg
>>> 116076 ± 12% -21.4% 91261 sched_debug.cpu.clock.max
>>> 115967 ± 12% -21.3% 91215 sched_debug.cpu.clock.min
>>> 116020 ± 12% -21.4% 91239 sched_debug.cpu.clock_task.avg
>>> 116076 ± 12% -21.4% 91261 sched_debug.cpu.clock_task.max
>>> 115967 ± 12% -21.3% 91215 sched_debug.cpu.clock_task.min
>>> 15.41 ± 4% -32.0% 10.48 ± 24% sched_debug.cpu.cpu_load[0].avg
>>> 15.71 ± 6% -26.6% 11.53 ± 22% sched_debug.cpu.cpu_load[1].avg
>>> 16.20 ± 8% -22.9% 12.49 ± 21% sched_debug.cpu.cpu_load[2].avg
>>> 16.92 ± 7% -21.2% 13.33 ± 21% sched_debug.cpu.cpu_load[3].avg
>>> 2650 ± 6% -15.6% 2238 ± 3% sched_debug.cpu.curr->pid.avg
>>> 1422 ± 8% -68.5% 447.42 ± 57% sched_debug.cpu.curr->pid.min
>>> 7838 ± 23% -67.6% 2536 ± 81% sched_debug.cpu.load.min
>>> 86066 ± 14% -26.3% 63437 sched_debug.cpu.nr_load_updates.min
>>> 3.97 ± 88% -70.9% 1.15 ± 10% sched_debug.cpu.nr_running.avg
>>> 0.73 ± 4% -65.7% 0.25 ± 57% sched_debug.cpu.nr_running.min
>>> 1126 ± 16% -27.6% 816.02 ± 9% sched_debug.cpu.sched_count.stddev
>>> 1468 ± 16% +31.1% 1925 ± 5% sched_debug.cpu.sched_goidle.avg
>>> 1115 ± 16% +37.8% 1538 ± 4% sched_debug.cpu.sched_goidle.min
>>> 3979 ± 13% -27.4% 2888 ± 5% sched_debug.cpu.ttwu_local.max
>>> 348.96 ± 8% -26.3% 257.16 ± 13% sched_debug.cpu.ttwu_local.stddev
>>> 115966 ± 12% -21.3% 91214 sched_debug.cpu_clk
>>> 113505 ± 12% -21.8% 88773 sched_debug.ktime
>>> 116416 ± 12% -21.3% 91663 sched_debug.sched_clk
>>> 0.26 ±100% +0.3 0.57 ± 6% perf-profile.calltrace.cycles-pp.security_file_permission.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>> 0.29 ±100% +0.4 0.66 ± 5% perf-profile.calltrace.cycles-pp.find_get_entry.pagecache_get_page.f2fs_write_begin.generic_perform_write.__generic_file_write_iter
>>> 0.67 ± 65% +0.4 1.11 perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyin.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter
>>> 0.69 ± 65% +0.5 1.14 perf-profile.calltrace.cycles-pp.copyin.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter.f2fs_file_write_iter
>>> 1.07 ± 57% +0.5 1.61 ± 5% perf-profile.calltrace.cycles-pp.pagecache_get_page.f2fs_write_begin.generic_perform_write.__generic_file_write_iter.f2fs_file_write_iter
>>> 0.79 ± 64% +0.5 1.33 perf-profile.calltrace.cycles-pp.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter.f2fs_file_write_iter.__vfs_write
>>> 0.73 ± 63% +0.6 1.32 ± 3% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret
>>> 0.81 ± 63% +0.6 1.43 ± 3% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64
>>> 0.06 ± 58% +0.0 0.09 ± 4% perf-profile.children.cycles-pp.__pagevec_lru_add_fn
>>> 0.05 ± 58% +0.0 0.09 ± 13% perf-profile.children.cycles-pp.down_write_trylock
>>> 0.06 ± 58% +0.0 0.10 ± 4% perf-profile.children.cycles-pp.__x64_sys_write
>>> 0.07 ± 58% +0.0 0.11 ± 3% perf-profile.children.cycles-pp.account_page_dirtied
>>> 0.04 ± 57% +0.0 0.09 ± 5% perf-profile.children.cycles-pp.account_page_cleaned
>>> 0.06 ± 58% +0.0 0.10 ± 7% perf-profile.children.cycles-pp.free_pcppages_bulk
>>> 0.10 ± 58% +0.1 0.15 ± 6% perf-profile.children.cycles-pp.page_mapping
>>> 0.09 ± 57% +0.1 0.14 ± 7% perf-profile.children.cycles-pp.__lru_cache_add
>>> 0.10 ± 57% +0.1 0.15 ± 9% perf-profile.children.cycles-pp.__might_sleep
>>> 0.12 ± 58% +0.1 0.19 ± 3% perf-profile.children.cycles-pp.set_page_dirty
>>> 0.08 ± 64% +0.1 0.15 ± 10% perf-profile.children.cycles-pp.dquot_claim_space_nodirty
>>> 0.06 ± 61% +0.1 0.13 ± 5% perf-profile.children.cycles-pp.percpu_counter_add_batch
>>> 0.18 ± 57% +0.1 0.27 ± 2% perf-profile.children.cycles-pp.iov_iter_fault_in_readable
>>> 0.17 ± 57% +0.1 0.26 ± 2% perf-profile.children.cycles-pp.__set_page_dirty_nobuffers
>>> 0.09 ± 57% +0.1 0.18 ± 27% perf-profile.children.cycles-pp.free_unref_page_list
>>> 0.16 ± 58% +0.1 0.30 ± 18% perf-profile.children.cycles-pp.__pagevec_release
>>> 0.30 ± 57% +0.1 0.43 ± 5% perf-profile.children.cycles-pp.add_to_page_cache_lru
>>> 0.17 ± 58% +0.1 0.31 ± 16% perf-profile.children.cycles-pp.release_pages
>>> 0.29 ± 58% +0.2 0.45 ± 7% perf-profile.children.cycles-pp.selinux_file_permission
>>> 0.38 ± 57% +0.2 0.58 ± 6% perf-profile.children.cycles-pp.security_file_permission
>>> 0.78 ± 57% +0.3 1.12 perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
>>> 0.80 ± 57% +0.3 1.15 perf-profile.children.cycles-pp.copyin
>>> 0.92 ± 57% +0.4 1.34 perf-profile.children.cycles-pp.iov_iter_copy_from_user_atomic
>>> 0.98 ± 54% +0.5 1.43 ± 3% perf-profile.children.cycles-pp.entry_SYSCALL_64
>>> 0.98 ± 53% +0.5 1.50 ± 3% perf-profile.children.cycles-pp.syscall_return_via_sysret
>>> 1.64 ± 57% +0.8 2.45 ± 5% perf-profile.children.cycles-pp.pagecache_get_page
>>> 0.04 ± 57% +0.0 0.06 perf-profile.self.cycles-pp.__pagevec_lru_add_fn
>>> 0.04 ± 58% +0.0 0.07 ± 7% perf-profile.self.cycles-pp.release_pages
>>> 0.05 ± 58% +0.0 0.08 ± 15% perf-profile.self.cycles-pp._cond_resched
>>> 0.04 ± 58% +0.0 0.08 ± 6% perf-profile.self.cycles-pp.ksys_write
>>> 0.05 ± 58% +0.0 0.09 ± 13% perf-profile.self.cycles-pp.down_write_trylock
>>> 0.09 ± 58% +0.1 0.14 ± 9% perf-profile.self.cycles-pp.page_mapping
>>> 0.01 ±173% +0.1 0.07 ± 7% perf-profile.self.cycles-pp.__fdget_pos
>>> 0.11 ± 57% +0.1 0.17 ± 7% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
>>> 0.05 ± 59% +0.1 0.12 ± 5% perf-profile.self.cycles-pp.percpu_counter_add_batch
>>> 0.12 ± 58% +0.1 0.19 ± 4% perf-profile.self.cycles-pp.iov_iter_copy_from_user_atomic
>>> 0.17 ± 57% +0.1 0.24 ± 4% perf-profile.self.cycles-pp.generic_perform_write
>>> 0.17 ± 58% +0.1 0.26 ± 2% perf-profile.self.cycles-pp.iov_iter_fault_in_readable
>>> 0.19 ± 57% +0.1 0.30 ± 2% perf-profile.self.cycles-pp.f2fs_set_data_page_dirty
>>> 0.18 ± 58% +0.1 0.30 ± 4% perf-profile.self.cycles-pp.pagecache_get_page
>>> 0.27 ± 57% +0.1 0.41 ± 4% perf-profile.self.cycles-pp.do_syscall_64
>>> 0.40 ± 57% +0.2 0.62 ± 5% perf-profile.self.cycles-pp.find_get_entry
>>> 0.77 ± 57% +0.3 1.11 perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
>>> 0.96 ± 54% +0.5 1.43 ± 3% perf-profile.self.cycles-pp.entry_SYSCALL_64
>>> 0.98 ± 53% +0.5 1.50 ± 2% perf-profile.self.cycles-pp.syscall_return_via_sysret
>>> 0.72 ± 59% +0.5 1.26 ± 10% perf-profile.self.cycles-pp.f2fs_lookup_extent_cache
>>>
>>>
>>>
>>> aim7.jobs-per-min
>>>
>>> 114000 +-+----------------------------------------------------------------+
>>> 112000 +-+ O |
>>> O O O O O O O O O |
>>> 110000 +-+ O O O O O O O |
>>> 108000 +-+ |
>>> | O O O O O |
>>> 106000 +-+O |
>>> 104000 +-+ |
>>> 102000 +-+ |
>>> | |
>>> 100000 +-+ |
>>> 98000 +-+ |
>>> |.. .+..+.+.. .+.. .+.. .+..+..+.+.. .+..+.+..+..+.+.. +.. |
>>> 96000 +-++ .+. + + + + +.+..|
>>> 94000 +-+----------------------------------------------------------------+
>>>
>>>
>>> aim7.time.system_time
>>>
>>> 7200 +-+------------------------------------------------------------------+
>>> | |
>>> 7000 +-+ .+.. +.. .+.. |
>>> | .+. .+ +.. + .+. .+. .+. .+. .+ .+.+..|
>>> 6800 +-+ +..+. + +. +..+. +. +..+. +..+. +. |
>>> | |
>>> 6600 +-+ |
>>> | |
>>> 6400 +-+ |
>>> | O |
>>> 6200 +-+ |
>>> | O O O O O |
>>> 6000 +-+ O O O |
>>> O O O O O O O O O O O |
>>> 5800 +-+-----O---------------O-------------------------O------------------+
>>>
>>>
>>> aim7.time.elapsed_time
>>>
>>> 205 +-+-------------------------------------------------------------------+
>>> | :: |
>>> 200 +-+ : : |
>>> 195 +-+ : :|
>>> | .+.. +.. : :|
>>> 190 +-++. .+ +.. .+. .+.. .+.. .+.. .+.. + .+ |
>>> 185 +-+ +..+. +. +. +.+. + +..+ +..+..+ +. |
>>> | |
>>> 180 +-+ |
>>> 175 +-+ |
>>> | O |
>>> 170 +-+ O O O |
>>> 165 +-+ O O O |
>>> O O O O O O O O O O O O O O O |
>>> 160 +-+-----O-------------------------------------------------------------+
>>>
>>>
>>> aim7.time.elapsed_time.max
>>>
>>> 205 +-+-------------------------------------------------------------------+
>>> | :: |
>>> 200 +-+ : : |
>>> 195 +-+ : :|
>>> | .+.. +.. : :|
>>> 190 +-++. .+ +.. .+. .+.. .+.. .+.. .+.. + .+ |
>>> 185 +-+ +..+. +. +. +.+. + +..+ +..+..+ +. |
>>> | |
>>> 180 +-+ |
>>> 175 +-+ |
>>> | O |
>>> 170 +-+ O O O |
>>> 165 +-+ O O O |
>>> O O O O O O O O O O O O O O O |
>>> 160 +-+-----O-------------------------------------------------------------+
>>>
>>>
>>> aim7.time.involuntary_context_switches
>>>
>>> 1.15e+06 +-+--------------------------------------------------------------+
>>> | +.. + |
>>> 1.1e+06 +-++ .+.. .+.. + .+.. .+. .+ .+.. .+. : +|
>>> |. + .+ + + + .+. +. + .+ +.+. +..+ : |
>>> | +. + +. + : |
>>> 1.05e+06 +-+ + |
>>> | |
>>> 1e+06 +-+ |
>>> | |
>>> 950000 +-+ |
>>> | O |
>>> O O O O O O O O O |
>>> 900000 +-+ O O O O O O O O O O O |
>>> | O O |
>>> 850000 +-+--------------------------------------------------------------+
>>>
>>>
>>> [*] bisect-good sample
>>> [O] bisect-bad sample
>>>
>>> ***************************************************************************************************
>>> lkp-ivb-ep01: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory
>>> =========================================================================================
>>> compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/tbox_group/test/testcase:
>>> gcc-7/performance/4BRD_12G/f2fs/x86_64-rhel-7.2/3000/RAID1/debian-x86_64-2018-04-03.cgz/lkp-ivb-ep01/disk_rr/aim7
>>>
>>> commit:
>>> d6c66cd19e ("f2fs: fix count of seg_freed to make sec_freed correct")
>>> 089842de57 ("f2fs: remove codes of unused wio_mutex")
>>>
>>> d6c66cd19ef322fe 089842de5750f434aa016eb23f
>>> ---------------- --------------------------
>>> fail:runs %reproduction fail:runs
>>> | | |
>>> :4 50% 2:4 dmesg.WARNING:at#for_ip_interrupt_entry/0x
>>> :4 25% 1:4 kmsg.DHCP/BOOTP:Reply_not_for_us_on_eth#,op[#]xid[#]
>>> :4 25% 1:4 kmsg.IP-Config:Reopening_network_devices
>>> %stddev %change %stddev
>>> \ | \
>>> 102582 +8.8% 111626 aim7.jobs-per-min
>>> 176.57 -8.5% 161.64 aim7.time.elapsed_time
>>> 176.57 -8.5% 161.64 aim7.time.elapsed_time.max
>>> 1060618 -12.5% 927723 aim7.time.involuntary_context_switches
>>> 6408 -8.9% 5839 aim7.time.system_time
>>> 785554 +4.5% 820987 aim7.time.voluntary_context_switches
>>> 1077477 -9.5% 975130 ± 2% softirqs.RCU
>>> 184.77 ± 6% +41.2% 260.90 ± 11% iostat.md0.w/s
>>> 6609 ± 2% +9.6% 7246 iostat.md0.wkB/s
>>> 0.00 ± 94% +0.0 0.02 ± 28% mpstat.cpu.soft%
>>> 1.89 ± 4% +0.3 2.15 ± 3% mpstat.cpu.usr%
>>> 6546 ± 19% -49.1% 3328 ± 63% numa-numastat.node0.other_node
>>> 1470 ± 86% +222.9% 4749 ± 45% numa-numastat.node1.other_node
>>> 959.75 ± 8% +16.8% 1120 ± 7% slabinfo.UNIX.active_objs
>>> 959.75 ± 8% +16.8% 1120 ± 7% slabinfo.UNIX.num_objs
>>> 38.35 +3.2% 39.57 ± 2% turbostat.RAMWatt
>>> 8800 ± 2% -10.7% 7855 ± 3% turbostat.SMI
>>> 103925 ± 27% -59.5% 42134 ± 61% numa-meminfo.node0.AnonHugePages
>>> 14267 ± 61% -54.9% 6430 ± 76% numa-meminfo.node0.Inactive(anon)
>>> 52220 ± 18% +104.0% 106522 ± 40% numa-meminfo.node1.AnonHugePages
>>> 6614 ± 2% +9.6% 7248 vmstat.io.bo
>>> 316.00 ± 2% -15.4% 267.25 ± 8% vmstat.procs.r
>>> 12256 ± 2% +6.9% 13098 vmstat.system.cs
>>> 2852 ± 3% +12.5% 3208 ± 3% numa-vmstat.node0.nr_active_file
>>> 3566 ± 61% -54.9% 1607 ± 76% numa-vmstat.node0.nr_inactive_anon
>>> 2852 ± 3% +12.4% 3207 ± 3% numa-vmstat.node0.nr_zone_active_file
>>> 3566 ± 61% -54.9% 1607 ± 76% numa-vmstat.node0.nr_zone_inactive_anon
>>> 95337 +2.3% 97499 proc-vmstat.nr_active_anon
>>> 5746 ± 2% +4.3% 5990 proc-vmstat.nr_active_file
>>> 89732 +2.0% 91532 proc-vmstat.nr_anon_pages
>>> 95337 +2.3% 97499 proc-vmstat.nr_zone_active_anon
>>> 5746 ± 2% +4.3% 5990 proc-vmstat.nr_zone_active_file
>>> 10407 ± 4% -49.3% 5274 ± 52% proc-vmstat.numa_hint_faults_local
>>> 615058 -6.0% 578344 ± 2% proc-vmstat.pgfault
>>> 1.187e+12 -8.7% 1.084e+12 perf-stat.branch-instructions
>>> 0.65 ± 3% +0.0 0.70 ± 2% perf-stat.branch-miss-rate%
>>> 2219706 -2.5% 2164425 perf-stat.context-switches
>>> 2.071e+13 -10.0% 1.864e+13 perf-stat.cpu-cycles
>>> 641874 -2.7% 624703 perf-stat.cpu-migrations
>>> 1.408e+12 -7.3% 1.305e+12 perf-stat.dTLB-loads
>>> 39182891 ± 4% +796.4% 3.512e+08 ±150% perf-stat.iTLB-loads
>>> 5.184e+12 -8.0% 4.77e+12 perf-stat.instructions
>>> 5035 ± 2% -14.1% 4325 ± 13% perf-stat.instructions-per-iTLB-miss
>>> 604219 -6.2% 566725 perf-stat.minor-faults
>>> 4.962e+09 -2.7% 4.827e+09 perf-stat.node-stores
>>> 604097 -6.2% 566730 perf-stat.page-faults
>>> 110.81 ± 13% +25.7% 139.25 ± 8% sched_debug.cfs_rq:/.load_avg.stddev
>>> 12.76 ± 74% +114.6% 27.39 ± 38% sched_debug.cfs_rq:/.removed.load_avg.avg
>>> 54.23 ± 62% +66.2% 90.10 ± 17% sched_debug.cfs_rq:/.removed.load_avg.stddev
>>> 585.18 ± 74% +115.8% 1262 ± 38% sched_debug.cfs_rq:/.removed.runnable_sum.avg
>>> 2489 ± 62% +66.9% 4153 ± 17% sched_debug.cfs_rq:/.removed.runnable_sum.stddev
>>> 11909 ± 10% +44.7% 17229 ± 18% sched_debug.cfs_rq:/.runnable_weight.avg
>>> 1401 ± 2% +36.5% 1913 ± 5% sched_debug.cpu.sched_goidle.avg
>>> 2350 ± 2% +21.9% 2863 ± 5% sched_debug.cpu.sched_goidle.max
>>> 1082 ± 5% +39.2% 1506 ± 4% sched_debug.cpu.sched_goidle.min
>>> 7327 +14.7% 8401 ± 2% sched_debug.cpu.ttwu_count.avg
>>> 5719 ± 3% +18.3% 6767 ± 2% sched_debug.cpu.ttwu_count.min
>>> 1518 ± 3% +15.6% 1755 ± 3% sched_debug.cpu.ttwu_local.min
>>> 88.70 -1.0 87.65 perf-profile.calltrace.cycles-pp.generic_perform_write.__generic_file_write_iter.f2fs_file_write_iter.__vfs_write.vfs_write
>>> 54.51 -1.0 53.48 perf-profile.calltrace.cycles-pp._raw_spin_lock.f2fs_inode_dirtied.f2fs_mark_inode_dirty_sync.f2fs_write_end.generic_perform_write
>>> 54.55 -1.0 53.53 perf-profile.calltrace.cycles-pp.f2fs_mark_inode_dirty_sync.f2fs_write_end.generic_perform_write.__generic_file_write_iter.f2fs_file_write_iter
>>> 56.32 -1.0 55.30 perf-profile.calltrace.cycles-pp.f2fs_write_end.generic_perform_write.__generic_file_write_iter.f2fs_file_write_iter.__vfs_write
>>> 54.54 -1.0 53.53 perf-profile.calltrace.cycles-pp.f2fs_inode_dirtied.f2fs_mark_inode_dirty_sync.f2fs_write_end.generic_perform_write.__generic_file_write_iter
>>> 88.93 -1.0 87.96 perf-profile.calltrace.cycles-pp.__generic_file_write_iter.f2fs_file_write_iter.__vfs_write.vfs_write.ksys_write
>>> 89.94 -0.8 89.14 perf-profile.calltrace.cycles-pp.f2fs_file_write_iter.__vfs_write.vfs_write.ksys_write.do_syscall_64
>>> 90.01 -0.8 89.26 perf-profile.calltrace.cycles-pp.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>> 90.72 -0.7 90.00 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>> 90.59 -0.7 89.87 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>> 13.32 -0.3 13.01 perf-profile.calltrace.cycles-pp._raw_spin_lock.f2fs_inode_dirtied.f2fs_mark_inode_dirty_sync.f2fs_reserve_new_blocks.f2fs_reserve_block
>>> 13.33 -0.3 13.01 perf-profile.calltrace.cycles-pp.f2fs_inode_dirtied.f2fs_mark_inode_dirty_sync.f2fs_reserve_new_blocks.f2fs_reserve_block.f2fs_get_block
>>> 13.33 -0.3 13.01 perf-profile.calltrace.cycles-pp.f2fs_mark_inode_dirty_sync.f2fs_reserve_new_blocks.f2fs_reserve_block.f2fs_get_block.f2fs_write_begin
>>> 13.26 -0.3 12.94 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.f2fs_inode_dirtied.f2fs_mark_inode_dirty_sync.f2fs_reserve_new_blocks
>>> 1.30 ± 2% +0.1 1.40 ± 2% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64
>>> 2.20 ± 6% +0.2 2.40 ± 3% perf-profile.calltrace.cycles-pp.generic_file_read_iter.__vfs_read.vfs_read.ksys_read.do_syscall_64
>>> 2.28 ± 5% +0.2 2.52 ± 5% perf-profile.calltrace.cycles-pp.__vfs_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>> 2.85 ± 4% +0.3 3.16 ± 5% perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>> 2.97 ± 4% +0.3 3.31 ± 5% perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>> 88.74 -1.0 87.70 perf-profile.children.cycles-pp.generic_perform_write
>>> 56.33 -1.0 55.31 perf-profile.children.cycles-pp.f2fs_write_end
>>> 88.95 -1.0 87.98 perf-profile.children.cycles-pp.__generic_file_write_iter
>>> 89.95 -0.8 89.15 perf-profile.children.cycles-pp.f2fs_file_write_iter
>>> 90.03 -0.8 89.28 perf-profile.children.cycles-pp.__vfs_write
>>> 90.73 -0.7 90.02 perf-profile.children.cycles-pp.ksys_write
>>> 90.60 -0.7 89.89 perf-profile.children.cycles-pp.vfs_write
>>> 0.22 ± 5% -0.1 0.17 ± 19% perf-profile.children.cycles-pp.f2fs_invalidate_page
>>> 0.08 ± 10% +0.0 0.10 ± 5% perf-profile.children.cycles-pp.page_mapping
>>> 0.09 +0.0 0.11 ± 7% perf-profile.children.cycles-pp.__cancel_dirty_page
>>> 0.06 ± 6% +0.0 0.09 ± 28% perf-profile.children.cycles-pp.read_node_page
>>> 0.10 ± 4% +0.0 0.14 ± 14% perf-profile.children.cycles-pp.current_time
>>> 0.07 ± 12% +0.0 0.11 ± 9% perf-profile.children.cycles-pp.percpu_counter_add_batch
>>> 0.00 +0.1 0.05 perf-profile.children.cycles-pp.__x64_sys_write
>>> 0.38 ± 3% +0.1 0.43 ± 5% perf-profile.children.cycles-pp.selinux_file_permission
>>> 0.55 ± 4% +0.1 0.61 ± 4% perf-profile.children.cycles-pp.security_file_permission
>>> 1.30 +0.1 1.40 ± 2% perf-profile.children.cycles-pp.entry_SYSCALL_64
>>> 2.21 ± 6% +0.2 2.41 ± 3% perf-profile.children.cycles-pp.generic_file_read_iter
>>> 2.29 ± 6% +0.2 2.53 ± 5% perf-profile.children.cycles-pp.__vfs_read
>>> 2.86 ± 4% +0.3 3.18 ± 5% perf-profile.children.cycles-pp.vfs_read
>>> 2.99 ± 4% +0.3 3.32 ± 5% perf-profile.children.cycles-pp.ksys_read
>>> 0.37 -0.1 0.24 ± 23% perf-profile.self.cycles-pp.__get_node_page
>>> 0.21 ± 3% -0.1 0.15 ± 16% perf-profile.self.cycles-pp.f2fs_invalidate_page
>>> 0.07 ± 5% +0.0 0.09 ± 11% perf-profile.self.cycles-pp.page_mapping
>>> 0.06 ± 11% +0.0 0.08 ± 8% perf-profile.self.cycles-pp.vfs_read
>>> 0.07 ± 7% +0.0 0.10 ± 21% perf-profile.self.cycles-pp.__generic_file_write_iter
>>> 0.06 ± 14% +0.0 0.10 ± 10% perf-profile.self.cycles-pp.percpu_counter_add_batch
>>> 0.20 ± 11% +0.0 0.25 ± 12% perf-profile.self.cycles-pp.selinux_file_permission
>>> 0.05 ± 8% +0.1 0.11 ± 52% perf-profile.self.cycles-pp.__vfs_read
>>> 0.33 ± 9% +0.1 0.41 ± 9% perf-profile.self.cycles-pp.f2fs_lookup_extent_cache
>>> 1.30 +0.1 1.40 ± 2% perf-profile.self.cycles-pp.entry_SYSCALL_64
>>>
>>>
>>>
>>>
>>>
>>> Disclaimer:
>>> Results have been estimated based on internal Intel analysis and are provided
>>> for informational purposes only. Any difference in system hardware or software
>>> design or configuration may affect actual performance.
>>>
>>>
>>> Thanks,
>>> Rong Chen
>>>
>> _______________________________________________
>> LKP mailing list
>> LKP@xxxxxxxxxxxx
>> https://lists.01.org/mailman/listinfo/lkp
>
>
> .
>