Re: [linus:master] [migrate_pages] 7e12beb8ca: vm-scalability.throughput -3.4% regression

From: Liu, Yujie
Date: Wed Mar 22 2023 - 01:19:11 EST


On Tue, 2023-03-21 at 13:43 +0800, Huang, Ying wrote:
> "Liu, Yujie" <yujie.liu@xxxxxxxxx> writes:
>
> > Hi Ying,
> >
> > On Mon, 2023-03-20 at 15:58 +0800, Huang, Ying wrote:
> > > Hi, Yujie,
> > >
> > > kernel test robot <yujie.liu@xxxxxxxxx> writes:
> > >
> > > > Hello,
> > > >
> > > > FYI, we noticed a -3.4% regression of vm-scalability.throughput due to commit:
> > > >
> > > > commit: 7e12beb8ca2ac98b2ec42e0ea4b76cdc93b58654 ("migrate_pages: batch flushing TLB")
> > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > >
> > > > in testcase: vm-scalability
> > > > on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
> > > > with following parameters:
> > > >
> > > >         runtime: 300s
> > > >         size: 512G
> > > >         test: anon-cow-rand-mt
> > > >         cpufreq_governor: performance
> > > >
> > > > test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
> > > > test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
> > > >
> > > >
> > > > If you fix the issue, kindly add following tag
> > > > > Reported-by: kernel test robot <yujie.liu@xxxxxxxxx>
> > > > > Link: https://lore.kernel.org/oe-lkp/202303192325.ecbaf968-yujie.liu@xxxxxxxxx
> > > >
> > >
> > > Thanks a lot for the report!  Can you check whether the debug patch
> > > below can resolve the regression?
> >
> > We've tested the patch and found the throughput score was partially
> > restored from -3.6% to -1.4%, still with a slight performance drop.
> > Please check the detailed data as follows:
>
> Good!  Thanks for your detailed data!
>
> >       0.09 ± 17%      +1.2        1.32 ±  7%      +0.4        0.45 ± 21%  perf-profile.children.cycles-pp.flush_tlb_func
>
> It appears that we can reduce the unnecessary TLB flushing effectively
> with the previous debug patch.  But the batched flush (full flush) is
> still slower than the non-batched flush (flush one page).
>
> Can you try the debug patch below to check whether it resolves the
> regression completely?  The new debug patch can be applied on top of the
> previous debug patch.
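(For readers following along: the idea in this second debug patch, as described above, is to skip the batched full TLB flush when only a single page is being migrated, and flush just that page instead. The sketch below is a rough illustration only; the flush functions and migrate loop are hypothetical stand-ins, not kernel code.)

```python
# Illustrative sketch only (not real kernel code): batch TLB flushing only
# when more than one page is migrated; a single-page migration flushes just
# that page, since a full flush costs more than a one-page flush.
flushes = {"full": 0, "single": 0}

def flush_tlb_full():
    # batched path: one full TLB flush covering all unmapped pages
    flushes["full"] += 1

def flush_tlb_one(addr):
    # non-batched path: flush only this page's translation
    flushes["single"] += 1

def migrate(pages):
    if len(pages) == 1:
        # single-page migration: avoid the batched full flush
        flush_tlb_one(pages[0])
    else:
        # multi-page migration: unmap all pages, then one batched full flush
        flush_tlb_full()

migrate([0x1000])                     # one page  -> one single-page flush
migrate([0x1000, 0x2000, 0x3000])     # 3 pages   -> one full flush total
```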

The second debug patch got a -0.7% performance change. The data fluctuate
from test to test, and the standard deviation is even a bit larger than
0.7%, which makes the throughput score alone not very convincing. Please
check other metrics to see whether the regression is fully resolved.
Thanks.
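As a quick sanity check on the noise argument above, the measured change can be compared against the run-to-run deviation. The throughput numbers come from the comparison table below; the exact stddev value is an assumption, since the email states only that it is "a bit larger than 0.7%".

```python
# Throughput from the comparison table: parent commit vs. both debug patches.
baseline = 5528051   # ebe75e4751063, vm-scalability.throughput
patched  = 5487122   # a65085664418d (both debug patches applied)

change_pct = (patched - baseline) / baseline * 100
stddev_pct = 0.8     # assumed value; stated only as "a bit larger than 0.7%"

# The ~-0.7% delta is smaller than the run-to-run deviation, so the
# remaining difference is within measurement noise.
print(f"change = {change_pct:.1f}%, within noise: {abs(change_pct) < stddev_pct}")
```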

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/512G/lkp-csl-2sp3/anon-cow-rand-mt/vm-scalability

commit:
ebe75e4751063 ("migrate_pages: share more code between _unmap and _move")
9a30245d65679 ("dbg, rmap: avoid flushing TLB in batch if PTE is inaccessible")
a65085664418d ("dbg, migrate_pages: don't batch flushing for single page migration")

ebe75e4751063dce 9a30245d656794d171cd798a2be a65085664418d7ed1560095d466
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
57634 -1.5% 56788 -0.8% 57199 vm-scalability.median
81.16 ± 12% -20.0 61.18 ± 21% -5.0 76.14 ± 12% vm-scalability.stddev%
5528051 -1.4% 5449450 -0.7% 5487122 vm-scalability.throughput
305.38 -0.1% 305.19 -0.1% 305.15 vm-scalability.time.elapsed_time
305.38 -0.1% 305.19 -0.1% 305.15 vm-scalability.time.elapsed_time.max
652.11 ± 88% +54.5% 1007 ± 63% +45.4% 948.20 ± 80% vm-scalability.time.file_system_inputs
200293 ± 3% -4.3% 191707 ± 2% +1.9% 204033 ± 3% vm-scalability.time.involuntary_context_switches
67.11 ± 56% -95.4% 3.11 ± 80% -11.3% 59.50 ± 27% vm-scalability.time.major_page_faults
32930133 -0.0% 32924571 -0.0% 32922758 vm-scalability.time.maximum_resident_set_size
67952989 ± 5% +35.6% 92147668 ± 3% +2.8% 69849921 ± 8% vm-scalability.time.minor_page_faults
4096 +0.0% 4096 +0.0% 4096 vm-scalability.time.page_size
9006 -0.6% 8956 -0.0% 9005 vm-scalability.time.percent_of_cpu_this_job_got
1178 ± 3% +8.6% 1278 ± 3% -1.9% 1155 ± 4% vm-scalability.time.system_time
26327 -1.0% 26056 +0.0% 26327 vm-scalability.time.user_time
11378 ± 5% +118.5% 24867 ± 7% -0.5% 11327 ± 9% vm-scalability.time.voluntary_context_switches
1.662e+09 -1.5% 1.638e+09 -0.8% 1.648e+09 vm-scalability.workload
1.143e+09 +0.6% 1.15e+09 ± 2% +2.9% 1.176e+09 ± 3% cpuidle..time
2464665 ± 3% +2.0% 2515047 ± 4% +2.2% 2519159 ± 8% cpuidle..usage
367.89 -0.2% 367.16 -0.2% 367.32 uptime.boot
6393 ± 3% -0.9% 6336 ± 2% -0.5% 6363 ± 2% uptime.idle
59.33 ± 4% -0.4% 59.06 ± 2% -0.6% 58.94 ± 3% boot-time.boot
33.79 ± 3% -0.8% 33.54 -0.7% 33.57 boot-time.dhcp
5106 ± 4% -0.6% 5076 ± 2% -0.8% 5066 ± 3% boot-time.idle
1.05 ± 8% -4.4% 1.01 -4.3% 1.01 boot-time.smp_boot
3.78 -0.0 3.77 ± 3% +0.1 3.91 ± 4% mpstat.cpu.all.idle%
0.00 ±184% +0.0 0.00 ± 25% -0.0 0.00 ± 60% mpstat.cpu.all.iowait%
2.58 +0.5 3.09 ± 3% -0.0 2.56 mpstat.cpu.all.irq%
0.03 ± 4% +0.0 0.03 ± 8% -0.0 0.03 ± 5% mpstat.cpu.all.soft%
4.06 ± 3% +0.3 4.40 ± 3% -0.1 3.98 ± 4% mpstat.cpu.all.sys%
89.55 -0.8 88.71 -0.0 89.52 mpstat.cpu.all.usr%
0.00 -100.0% 0.00 -100.0% 0.00 numa-numastat.node0.interleave_hit
14350133 ± 4% +7.7% 15454129 ± 4% -0.5% 14283646 ± 4% numa-numastat.node0.local_node
14405409 ± 4% +7.5% 15487972 ± 4% -0.5% 14332762 ± 4% numa-numastat.node0.numa_hit
55258 ± 48% -37.3% 34622 ± 67% -13.6% 47731 ± 51% numa-numastat.node0.other_node
0.00 -100.0% 0.00 -100.0% 0.00 numa-numastat.node1.interleave_hit
14402027 ± 3% +8.4% 15618857 ± 5% -0.1% 14389667 ± 4% numa-numastat.node1.local_node
14433899 ± 3% +8.6% 15670948 ± 5% -0.0% 14429236 ± 4% numa-numastat.node1.numa_hit
31821 ± 84% +64.9% 52467 ± 44% +30.8% 41622 ± 56% numa-numastat.node1.other_node
305.38 -0.1% 305.19 -0.1% 305.15 time.elapsed_time
305.38 -0.1% 305.19 -0.1% 305.15 time.elapsed_time.max
652.11 ± 88% +54.5% 1007 ± 63% +45.4% 948.20 ± 80% time.file_system_inputs
200293 ± 3% -4.3% 191707 ± 2% +1.9% 204033 ± 3% time.involuntary_context_switches
67.11 ± 56% -95.4% 3.11 ± 80% -11.3% 59.50 ± 27% time.major_page_faults
32930133 -0.0% 32924571 -0.0% 32922758 time.maximum_resident_set_size
67952989 ± 5% +35.6% 92147668 ± 3% +2.8% 69849921 ± 8% time.minor_page_faults
4096 +0.0% 4096 +0.0% 4096 time.page_size
9006 -0.6% 8956 -0.0% 9005 time.percent_of_cpu_this_job_got
1178 ± 3% +8.6% 1278 ± 3% -1.9% 1155 ± 4% time.system_time
26327 -1.0% 26056 +0.0% 26327 time.user_time
11378 ± 5% +118.5% 24867 ± 7% -0.5% 11327 ± 9% time.voluntary_context_switches
4.00 +0.0% 4.00 +0.0% 4.00 vmstat.cpu.id
6.00 +16.7% 7.00 +0.0% 6.00 vmstat.cpu.sy
88.33 -0.9% 87.56 +0.3% 88.60 vmstat.cpu.us
0.00 -100.0% 0.00 -100.0% 0.00 vmstat.cpu.wa
10.67 ± 97% -34.4% 7.00 -34.4% 7.00 vmstat.io.bi
8.00 ± 70% -25.0% 6.00 -25.0% 6.00 vmstat.io.bo
1046 -0.1% 1045 -0.1% 1045 vmstat.memory.buff
2964204 -0.1% 2962572 -0.1% 2961826 vmstat.memory.cache
63650311 +0.1% 63687273 +0.1% 63731617 vmstat.memory.free
0.00 -100.0% 0.00 -100.0% 0.00 vmstat.procs.b
92.00 -0.2% 91.78 -0.3% 91.70 vmstat.procs.r
2022 ± 3% +3.6% 2095 -1.3% 1995 vmstat.system.cs
539357 ± 2% +32.9% 716886 ± 4% -2.1% 528047 ± 5% vmstat.system.in
143480 ± 3% -12.0% 126262 ± 4% -0.6% 142665 ± 3% sched_debug.cfs_rq:/.min_vruntime.stddev
548123 ± 7% -20.7% 434543 ± 9% -5.5% 517900 ± 7% sched_debug.cfs_rq:/.spread0.avg
655329 ± 6% -16.2% 549218 ± 6% -4.7% 624275 ± 5% sched_debug.cfs_rq:/.spread0.max
143388 ± 3% -11.9% 126295 ± 4% -0.6% 142588 ± 3% sched_debug.cfs_rq:/.spread0.stddev
240478 ± 6% -12.0% 211715 ± 5% -3.2% 232667 ± 8% sched_debug.cpu.avg_idle.avg
1938 ± 5% +11.4% 2160 ± 3% -2.1% 1897 ± 4% sched_debug.cpu.nr_switches.min
39960890 ± 6% +54.7% 61837739 ± 4% +5.0% 41939453 ± 11% proc-vmstat.numa_hint_faults
19987976 ± 6% +55.1% 30996483 ± 4% +5.0% 20978472 ± 11% proc-vmstat.numa_hint_faults_local
28840932 ± 3% +8.0% 31160418 ± 4% -0.3% 28764186 ± 4% proc-vmstat.numa_hit
28753783 ± 3% +8.1% 31074486 ± 4% -0.3% 28675501 ± 4% proc-vmstat.numa_local
19745743 ± 5% +11.8% 22080123 ± 6% -0.4% 19668879 ± 6% proc-vmstat.numa_pages_migrated
40107839 ± 6% +54.6% 61988683 ± 4% +5.0% 42094380 ± 11% proc-vmstat.numa_pte_updates
37158989 ± 2% +6.3% 39482935 ± 3% -0.2% 37080293 ± 3% proc-vmstat.pgalloc_normal
68856116 ± 5% +35.1% 93057570 ± 3% +2.8% 70755839 ± 8% proc-vmstat.pgfault
19745743 ± 5% +11.8% 22080123 ± 6% -0.4% 19668879 ± 6% proc-vmstat.pgmigrate_success
19754280 ± 5% +11.8% 22080663 ± 6% -0.4% 19677784 ± 6% proc-vmstat.pgreuse
8953845 ± 3% +13.3% 10142474 ± 2% +0.7% 9013008 ± 2% perf-stat.i.branch-misses
158.09 +7.5% 170.00 ± 2% +1.5% 160.38 ± 3% perf-stat.i.cpu-migrations
9.10 -0.1 8.97 -0.0 9.08 perf-stat.i.dTLB-store-miss-rate%
2454429 ± 2% +26.7% 3110501 ± 5% -5.2% 2326293 ± 3% perf-stat.i.iTLB-load-misses
0.31 ± 38% -68.9% 0.10 ± 31% -11.2% 0.27 ± 22% perf-stat.i.major-faults
224958 ± 5% +35.4% 304571 ± 3% +2.7% 231063 ± 8% perf-stat.i.minor-faults
224959 ± 5% +35.4% 304571 ± 3% +2.7% 231064 ± 8% perf-stat.i.page-faults
0.08 ± 4% +0.0 0.09 ± 3% +0.0 0.08 ± 2% perf-stat.overall.branch-miss-rate%
9.38 -0.1 9.25 -0.0 9.37 perf-stat.overall.dTLB-store-miss-rate%
95.49 +1.0 96.53 -0.3 95.15 perf-stat.overall.iTLB-load-miss-rate%
20490 ± 3% -21.5% 16077 ± 6% +4.5% 21404 ± 4% perf-stat.overall.instructions-per-iTLB-miss
8906114 ± 3% +13.3% 10090374 ± 2% +0.7% 8968593 ± 2% perf-stat.ps.branch-misses
157.57 +7.6% 169.49 ± 2% +1.4% 159.76 ± 3% perf-stat.ps.cpu-migrations
2444301 ± 2% +26.8% 3098710 ± 5% -5.2% 2317560 ± 3% perf-stat.ps.iTLB-load-misses
0.31 ± 38% -68.8% 0.10 ± 31% -10.8% 0.27 ± 22% perf-stat.ps.major-faults
224444 ± 5% +35.3% 303619 ± 3% +2.7% 230589 ± 8% perf-stat.ps.minor-faults
224444 ± 5% +35.3% 303620 ± 3% +2.7% 230589 ± 8% perf-stat.ps.page-faults
1.26 ± 15% -1.3 0.00 -0.0 1.25 ± 14% perf-profile.calltrace.cycles-pp.migrate_folio_unmap.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page
1.14 ± 15% -1.1 0.00 -0.0 1.12 ± 14% perf-profile.calltrace.cycles-pp.try_to_migrate.migrate_folio_unmap.migrate_pages_batch.migrate_pages.migrate_misplaced_page
1.12 ± 15% -1.1 0.00 -0.0 1.11 ± 14% perf-profile.calltrace.cycles-pp.rmap_walk_anon.try_to_migrate.migrate_folio_unmap.migrate_pages_batch.migrate_pages
1.08 ± 15% -1.1 0.00 -0.0 1.06 ± 14% perf-profile.calltrace.cycles-pp.try_to_migrate_one.rmap_walk_anon.try_to_migrate.migrate_folio_unmap.migrate_pages_batch
0.92 ± 15% -0.9 0.00 -0.0 0.92 ± 14% perf-profile.calltrace.cycles-pp.ptep_clear_flush.try_to_migrate_one.rmap_walk_anon.try_to_migrate.migrate_folio_unmap
0.91 ± 15% -0.9 0.00 -0.0 0.91 ± 14% perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.ptep_clear_flush.try_to_migrate_one.rmap_walk_anon.try_to_migrate
0.91 ± 15% -0.9 0.00 -0.0 0.91 ± 14% perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.ptep_clear_flush.try_to_migrate_one.rmap_walk_anon
0.91 ± 15% -0.9 0.00 -0.0 0.90 ± 14% perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.ptep_clear_flush.try_to_migrate_one
72.48 ± 3% -0.7 71.79 +2.8 75.24 ± 5% perf-profile.calltrace.cycles-pp.do_access
0.26 ±112% -0.3 0.00 +0.1 0.34 ± 82% perf-profile.calltrace.cycles-pp._raw_spin_lock.do_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.19 ±141% -0.2 0.00 -0.0 0.16 ±153% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.do_numa_page.__handle_mm_fault.handle_mm_fault
0.07 ±282% -0.1 0.00 -0.1 0.00 perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.nrand48_r
0.07 ±282% -0.1 0.00 -0.1 0.00 perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.nrand48_r
0.07 ±282% -0.1 0.00 -0.1 0.00 perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.nrand48_r
0.06 ±282% -0.1 0.00 -0.1 0.00 perf-profile.calltrace.cycles-pp.rmap_walk_anon.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page
0.13 ±188% -0.0 0.11 ±187% -0.1 0.00 perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.nrand48_r
4.13 ± 3% -0.0 4.12 -0.1 3.98 ± 6% perf-profile.calltrace.cycles-pp.do_rw_once
1.34 ± 39% +0.0 1.35 ± 25% -0.2 1.16 ± 22% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
0.55 ± 69% +0.0 0.60 ± 56% -0.1 0.50 ± 52% perf-profile.calltrace.cycles-pp.scheduler_tick.update_process_times.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues
1.09 ± 31% +0.1 1.14 ± 26% -0.2 0.93 ± 37% perf-profile.calltrace.cycles-pp.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
1.08 ± 31% +0.1 1.13 ± 26% -0.2 0.92 ± 37% perf-profile.calltrace.cycles-pp.update_process_times.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt
0.00 +0.1 0.06 ±282% +0.0 0.00 perf-profile.calltrace.cycles-pp.move_to_new_folio.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page
0.00 +0.1 0.06 ±282% +0.0 0.00 perf-profile.calltrace.cycles-pp.migrate_folio_extra.move_to_new_folio.migrate_pages_batch.migrate_pages.migrate_misplaced_page
1.18 ± 30% +0.1 1.24 ± 26% -0.1 1.07 ± 23% perf-profile.calltrace.cycles-pp.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
1.52 ± 28% +0.1 1.58 ± 25% -0.2 1.36 ± 21% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.do_access
1.43 ± 29% +0.1 1.50 ± 25% -0.1 1.29 ± 21% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.do_access
1.44 ± 28% +0.1 1.51 ± 25% -0.1 1.30 ± 21% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.do_access
1.72 ± 25% +0.1 1.80 ± 22% -0.2 1.55 ± 20% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.do_access
6.40 ± 9% +0.1 6.54 -0.6 5.76 ± 17% perf-profile.calltrace.cycles-pp.lrand48_r
0.17 ±196% +0.2 0.33 ± 89% -0.1 0.11 ±200% perf-profile.calltrace.cycles-pp.task_tick_fair.scheduler_tick.update_process_times.tick_sched_handle.tick_sched_timer
0.00 +0.3 0.26 ±113% +0.0 0.00 perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.do_access
0.00 +0.3 0.26 ±113% +0.0 0.00 perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.do_access
0.00 +0.3 0.33 ± 91% +0.0 0.00 perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.do_access
19.08 ± 10% +0.5 19.59 -2.2 16.90 ± 19% perf-profile.calltrace.cycles-pp.nrand48_r
0.00 +0.6 0.59 ± 40% +0.0 0.00 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.do_access
3.30 ± 15% +0.9 4.18 ± 19% -0.1 3.24 ± 14% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
3.34 ± 15% +0.9 4.22 ± 19% -0.1 3.27 ± 14% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
0.00 +0.9 0.90 ± 18% +0.0 0.00 perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush
3.70 ± 15% +0.9 4.64 ± 19% -0.1 3.60 ± 14% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
3.68 ± 15% +0.9 4.63 ± 19% -0.1 3.59 ± 14% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
3.89 ± 14% +1.0 4.85 ± 19% -0.1 3.76 ± 14% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
3.03 ± 15% +1.0 4.03 ± 19% -0.1 2.98 ± 14% perf-profile.calltrace.cycles-pp.do_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
2.46 ± 15% +1.4 3.85 ± 19% -0.1 2.41 ± 14% perf-profile.calltrace.cycles-pp.migrate_misplaced_page.do_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
2.27 ± 15% +1.4 3.67 ± 19% -0.0 2.22 ± 14% perf-profile.calltrace.cycles-pp.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page.__handle_mm_fault
2.27 ± 15% +1.4 3.68 ± 19% -0.0 2.23 ± 14% perf-profile.calltrace.cycles-pp.migrate_pages.migrate_misplaced_page.do_numa_page.__handle_mm_fault.handle_mm_fault
0.00 +2.4 2.38 ± 18% +0.0 0.00 perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush.migrate_pages_batch
0.00 +2.4 2.40 ± 18% +0.0 0.00 perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush.migrate_pages_batch.migrate_pages
0.00 +2.4 2.40 ± 18% +0.0 0.00 perf-profile.calltrace.cycles-pp.try_to_unmap_flush.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page
0.00 +2.4 2.40 ± 18% +0.0 0.00 perf-profile.calltrace.cycles-pp.arch_tlbbatch_flush.try_to_unmap_flush.migrate_pages_batch.migrate_pages.migrate_misplaced_page
1.51 ± 16% -1.2 0.31 ± 20% -0.0 1.48 ± 14% perf-profile.children.cycles-pp.rmap_walk_anon
1.25 ± 16% -1.0 0.29 ± 20% -0.0 1.22 ± 15% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
1.08 ± 15% -1.0 0.12 ± 21% -0.0 1.06 ± 14% perf-profile.children.cycles-pp.try_to_migrate_one
1.14 ± 15% -0.9 0.19 ± 19% -0.0 1.12 ± 14% perf-profile.children.cycles-pp.try_to_migrate
0.92 ± 15% -0.9 0.00 -0.0 0.92 ± 14% perf-profile.children.cycles-pp.ptep_clear_flush
1.26 ± 15% -0.9 0.34 ± 21% -0.0 1.25 ± 14% perf-profile.children.cycles-pp.migrate_folio_unmap
0.92 ± 15% -0.9 0.00 -0.0 0.91 ± 14% perf-profile.children.cycles-pp.flush_tlb_mm_range
1.05 ± 15% -0.9 0.16 ± 16% -0.0 1.04 ± 15% perf-profile.children.cycles-pp._raw_spin_lock
72.83 ± 3% -0.6 72.25 +2.8 75.59 ± 5% perf-profile.children.cycles-pp.do_access
0.46 ± 15% -0.3 0.11 ± 20% -0.0 0.44 ± 14% perf-profile.children.cycles-pp.page_vma_mapped_walk
0.34 ± 15% -0.3 0.08 ± 18% -0.0 0.33 ± 15% perf-profile.children.cycles-pp.remove_migration_pte
0.14 ± 16% -0.1 0.00 -0.0 0.14 ± 17% perf-profile.children.cycles-pp.handle_pte_fault
0.13 ± 22% -0.0 0.09 ± 23% -0.0 0.12 ± 17% perf-profile.children.cycles-pp.folio_lruvec_lock_irq
0.13 ± 22% -0.0 0.09 ± 22% -0.0 0.12 ± 18% perf-profile.children.cycles-pp._raw_spin_lock_irq
0.09 ± 39% -0.0 0.07 ± 75% -0.0 0.09 ± 52% perf-profile.children.cycles-pp.cpuacct_account_field
0.17 ± 21% -0.0 0.15 ± 21% -0.0 0.16 ± 15% perf-profile.children.cycles-pp.folio_isolate_lru
0.19 ± 20% -0.0 0.17 ± 20% -0.0 0.18 ± 15% perf-profile.children.cycles-pp.numamigrate_isolate_page
0.12 ± 95% -0.0 0.11 ± 16% -0.1 0.06 ± 13% perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
0.09 ± 47% -0.0 0.08 ± 43% -0.0 0.06 ± 38% perf-profile.children.cycles-pp.hrtimer_active
4.37 ± 3% -0.0 4.36 -0.2 4.22 ± 5% perf-profile.children.cycles-pp.do_rw_once
0.33 ± 2% -0.0 0.32 ± 2% -0.0 0.32 ± 5% perf-profile.children.cycles-pp.lrand48_r@plt
0.01 ±282% -0.0 0.00 -0.0 0.00 perf-profile.children.cycles-pp.enqueue_hrtimer
0.01 ±282% -0.0 0.00 -0.0 0.00 perf-profile.children.cycles-pp.timerqueue_add
0.06 ± 13% -0.0 0.05 ± 37% -0.0 0.04 ± 51% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.06 ± 13% -0.0 0.05 ± 37% -0.0 0.04 ± 51% perf-profile.children.cycles-pp.do_syscall_64
0.01 ±282% -0.0 0.00 -0.0 0.00 perf-profile.children.cycles-pp.lapic_next_deadline
0.01 ±282% -0.0 0.00 -0.0 0.00 perf-profile.children.cycles-pp.hrtimer_update_next_event
0.01 ±282% -0.0 0.00 -0.0 0.00 perf-profile.children.cycles-pp.update_min_vruntime
0.01 ±282% -0.0 0.00 -0.0 0.00 perf-profile.children.cycles-pp.rcu_core
0.15 ± 20% -0.0 0.15 ± 21% -0.0 0.14 ± 17% perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
0.07 ± 27% -0.0 0.06 ± 55% -0.0 0.05 ± 53% perf-profile.children.cycles-pp.ktime_get
0.01 ±193% -0.0 0.01 ±188% -0.0 0.01 ±201% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
0.01 ±282% -0.0 0.01 ±282% -0.0 0.01 ±299% perf-profile.children.cycles-pp.perf_rotate_context
0.21 ± 17% -0.0 0.21 ± 18% -0.0 0.20 ± 15% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.02 ±209% -0.0 0.02 ±142% -0.0 0.01 ±300% perf-profile.children.cycles-pp.update_cfs_group
0.05 ± 43% -0.0 0.05 ± 57% -0.0 0.04 ± 67% perf-profile.children.cycles-pp.update_irq_load_avg
0.00 +0.0 0.00 +0.0 0.02 ±300% perf-profile.children.cycles-pp.secondary_startup_64_no_verify
0.00 +0.0 0.00 +0.0 0.02 ±300% perf-profile.children.cycles-pp.start_secondary
0.00 +0.0 0.00 +0.0 0.02 ±300% perf-profile.children.cycles-pp.cpu_startup_entry
0.00 +0.0 0.00 +0.0 0.02 ±300% perf-profile.children.cycles-pp.do_idle
0.00 +0.0 0.00 +0.0 0.02 ±300% perf-profile.children.cycles-pp.cpuidle_idle_call
0.00 +0.0 0.00 +0.0 0.02 ±300% perf-profile.children.cycles-pp.cpuidle_enter
0.00 +0.0 0.00 +0.0 0.02 ±300% perf-profile.children.cycles-pp.cpuidle_enter_state
0.00 +0.0 0.00 +0.0 0.02 ±300% perf-profile.children.cycles-pp.mwait_idle_with_hints
0.00 +0.0 0.00 +0.0 0.02 ±300% perf-profile.children.cycles-pp.intel_idle
0.06 ± 18% +0.0 0.07 ± 41% -0.0 0.05 ± 66% perf-profile.children.cycles-pp.rcu_pending
0.02 ±112% +0.0 0.03 ±111% -0.0 0.01 ±300% perf-profile.children.cycles-pp.timerqueue_del
0.02 ±111% +0.0 0.03 ±112% +0.0 0.03 ±100% perf-profile.children.cycles-pp.irqtime_account_process_tick
0.06 ± 18% +0.0 0.06 ± 19% -0.0 0.03 ± 82% perf-profile.children.cycles-pp.mt_find
0.07 ± 39% +0.0 0.07 ± 28% -0.0 0.05 ± 55% perf-profile.children.cycles-pp.ktime_get_update_offsets_now
0.00 +0.0 0.01 ±282% +0.0 0.00 perf-profile.children.cycles-pp._find_next_bit
0.00 +0.0 0.01 ±282% +0.0 0.00 perf-profile.children.cycles-pp.folio_get_anon_vma
0.00 +0.0 0.01 ±282% +0.0 0.00 perf-profile.children.cycles-pp.__free_one_page
0.06 ± 18% +0.0 0.06 ± 20% -0.0 0.03 ± 82% perf-profile.children.cycles-pp.find_vma
0.11 ± 25% +0.0 0.11 ± 25% -0.0 0.09 ± 38% perf-profile.children.cycles-pp.update_rq_clock
0.32 ± 19% +0.0 0.33 ± 32% -0.0 0.30 ± 31% perf-profile.children.cycles-pp.account_user_time
0.21 ± 48% +0.0 0.22 ± 28% -0.0 0.18 ± 23% perf-profile.children.cycles-pp.update_load_avg
0.09 ± 20% +0.0 0.09 ± 23% -0.0 0.08 ± 38% perf-profile.children.cycles-pp.tick_sched_do_timer
0.02 ±154% +0.0 0.03 ± 92% -0.0 0.02 ±155% perf-profile.children.cycles-pp.__do_softirq
0.07 ± 35% +0.0 0.08 ± 26% -0.0 0.07 ± 20% perf-profile.children.cycles-pp.clockevents_program_event
0.08 ± 36% +0.0 0.09 ± 24% -0.0 0.07 ± 19% perf-profile.children.cycles-pp.__irq_exit_rcu
0.03 ±127% +0.0 0.04 ± 72% -0.0 0.02 ±123% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
0.08 ± 18% +0.0 0.09 ± 26% -0.0 0.06 ± 53% perf-profile.children.cycles-pp.rcu_sched_clock_irq
0.00 +0.0 0.01 ±187% +0.0 0.00 perf-profile.children.cycles-pp.lru_add_fn
0.21 ± 19% +0.0 0.22 ± 21% -0.0 0.20 ± 15% perf-profile.children.cycles-pp.folio_batch_move_lru
0.21 ± 19% +0.0 0.22 ± 20% -0.0 0.20 ± 15% perf-profile.children.cycles-pp.lru_add_drain
0.21 ± 19% +0.0 0.22 ± 20% -0.0 0.20 ± 15% perf-profile.children.cycles-pp.lru_add_drain_cpu
0.06 ± 39% +0.0 0.07 ± 21% +0.0 0.06 ± 15% perf-profile.children.cycles-pp.rmqueue_bulk
0.06 ± 16% +0.0 0.08 ± 21% -0.0 0.06 ± 13% perf-profile.children.cycles-pp.free_unref_page
0.09 ± 16% +0.0 0.11 ± 22% -0.0 0.09 ± 14% perf-profile.children.cycles-pp.__alloc_pages
0.09 ± 15% +0.0 0.11 ± 21% -0.0 0.09 ± 17% perf-profile.children.cycles-pp.rmqueue
0.09 ± 16% +0.0 0.11 ± 21% -0.0 0.09 ± 14% perf-profile.children.cycles-pp.get_page_from_freelist
0.03 ± 71% +0.0 0.05 ± 39% -0.0 0.03 ± 82% perf-profile.children.cycles-pp.free_pcppages_bulk
0.00 +0.0 0.02 ±142% +0.0 0.00 perf-profile.children.cycles-pp.can_change_pte_writable
0.00 +0.0 0.02 ±142% +0.0 0.00 perf-profile.children.cycles-pp.folio_migrate_flags
0.03 ±152% +0.0 0.04 ± 72% +0.0 0.03 ± 84% perf-profile.children.cycles-pp.__update_load_avg_se
0.09 ± 18% +0.0 0.11 ± 22% -0.0 0.09 ± 14% perf-profile.children.cycles-pp.__folio_alloc
0.09 ± 18% +0.0 0.11 ± 22% +0.0 0.09 ± 16% perf-profile.children.cycles-pp.alloc_misplaced_dst_page
0.08 ± 15% +0.0 0.10 ± 21% -0.0 0.08 ± 16% perf-profile.children.cycles-pp.__list_del_entry_valid
0.04 ± 91% +0.0 0.06 ± 38% +0.0 0.04 ± 66% perf-profile.children.cycles-pp.arch_scale_freq_tick
0.11 ± 16% +0.0 0.13 ± 29% -0.0 0.10 ± 28% perf-profile.children.cycles-pp.__cgroup_account_cputime_field
0.19 ± 17% +0.0 0.21 ± 18% -0.0 0.18 ± 18% perf-profile.children.cycles-pp.down_read_trylock
0.03 ±118% +0.0 0.05 ± 59% -0.0 0.03 ±101% perf-profile.children.cycles-pp.perf_trace_sched_stat_runtime
0.02 ±142% +0.0 0.04 ± 72% -0.0 0.01 ±299% perf-profile.children.cycles-pp.irqtime_account_irq
0.25 ± 39% +0.0 0.27 ± 25% -0.0 0.22 ± 22% perf-profile.children.cycles-pp.update_curr
0.09 ± 7% +0.0 0.11 ± 14% -0.0 0.08 ± 15% perf-profile.children.cycles-pp.sync_regs
0.16 ± 13% +0.0 0.18 ± 19% -0.0 0.15 ± 14% perf-profile.children.cycles-pp.up_read
0.68 ± 45% +0.0 0.71 ± 28% -0.1 0.58 ± 24% perf-profile.children.cycles-pp.task_tick_fair
0.02 ±141% +0.0 0.05 ± 42% +0.0 0.02 ±122% perf-profile.children.cycles-pp.uncharge_batch
0.01 ±282% +0.0 0.04 ± 75% +0.0 0.01 ±200% perf-profile.children.cycles-pp.page_counter_uncharge
0.02 ±141% +0.0 0.06 ± 44% +0.0 0.02 ±100% perf-profile.children.cycles-pp.__mem_cgroup_uncharge
0.02 ±141% +0.0 0.06 ± 44% +0.0 0.02 ±100% perf-profile.children.cycles-pp.__folio_put
0.03 ± 71% +0.0 0.08 ± 25% -0.0 0.03 ± 82% perf-profile.children.cycles-pp.mem_cgroup_migrate
0.96 ± 40% +0.0 1.00 ± 27% -0.1 0.81 ± 24% perf-profile.children.cycles-pp.scheduler_tick
0.11 ± 20% +0.0 0.16 ± 15% -0.0 0.11 ± 11% perf-profile.children.cycles-pp.native_irq_return_iret
0.06 ± 13% +0.1 0.11 ± 16% -0.0 0.06 ± 13% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
0.05 ± 36% +0.1 0.10 ± 18% -0.0 0.04 ± 51% perf-profile.children.cycles-pp.exit_to_user_mode_loop
0.01 ±187% +0.1 0.07 ± 26% +0.0 0.02 ±122% perf-profile.children.cycles-pp.page_counter_charge
0.04 ± 71% +0.1 0.10 ± 18% +0.0 0.04 ± 51% perf-profile.children.cycles-pp.task_work_run
0.17 ± 14% +0.1 0.23 ± 20% -0.0 0.17 ± 15% perf-profile.children.cycles-pp.copy_page
0.17 ± 13% +0.1 0.24 ± 19% -0.0 0.17 ± 15% perf-profile.children.cycles-pp.folio_copy
0.03 ± 90% +0.1 0.10 ± 16% +0.0 0.04 ± 51% perf-profile.children.cycles-pp.change_pte_range
0.03 ± 90% +0.1 0.10 ± 18% +0.0 0.04 ± 51% perf-profile.children.cycles-pp.task_numa_work
0.03 ± 90% +0.1 0.10 ± 18% +0.0 0.04 ± 51% perf-profile.children.cycles-pp.change_prot_numa
0.03 ± 90% +0.1 0.10 ± 18% +0.0 0.04 ± 51% perf-profile.children.cycles-pp.change_protection_range
0.03 ± 90% +0.1 0.10 ± 18% +0.0 0.04 ± 51% perf-profile.children.cycles-pp.change_pmd_range
0.06 ± 40% +0.1 0.13 ± 23% +0.0 0.06 ± 15% perf-profile.children.cycles-pp.__default_send_IPI_dest_field
1.58 ± 32% +0.1 1.65 ± 25% -0.2 1.36 ± 25% perf-profile.children.cycles-pp.tick_sched_handle
1.56 ± 32% +0.1 1.64 ± 25% -0.2 1.35 ± 25% perf-profile.children.cycles-pp.update_process_times
1.85 ± 30% +0.1 1.94 ± 25% -0.2 1.61 ± 24% perf-profile.children.cycles-pp.__hrtimer_run_queues
1.71 ± 31% +0.1 1.79 ± 25% -0.2 1.49 ± 25% perf-profile.children.cycles-pp.tick_sched_timer
0.08 ± 16% +0.1 0.17 ± 21% -0.0 0.08 ± 17% perf-profile.children.cycles-pp.default_send_IPI_mask_sequence_phys
2.09 ± 29% +0.1 2.18 ± 24% -0.3 1.81 ± 23% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
2.06 ± 29% +0.1 2.16 ± 24% -0.3 1.79 ± 23% perf-profile.children.cycles-pp.hrtimer_interrupt
2.19 ± 29% +0.1 2.29 ± 24% -0.3 1.89 ± 23% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.25 ± 12% +0.1 0.36 ± 20% -0.0 0.25 ± 14% perf-profile.children.cycles-pp.move_to_new_folio
0.25 ± 12% +0.1 0.36 ± 20% -0.0 0.25 ± 14% perf-profile.children.cycles-pp.migrate_folio_extra
0.00 +0.1 0.12 ± 22% +0.0 0.00 perf-profile.children.cycles-pp.native_flush_tlb_local
2.48 ± 26% +0.1 2.60 ± 22% -0.3 2.14 ± 22% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
9.27 ± 8% +0.2 9.45 -0.9 8.41 ± 15% perf-profile.children.cycles-pp.lrand48_r
0.25 ± 14% +0.3 0.55 ± 18% -0.0 0.24 ± 15% perf-profile.children.cycles-pp.llist_reverse_order
0.09 ± 17% +0.4 0.45 ± 21% +0.0 0.09 ± 12% perf-profile.children.cycles-pp.flush_tlb_func
16.69 ± 10% +0.5 17.16 -2.0 14.72 ± 19% perf-profile.children.cycles-pp.nrand48_r
0.40 ± 15% +0.5 0.93 ± 18% -0.0 0.39 ± 14% perf-profile.children.cycles-pp.llist_add_batch
0.41 ± 14% +0.7 1.14 ± 19% -0.0 0.41 ± 14% perf-profile.children.cycles-pp.__sysvec_call_function
0.41 ± 14% +0.7 1.14 ± 19% -0.0 0.41 ± 14% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
0.43 ± 14% +0.7 1.17 ± 19% -0.0 0.42 ± 14% perf-profile.children.cycles-pp.sysvec_call_function
0.55 ± 12% +0.9 1.40 ± 19% -0.0 0.53 ± 15% perf-profile.children.cycles-pp.asm_sysvec_call_function
3.31 ± 15% +0.9 4.19 ± 19% -0.1 3.24 ± 14% perf-profile.children.cycles-pp.__handle_mm_fault
3.34 ± 15% +0.9 4.23 ± 19% -0.1 3.27 ± 14% perf-profile.children.cycles-pp.handle_mm_fault
3.70 ± 15% +0.9 4.64 ± 19% -0.1 3.60 ± 14% perf-profile.children.cycles-pp.exc_page_fault
3.70 ± 15% +0.9 4.64 ± 19% -0.1 3.60 ± 14% perf-profile.children.cycles-pp.do_user_addr_fault
3.91 ± 14% +1.0 4.88 ± 19% -0.1 3.78 ± 14% perf-profile.children.cycles-pp.asm_exc_page_fault
3.03 ± 15% +1.0 4.03 ± 19% -0.1 2.98 ± 14% perf-profile.children.cycles-pp.do_numa_page
2.46 ± 15% +1.4 3.85 ± 19% -0.1 2.41 ± 14% perf-profile.children.cycles-pp.migrate_misplaced_page
2.27 ± 15% +1.4 3.67 ± 19% -0.0 2.22 ± 14% perf-profile.children.cycles-pp.migrate_pages_batch
2.27 ± 15% +1.4 3.68 ± 19% -0.0 2.23 ± 14% perf-profile.children.cycles-pp.migrate_pages
0.91 ± 15% +1.5 2.42 ± 18% -0.0 0.91 ± 14% perf-profile.children.cycles-pp.smp_call_function_many_cond
0.91 ± 15% +1.5 2.42 ± 18% -0.0 0.91 ± 14% perf-profile.children.cycles-pp.on_each_cpu_cond_mask
0.00 +2.4 2.40 ± 18% +0.0 0.00 perf-profile.children.cycles-pp.try_to_unmap_flush
0.00 +2.4 2.40 ± 18% +0.0 0.00 perf-profile.children.cycles-pp.arch_tlbbatch_flush
66.95 ± 3% -2.0 64.95 +3.1 70.02 ± 6% perf-profile.self.cycles-pp.do_access
1.14 ± 16% -0.9 0.28 ± 21% -0.0 1.12 ± 15% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.06 ±187% -0.1 0.00 -0.1 0.00 perf-profile.self.cycles-pp.irqentry_exit_to_user_mode
4.08 ± 3% -0.0 4.03 -0.1 3.94 ± 5% perf-profile.self.cycles-pp.do_rw_once
0.09 ± 39% -0.0 0.07 ± 75% -0.0 0.09 ± 52% perf-profile.self.cycles-pp.cpuacct_account_field
0.06 ± 14% -0.0 0.04 ± 72% -0.0 0.03 ± 82% perf-profile.self.cycles-pp.mt_find
0.01 ±282% -0.0 0.00 -0.0 0.00 perf-profile.self.cycles-pp.lapic_next_deadline
0.01 ±188% -0.0 0.01 ±282% -0.0 0.01 ±200% perf-profile.self.cycles-pp.rmap_walk_anon
0.01 ±282% -0.0 0.00 -0.0 0.00 perf-profile.self.cycles-pp.update_min_vruntime
0.08 ± 47% -0.0 0.07 ± 45% -0.0 0.06 ± 38% perf-profile.self.cycles-pp.hrtimer_active
0.29 ± 4% -0.0 0.28 ± 2% -0.0 0.28 ± 6% perf-profile.self.cycles-pp.lrand48_r@plt
0.06 ± 49% -0.0 0.06 ± 56% -0.0 0.05 ± 52% perf-profile.self.cycles-pp.scheduler_tick
0.02 ±209% -0.0 0.02 ±142% -0.0 0.01 ±300% perf-profile.self.cycles-pp.update_cfs_group
0.05 ± 43% -0.0 0.05 ± 57% -0.0 0.04 ± 67% perf-profile.self.cycles-pp.update_irq_load_avg
0.11 ± 49% -0.0 0.11 ± 29% -0.0 0.09 ± 23% perf-profile.self.cycles-pp.update_load_avg
0.09 ± 41% +0.0 0.09 ± 42% -0.0 0.08 ± 24% perf-profile.self.cycles-pp.task_tick_fair
0.00 +0.0 0.00 +0.0 0.02 ±300% perf-profile.self.cycles-pp.mwait_idle_with_hints
0.12 ± 27% +0.0 0.13 ± 36% -0.0 0.10 ± 45% perf-profile.self.cycles-pp.account_user_time
0.11 ± 17% +0.0 0.12 ± 20% -0.0 0.10 ± 15% perf-profile.self.cycles-pp.__handle_mm_fault
0.02 ±111% +0.0 0.03 ±112% +0.0 0.03 ±100% perf-profile.self.cycles-pp.irqtime_account_process_tick
0.06 ± 55% +0.0 0.06 ± 42% -0.0 0.04 ± 84% perf-profile.self.cycles-pp.ktime_get_update_offsets_now
0.08 ± 17% +0.0 0.08 ± 21% -0.0 0.07 ± 15% perf-profile.self.cycles-pp.page_vma_mapped_walk
0.00 +0.0 0.01 ±282% +0.0 0.00 perf-profile.self.cycles-pp.__free_one_page
0.02 ±141% +0.0 0.02 ±112% -0.0 0.01 ±300% perf-profile.self.cycles-pp.hrtimer_interrupt
0.06 ± 42% +0.0 0.07 ± 43% -0.0 0.06 ± 37% perf-profile.self.cycles-pp.update_process_times
0.07 ± 16% +0.0 0.08 ± 25% -0.0 0.07 ± 38% perf-profile.self.cycles-pp.tick_sched_do_timer
0.00 +0.0 0.01 ±187% +0.0 0.00 perf-profile.self.cycles-pp.can_change_pte_writable
0.00 +0.0 0.01 ±187% +0.0 0.00 perf-profile.self.cycles-pp.folio_migrate_flags
0.00 +0.0 0.01 ±188% +0.0 0.00 perf-profile.self.cycles-pp.try_to_migrate_one
0.02 ±191% +0.0 0.03 ± 90% -0.0 0.01 ±200% perf-profile.self.cycles-pp.__update_load_avg_se
0.19 ± 16% +0.0 0.20 ± 19% -0.0 0.17 ± 17% perf-profile.self.cycles-pp.down_read_trylock
0.03 ±113% +0.0 0.04 ± 71% -0.0 0.03 ±100% perf-profile.self.cycles-pp.update_rq_clock
0.09 ± 14% +0.0 0.10 ± 16% -0.0 0.08 ± 17% perf-profile.self.cycles-pp._raw_spin_lock
0.03 ±151% +0.0 0.04 ± 72% -0.0 0.02 ±123% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
0.01 ±282% +0.0 0.02 ±112% -0.0 0.00 perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.01 ±282% +0.0 0.02 ±112% -0.0 0.00 perf-profile.self.cycles-pp.rcu_pending
0.10 ± 16% +0.0 0.12 ± 29% -0.0 0.10 ± 26% perf-profile.self.cycles-pp.__cgroup_account_cputime_field
0.01 ±282% +0.0 0.02 ±112% -0.0 0.01 ±300% perf-profile.self.cycles-pp.__hrtimer_run_queues
0.16 ± 41% +0.0 0.18 ± 25% -0.0 0.14 ± 24% perf-profile.self.cycles-pp.update_curr
0.15 ± 14% +0.0 0.17 ± 21% -0.0 0.14 ± 15% perf-profile.self.cycles-pp.up_read
0.04 ± 94% +0.0 0.05 ± 56% -0.0 0.03 ±100% perf-profile.self.cycles-pp.ktime_get
0.07 ± 16% +0.0 0.10 ± 21% +0.0 0.08 ± 16% perf-profile.self.cycles-pp.__list_del_entry_valid
0.04 ± 91% +0.0 0.06 ± 38% +0.0 0.04 ± 66% perf-profile.self.cycles-pp.arch_scale_freq_tick
0.03 ±118% +0.0 0.05 ± 59% -0.0 0.03 ±101% perf-profile.self.cycles-pp.perf_trace_sched_stat_runtime
0.00 +0.0 0.03 ±113% +0.0 0.00 perf-profile.self.cycles-pp.page_counter_uncharge
0.09 ± 7% +0.0 0.11 ± 14% -0.0 0.08 ± 15% perf-profile.self.cycles-pp.sync_regs
0.02 ±111% +0.0 0.06 ± 15% -0.0 0.02 ±152% perf-profile.self.cycles-pp.change_pte_range
0.11 ± 20% +0.0 0.15 ± 15% -0.0 0.11 ± 11% perf-profile.self.cycles-pp.native_irq_return_iret
0.01 ±282% +0.1 0.06 ± 43% -0.0 0.01 ±299% perf-profile.self.cycles-pp.page_counter_charge
0.16 ± 15% +0.1 0.22 ± 21% -0.0 0.16 ± 16% perf-profile.self.cycles-pp.copy_page
0.06 ± 40% +0.1 0.13 ± 23% +0.0 0.06 ± 15% perf-profile.self.cycles-pp.__default_send_IPI_dest_field
0.07 ± 15% +0.1 0.16 ± 18% +0.0 0.07 ± 15% perf-profile.self.cycles-pp.__flush_smp_call_function_queue
0.00 +0.1 0.11 ± 19% +0.0 0.00 perf-profile.self.cycles-pp.native_flush_tlb_local
8.81 ± 9% +0.1 8.94 ± 2% -0.8 7.99 ± 16% perf-profile.self.cycles-pp.lrand48_r
0.06 ± 16% +0.3 0.33 ± 21% -0.0 0.06 ± 36% perf-profile.self.cycles-pp.flush_tlb_func
0.25 ± 14% +0.3 0.55 ± 18% -0.0 0.24 ± 15% perf-profile.self.cycles-pp.llist_reverse_order
13.38 ± 11% +0.3 13.71 -1.7 11.73 ± 21% perf-profile.self.cycles-pp.nrand48_r
0.35 ± 15% +0.4 0.76 ± 18% -0.0 0.34 ± 13% perf-profile.self.cycles-pp.llist_add_batch
0.37 ± 17% +0.7 1.10 ± 18% +0.0 0.38 ± 15% perf-profile.self.cycles-pp.smp_call_function_many_cond

--
Best Regards,
Yujie


> Best Regards,
> Huang, Ying
>
> ---------------------------8<-----------------------------------------
> From b36b662c80652447d7374faff1142a941dc9d617 Mon Sep 17 00:00:00 2001
> From: Huang Ying <ying.huang@xxxxxxxxx>
> Date: Mon, 20 Mar 2023 15:38:12 +0800
> Subject: [PATCH] dbg, migrate_pages: don't batch flushing for single page
>  migration
>
> ---
>  mm/migrate.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 98f1c11197a8..7271209c1a03 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1113,8 +1113,8 @@ static void migrate_folio_done(struct folio *src,
>  static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page,
>                                unsigned long private, struct folio *src,
>                                struct folio **dstp, int force, bool avoid_force_lock,
> -                              enum migrate_mode mode, enum migrate_reason reason,
> -                              struct list_head *ret)
> +                              bool batch_flush, enum migrate_mode mode,
> +                              enum migrate_reason reason, struct list_head *ret)
>  {
>         struct folio *dst;
>         int rc = -EAGAIN;
> @@ -1253,7 +1253,7 @@ static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page
>                 /* Establish migration ptes */
>                 VM_BUG_ON_FOLIO(folio_test_anon(src) &&
>                                !folio_test_ksm(src) && !anon_vma, src);
> -               try_to_migrate(src, TTU_BATCH_FLUSH);
> +               try_to_migrate(src, batch_flush ? TTU_BATCH_FLUSH : 0);
>                 page_was_mapped = 1;
>         }
>  
> @@ -1641,6 +1641,7 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
>         bool nosplit = (reason == MR_NUMA_MISPLACED);
>         bool no_split_folio_counting = false;
>         bool avoid_force_lock;
> +       bool batch_flush = !list_is_singular(from);
>  
>  retry:
>         rc_saved = 0;
> @@ -1690,7 +1691,7 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
>  
>                         rc = migrate_folio_unmap(get_new_page, put_new_page, private,
>                                                  folio, &dst, pass > 2, avoid_force_lock,
> -                                                mode, reason, ret_folios);
> +                                                batch_flush, mode, reason, ret_folios);
>                         /*
>                          * The rules are:
>                          *      Success: folio will be freed
> @@ -1804,7 +1805,8 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
>         stats->nr_failed_pages += nr_retry_pages;
>  move:
>         /* Flush TLBs for all unmapped folios */
> -       try_to_unmap_flush();
> +       if (batch_flush)
> +               try_to_unmap_flush();
>  
>         retry = 1;
>         for (pass = 0;