Re: [linus:master] [migrate_pages] 7e12beb8ca: vm-scalability.throughput -3.4% regression

From: Liu, Yujie
Date: Mon Mar 20 2023 - 23:24:54 EST


Hi Ying,

On Mon, 2023-03-20 at 15:58 +0800, Huang, Ying wrote:
> Hi, Yujie,
>
> kernel test robot <yujie.liu@xxxxxxxxx> writes:
>
> > Hello,
> >
> > FYI, we noticed a -3.4% regression of vm-scalability.throughput due to commit:
> >
> > commit: 7e12beb8ca2ac98b2ec42e0ea4b76cdc93b58654 ("migrate_pages: batch flushing TLB")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > in testcase: vm-scalability
> > on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
> > with following parameters:
> >
> >         runtime: 300s
> >         size: 512G
> >         test: anon-cow-rand-mt
> >         cpufreq_governor: performance
> >
> > test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
> > test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
> >
> >
> > If you fix the issue, kindly add following tag
> > > Reported-by: kernel test robot <yujie.liu@xxxxxxxxx>
> > > Link: https://lore.kernel.org/oe-lkp/202303192325.ecbaf968-yujie.liu@xxxxxxxxx
> >
>
> Thanks a lot for report!  Can you try whether the debug patch as
> below can restore the regression?

We tested the patch. Relative to the parent commit, the throughput
regression shrank from -3.6% to -1.4%, so the score is partially
restored but a slight performance drop remains. The detailed data
is as follows:

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/512G/lkp-csl-2sp3/anon-cow-rand-mt/vm-scalability

commit:
ebe75e4751063 ("migrate_pages: share more code between _unmap and _move")
7e12beb8ca2ac ("migrate_pages: batch flushing TLB")
9a30245d65679 ("dbg, rmap: avoid flushing TLB in batch if PTE is inaccessible")
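
For readers checking the numbers: a minimal sketch of how the %change
columns in the table below relate to the raw values, assuming each
change is computed against the first commit (ebe75e4751063). The input
values are copied from the vm-scalability.throughput,
vm-scalability.stddev% and mpstat.cpu.all.irq% rows. The helper names
are hypothetical and are not part of the lkp tooling. Rows whose change
carries no "%" sign appear to be plain differences in percentage
points; that is an inference from the data, not something the report
states.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


def point_change(base: float, new: float) -> float:
    """Plain difference, assumed for metrics that are already percentages."""
    return new - base


# vm-scalability.throughput, one value per commit (copied from the table).
parent = 5528051    # ebe75e4751063
batched = 5328506   # 7e12beb8ca2ac ("migrate_pages: batch flushing TLB")
debug = 5449450     # 9a30245d65679 (debug patch)

print(f"batched vs parent: {pct_change(parent, batched):+.1f}%")
print(f"debug   vs parent: {pct_change(parent, debug):+.1f}%")

# vm-scalability.stddev% and mpstat.cpu.all.irq% (percentage-point rows).
print(f"stddev% batched: {point_change(81.16, 76.17):+.1f}")
print(f"irq%    batched: {point_change(2.58, 4.27):+.1f}")
```

The first two lines reproduce the -3.6% and -1.4% figures quoted above.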

ebe75e4751063dce 7e12beb8ca2ac98b2ec42e0ea4b 9a30245d656794d171cd798a2be
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
57634 -3.5% 55603 -1.5% 56788 vm-scalability.median
81.16 ± 12% -5.0 76.17 ± 35% -20.0 61.18 ± 21% vm-scalability.stddev%
5528051 -3.6% 5328506 -1.4% 5449450 vm-scalability.throughput
200293 ± 3% -7.3% 185675 ± 2% -4.3% 191707 ± 2% vm-scalability.time.involuntary_context_switches
67952989 ± 5% +43.1% 97269013 ± 2% +35.6% 92147668 ± 3% vm-scalability.time.minor_page_faults
9006 -1.8% 8844 -0.6% 8956 vm-scalability.time.percent_of_cpu_this_job_got
1178 ± 3% +57.2% 1852 ± 3% +8.6% 1278 ± 3% vm-scalability.time.system_time
26327 -4.5% 25132 -1.0% 26056 vm-scalability.time.user_time
11378 ± 5% +359.9% 52332 ± 7% +118.5% 24867 ± 7% vm-scalability.time.voluntary_context_switches
1.662e+09 -3.7% 1.601e+09 -1.5% 1.638e+09 vm-scalability.workload
79922 ± 3% +9.3% 87378 ± 3% +3.3% 82589 ± 8% numa-meminfo.node1.SUnreclaim
399014 ±192% -84.9% 60246 ±129% -13.6% 344869 ±239% numa-meminfo.node1.Unevictable
2022 ± 3% +11.6% 2257 +3.6% 2095 vmstat.system.cs
539357 ± 2% +187.0% 1547747 ± 8% +32.9% 716886 ± 4% vmstat.system.in
0.00 ±184% +0.0 0.00 ± 6% +0.0 0.00 ± 25% mpstat.cpu.all.iowait%
2.58 +1.7 4.27 ± 4% +0.5 3.09 ± 3% mpstat.cpu.all.irq%
4.06 ± 3% +2.3 6.36 ± 3% +0.3 4.40 ± 3% mpstat.cpu.all.sys%
19980 ± 3% +9.3% 21844 ± 3% +3.3% 20646 ± 8% numa-vmstat.node1.nr_slab_unreclaimable
99752 ±192% -84.9% 15061 ±129% -13.6% 86216 ±239% numa-vmstat.node1.nr_unevictable
99752 ±192% -84.9% 15061 ±129% -13.6% 86216 ±239% numa-vmstat.node1.nr_zone_unevictable
205569 ± 7% +131.1% 475135 ± 99% +66.5% 342364 ± 91% turbostat.C1
1.382e+09 ± 2% +140.0% 3.317e+09 ± 5% +30.4% 1.803e+09 ± 3% turbostat.IRQ
9095 ± 14% +446.4% 49695 ± 7% +149.0% 22643 ± 11% turbostat.POLL
86.84 -2.4% 84.76 -1.4% 85.63 turbostat.RAMWatt
200293 ± 3% -7.3% 185675 ± 2% -4.3% 191707 ± 2% time.involuntary_context_switches
67.11 ± 56% -92.3% 5.17 ± 55% -95.4% 3.11 ± 80% time.major_page_faults
67952989 ± 5% +43.1% 97269013 ± 2% +35.6% 92147668 ± 3% time.minor_page_faults
9006 -1.8% 8844 -0.6% 8956 time.percent_of_cpu_this_job_got
1178 ± 3% +57.2% 1852 ± 3% +8.6% 1278 ± 3% time.system_time
26327 -4.5% 25132 -1.0% 26056 time.user_time
11378 ± 5% +359.9% 52332 ± 7% +118.5% 24867 ± 7% time.voluntary_context_switches
143480 ± 3% -20.9% 113504 ± 11% -12.0% 126262 ± 4% sched_debug.cfs_rq:/.min_vruntime.stddev
548123 ± 7% -49.1% 279239 ± 34% -20.7% 434543 ± 9% sched_debug.cfs_rq:/.spread0.avg
655329 ± 6% -36.3% 417735 ± 22% -16.2% 549218 ± 6% sched_debug.cfs_rq:/.spread0.max
143388 ± 3% -20.8% 113612 ± 11% -11.9% 126295 ± 4% sched_debug.cfs_rq:/.spread0.stddev
39.81 ± 28% +45.0% 57.73 ± 19% +17.8% 46.89 ± 44% sched_debug.cfs_rq:/.util_est_enqueued.stddev
240478 ± 6% -12.9% 209367 ± 7% -12.0% 211715 ± 5% sched_debug.cpu.avg_idle.avg
1597 +10.4% 1763 ± 3% +2.3% 1633 sched_debug.cpu.clock_task.stddev
1938 ± 5% +29.1% 2503 +11.4% 2160 ± 3% sched_debug.cpu.nr_switches.min
39960890 ± 6% +68.3% 67272793 ± 2% +54.7% 61837739 ± 4% proc-vmstat.numa_hint_faults
19987976 ± 6% +68.7% 33722069 ± 2% +55.1% 30996483 ± 4% proc-vmstat.numa_hint_faults_local
28840932 ± 3% +6.9% 30817082 ± 5% +8.0% 31160418 ± 4% proc-vmstat.numa_hit
28753783 ± 3% +6.9% 30727992 ± 5% +8.1% 31074486 ± 4% proc-vmstat.numa_local
19745743 ± 5% +10.0% 21720583 ± 7% +11.8% 22080123 ± 6% proc-vmstat.numa_pages_migrated
40107839 ± 6% +68.1% 67430626 ± 2% +54.6% 61988683 ± 4% proc-vmstat.numa_pte_updates
37158989 ± 2% +5.3% 39124260 ± 3% +6.3% 39482935 ± 3% proc-vmstat.pgalloc_normal
68856116 ± 5% +42.6% 98184580 ± 2% +35.1% 93057570 ± 3% proc-vmstat.pgfault
19745743 ± 5% +10.0% 21720583 ± 7% +11.8% 22080123 ± 6% proc-vmstat.pgmigrate_success
19754280 ± 5% +10.0% 21735325 ± 7% +11.8% 22080663 ± 6% proc-vmstat.pgreuse
0.17 ± 7% +0.1 0.23 ± 3% +0.0 0.18 ± 5% perf-stat.i.branch-miss-rate%
8953845 ± 3% +61.0% 14417578 ± 3% +13.3% 10142474 ± 2% perf-stat.i.branch-misses
66.30 -1.8 64.47 -0.3 65.98 perf-stat.i.cache-miss-rate%
1904 ± 3% +12.3% 2139 +3.9% 1979 perf-stat.i.context-switches
158.09 +11.3% 175.92 ± 3% +7.5% 170.00 ± 2% perf-stat.i.cpu-migrations
0.04 ± 9% +0.0 0.05 ± 11% +0.0 0.04 ± 7% perf-stat.i.dTLB-load-miss-rate%
4856144 ± 8% +41.5% 6870029 ± 9% +12.3% 5455416 ± 7% perf-stat.i.dTLB-load-misses
9.10 -0.4 8.71 -0.1 8.97 perf-stat.i.dTLB-store-miss-rate%
5.33e+08 -4.4% 5.095e+08 -1.8% 5.233e+08 perf-stat.i.dTLB-store-misses
2454429 ± 2% +159.7% 6374895 ± 7% +26.7% 3110501 ± 5% perf-stat.i.iTLB-load-misses
116140 ± 2% +60.9% 186840 ± 7% -3.6% 111933 ± 4% perf-stat.i.iTLB-loads
41691 ± 5% -23.0% 32083 ± 26% +1.7% 42380 ± 20% perf-stat.i.instructions-per-iTLB-miss
0.31 ± 38% -59.1% 0.13 ± 27% -68.9% 0.10 ± 31% perf-stat.i.major-faults
224958 ± 5% +42.4% 320417 ± 2% +35.4% 304571 ± 3% perf-stat.i.minor-faults
50.61 +1.6 52.22 +0.7 51.35 perf-stat.i.node-load-miss-rate%
1.169e+08 +3.3% 1.208e+08 +0.9% 1.179e+08 perf-stat.i.node-load-misses
1.132e+08 -3.7% 1.089e+08 -2.1% 1.108e+08 perf-stat.i.node-loads
2.688e+08 -3.9% 2.582e+08 -1.8% 2.64e+08 perf-stat.i.node-store-misses
2.664e+08 -4.5% 2.543e+08 -1.7% 2.618e+08 perf-stat.i.node-stores
224959 ± 5% +42.4% 320418 ± 2% +35.4% 304571 ± 3% perf-stat.i.page-faults
0.08 ± 4% +0.0 0.12 ± 4% +0.0 0.09 ± 3% perf-stat.overall.branch-miss-rate%
67.15 -1.9 65.28 -0.5 66.64 perf-stat.overall.cache-miss-rate%
366.74 +2.9% 377.43 +1.2% 371.26 perf-stat.overall.cycles-between-cache-misses
0.03 ± 8% +0.0 0.05 ± 10% +0.0 0.04 ± 8% perf-stat.overall.dTLB-load-miss-rate%
9.38 -0.4 8.97 -0.1 9.25 perf-stat.overall.dTLB-store-miss-rate%
95.49 +1.7 97.16 +1.0 96.53 perf-stat.overall.iTLB-load-miss-rate%
20490 ± 3% -61.8% 7826 ± 7% -21.5% 16077 ± 6% perf-stat.overall.instructions-per-iTLB-miss
50.81 +1.8 52.60 +0.8 51.56 perf-stat.overall.node-load-miss-rate%
9210 +3.0% 9485 +0.7% 9271 perf-stat.overall.path-length
8906114 ± 3% +61.8% 14412101 ± 3% +13.3% 10090374 ± 2% perf-stat.ps.branch-misses
1906 ± 3% +12.3% 2142 +3.8% 1979 perf-stat.ps.context-switches
157.57 +11.7% 176.03 ± 3% +7.6% 169.49 ± 2% perf-stat.ps.cpu-migrations
4843373 ± 8% +41.9% 6871859 ± 9% +12.3% 5440606 ± 7% perf-stat.ps.dTLB-load-misses
5.313e+08 -4.4% 5.077e+08 -1.8% 5.218e+08 perf-stat.ps.dTLB-store-misses
2444301 ± 2% +161.3% 6385873 ± 7% +26.8% 3098710 ± 5% perf-stat.ps.iTLB-load-misses
115384 ± 2% +61.5% 186290 ± 7% -3.7% 111109 ± 4% perf-stat.ps.iTLB-loads
0.31 ± 38% -59.0% 0.13 ± 27% -68.8% 0.10 ± 31% perf-stat.ps.major-faults
224444 ± 5% +42.8% 320615 ± 2% +35.3% 303619 ± 3% perf-stat.ps.minor-faults
1.165e+08 +3.4% 1.205e+08 +0.9% 1.176e+08 perf-stat.ps.node-load-misses
1.128e+08 -3.8% 1.086e+08 -2.1% 1.105e+08 perf-stat.ps.node-loads
2.68e+08 -4.0% 2.573e+08 -1.8% 2.632e+08 perf-stat.ps.node-store-misses
2.656e+08 -4.6% 2.534e+08 -1.7% 2.61e+08 perf-stat.ps.node-stores
224444 ± 5% +42.8% 320615 ± 2% +35.3% 303620 ± 3% perf-stat.ps.page-faults
19.08 ± 10% -1.7 17.34 ± 4% +0.5 19.59 perf-profile.calltrace.cycles-pp.nrand48_r
1.26 ± 15% -1.3 0.00 -1.3 0.00 perf-profile.calltrace.cycles-pp.migrate_folio_unmap.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page
1.14 ± 15% -1.1 0.00 -1.1 0.00 perf-profile.calltrace.cycles-pp.try_to_migrate.migrate_folio_unmap.migrate_pages_batch.migrate_pages.migrate_misplaced_page
1.12 ± 15% -1.1 0.00 -1.1 0.00 perf-profile.calltrace.cycles-pp.rmap_walk_anon.try_to_migrate.migrate_folio_unmap.migrate_pages_batch.migrate_pages
1.08 ± 15% -1.1 0.00 -1.1 0.00 perf-profile.calltrace.cycles-pp.try_to_migrate_one.rmap_walk_anon.try_to_migrate.migrate_folio_unmap.migrate_pages_batch
0.92 ± 15% -0.9 0.00 -0.9 0.00 perf-profile.calltrace.cycles-pp.ptep_clear_flush.try_to_migrate_one.rmap_walk_anon.try_to_migrate.migrate_folio_unmap
0.91 ± 15% -0.9 0.00 -0.9 0.00 perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.ptep_clear_flush.try_to_migrate_one.rmap_walk_anon.try_to_migrate
0.91 ± 15% -0.9 0.00 -0.9 0.00 perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.ptep_clear_flush.try_to_migrate_one.rmap_walk_anon
0.91 ± 15% -0.9 0.00 -0.9 0.00 perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.ptep_clear_flush.try_to_migrate_one
6.40 ± 9% -0.5 5.94 ± 4% +0.1 6.54 perf-profile.calltrace.cycles-pp.lrand48_r
0.26 ±112% -0.3 0.00 -0.3 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock.do_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.19 ±141% -0.2 0.00 -0.2 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.do_numa_page.__handle_mm_fault.handle_mm_fault
4.13 ± 3% -0.1 4.04 -0.0 4.12 perf-profile.calltrace.cycles-pp.do_rw_once
0.06 ±282% -0.1 0.00 -0.1 0.00 perf-profile.calltrace.cycles-pp.rmap_walk_anon.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page
0.13 ±188% +0.1 0.24 ±144% -0.0 0.11 ±187% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.nrand48_r
0.00 +0.1 0.10 ±223% +0.0 0.00 perf-profile.calltrace.cycles-pp.update_load_avg.task_tick_fair.scheduler_tick.update_process_times.tick_sched_handle
0.00 +0.1 0.11 ±223% +0.0 0.00 perf-profile.calltrace.cycles-pp.update_curr.task_tick_fair.scheduler_tick.update_process_times.tick_sched_handle
0.07 ±282% +0.1 0.21 ±144% -0.1 0.00 perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.nrand48_r
0.07 ±282% +0.1 0.21 ±144% -0.1 0.00 perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.nrand48_r
0.07 ±282% +0.1 0.22 ±144% -0.1 0.00 perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.nrand48_r
0.00 +0.2 0.17 ±141% +0.0 0.00 perf-profile.calltrace.cycles-pp.__default_send_IPI_dest_field.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush
0.00 +0.3 0.26 ±100% +0.0 0.00 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.nrand48_r
0.00 +0.4 0.36 ± 70% +0.1 0.06 ±282% perf-profile.calltrace.cycles-pp.migrate_folio_extra.move_to_new_folio.migrate_pages_batch.migrate_pages.migrate_misplaced_page
0.00 +0.4 0.36 ± 70% +0.1 0.06 ±282% perf-profile.calltrace.cycles-pp.move_to_new_folio.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page
1.44 ± 28% +0.5 1.94 ± 61% +0.1 1.51 ± 25% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.do_access
1.43 ± 29% +0.5 1.93 ± 61% +0.1 1.50 ± 25% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.do_access
0.55 ± 69% +0.5 1.08 ± 69% +0.0 0.60 ± 56% perf-profile.calltrace.cycles-pp.scheduler_tick.update_process_times.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues
1.34 ± 39% +0.6 1.90 ± 69% +0.0 1.35 ± 25% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
0.17 ±196% +0.6 0.73 ± 85% +0.2 0.33 ± 89% perf-profile.calltrace.cycles-pp.task_tick_fair.scheduler_tick.update_process_times.tick_sched_handle.tick_sched_timer
1.72 ± 25% +0.6 2.30 ± 48% +0.1 1.80 ± 22% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.do_access
1.08 ± 31% +0.6 1.66 ± 72% +0.1 1.13 ± 26% perf-profile.calltrace.cycles-pp.update_process_times.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt
1.52 ± 28% +0.6 2.11 ± 52% +0.1 1.58 ± 25% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.do_access
1.09 ± 31% +0.6 1.68 ± 72% +0.1 1.14 ± 26% perf-profile.calltrace.cycles-pp.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
1.18 ± 30% +0.6 1.78 ± 70% +0.1 1.24 ± 26% perf-profile.calltrace.cycles-pp.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
0.00 +0.6 0.60 ± 8% +0.0 0.00 perf-profile.calltrace.cycles-pp.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush
0.00 +0.6 0.64 ± 7% +0.0 0.00 perf-profile.calltrace.cycles-pp.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
0.00 +0.9 0.90 ± 10% +0.0 0.00 perf-profile.calltrace.cycles-pp.llist_reverse_order.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
72.48 ± 3% +1.4 73.88 -0.7 71.79 perf-profile.calltrace.cycles-pp.do_access
0.00 +1.9 1.86 ± 9% +0.3 0.26 ±113% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.do_access
0.00 +1.9 1.87 ± 8% +0.3 0.26 ±113% perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.do_access
0.00 +1.9 1.94 ± 8% +0.3 0.33 ± 91% perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.do_access
0.00 +2.6 2.59 ± 9% +0.6 0.59 ± 40% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.do_access
0.00 +2.8 2.80 ± 8% +0.9 0.90 ± 18% perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush
3.30 ± 15% +6.6 9.88 ± 7% +0.9 4.18 ± 19% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
3.34 ± 15% +6.6 9.94 ± 7% +0.9 4.22 ± 19% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
3.03 ± 15% +6.7 9.69 ± 7% +1.0 4.03 ± 19% perf-profile.calltrace.cycles-pp.do_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
3.68 ± 15% +6.8 10.48 ± 7% +0.9 4.63 ± 19% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
3.70 ± 15% +6.8 10.49 ± 7% +0.9 4.64 ± 19% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
3.89 ± 14% +6.8 10.71 ± 7% +1.0 4.85 ± 19% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
2.46 ± 15% +7.0 9.46 ± 7% +1.4 3.85 ± 19% perf-profile.calltrace.cycles-pp.migrate_misplaced_page.do_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
2.27 ± 15% +7.0 9.28 ± 7% +1.4 3.67 ± 19% perf-profile.calltrace.cycles-pp.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page.__handle_mm_fault
2.27 ± 15% +7.0 9.29 ± 7% +1.4 3.68 ± 19% perf-profile.calltrace.cycles-pp.migrate_pages.migrate_misplaced_page.do_numa_page.__handle_mm_fault.handle_mm_fault
0.00 +7.5 7.50 ± 7% +2.4 2.38 ± 18% perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush.migrate_pages_batch
0.00 +7.6 7.56 ± 7% +2.4 2.40 ± 18% perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush.migrate_pages_batch.migrate_pages
0.00 +7.6 7.57 ± 8% +2.4 2.40 ± 18% perf-profile.calltrace.cycles-pp.arch_tlbbatch_flush.try_to_unmap_flush.migrate_pages_batch.migrate_pages.migrate_misplaced_page
0.00 +7.6 7.57 ± 7% +2.4 2.40 ± 18% perf-profile.calltrace.cycles-pp.try_to_unmap_flush.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page
16.69 ± 10% -1.3 15.43 ± 5% +0.5 17.16 perf-profile.children.cycles-pp.nrand48_r
1.51 ± 16% -1.1 0.42 ± 9% -1.2 0.31 ± 20% perf-profile.children.cycles-pp.rmap_walk_anon
1.25 ± 16% -1.0 0.30 ± 9% -1.0 0.29 ± 20% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
0.92 ± 15% -0.9 0.00 -0.9 0.00 perf-profile.children.cycles-pp.ptep_clear_flush
0.92 ± 15% -0.9 0.00 -0.9 0.00 perf-profile.children.cycles-pp.flush_tlb_mm_range
9.27 ± 8% -0.9 8.37 ± 4% +0.2 9.45 perf-profile.children.cycles-pp.lrand48_r
1.08 ± 15% -0.9 0.18 ± 6% -1.0 0.12 ± 21% perf-profile.children.cycles-pp.try_to_migrate_one
1.14 ± 15% -0.9 0.26 ± 8% -0.9 0.19 ± 19% perf-profile.children.cycles-pp.try_to_migrate
1.05 ± 15% -0.8 0.21 ± 11% -0.9 0.16 ± 16% perf-profile.children.cycles-pp._raw_spin_lock
1.26 ± 15% -0.8 0.42 ± 8% -0.9 0.34 ± 21% perf-profile.children.cycles-pp.migrate_folio_unmap
0.46 ± 15% -0.3 0.14 ± 13% -0.3 0.11 ± 20% perf-profile.children.cycles-pp.page_vma_mapped_walk
0.34 ± 15% -0.2 0.11 ± 11% -0.3 0.08 ± 18% perf-profile.children.cycles-pp.remove_migration_pte
0.14 ± 16% -0.1 0.00 -0.1 0.00 perf-profile.children.cycles-pp.handle_pte_fault
4.37 ± 3% -0.1 4.29 -0.0 4.36 perf-profile.children.cycles-pp.do_rw_once
0.13 ± 22% -0.1 0.07 ± 11% -0.0 0.09 ± 23% perf-profile.children.cycles-pp.folio_lruvec_lock_irq
0.13 ± 22% -0.1 0.08 ± 10% -0.0 0.09 ± 22% perf-profile.children.cycles-pp._raw_spin_lock_irq
0.33 ± 2% -0.0 0.30 -0.0 0.32 ± 2% perf-profile.children.cycles-pp.lrand48_r@plt
0.17 ± 21% -0.0 0.14 ± 9% -0.0 0.15 ± 21% perf-profile.children.cycles-pp.folio_isolate_lru
0.02 ±112% -0.0 0.00 +0.0 0.03 ±111% perf-profile.children.cycles-pp.timerqueue_del
0.19 ± 20% -0.0 0.17 ± 8% -0.0 0.17 ± 20% perf-profile.children.cycles-pp.numamigrate_isolate_page
0.06 ± 13% -0.0 0.04 ± 45% -0.0 0.05 ± 37% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.06 ± 13% -0.0 0.04 ± 45% -0.0 0.05 ± 37% perf-profile.children.cycles-pp.do_syscall_64
0.01 ±193% -0.0 0.00 -0.0 0.01 ±188% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
0.09 ± 20% -0.0 0.08 ± 47% +0.0 0.09 ± 23% perf-profile.children.cycles-pp.tick_sched_do_timer
0.07 ± 39% -0.0 0.06 ± 45% +0.0 0.07 ± 28% perf-profile.children.cycles-pp.ktime_get_update_offsets_now
0.01 ±282% -0.0 0.00 -0.0 0.01 ±282% perf-profile.children.cycles-pp.perf_rotate_context
0.02 ±111% -0.0 0.02 ±142% +0.0 0.03 ±112% perf-profile.children.cycles-pp.irqtime_account_process_tick
0.06 ± 39% -0.0 0.06 ± 8% +0.0 0.07 ± 21% perf-profile.children.cycles-pp.rmqueue_bulk
0.00 +0.0 0.00 +0.0 0.01 ±282% perf-profile.children.cycles-pp.__free_one_page
0.00 +0.0 0.00 +0.0 0.01 ±187% perf-profile.children.cycles-pp.lru_add_fn
0.07 ± 27% +0.0 0.07 ± 47% -0.0 0.06 ± 55% perf-profile.children.cycles-pp.ktime_get
0.09 ± 15% +0.0 0.10 ± 8% +0.0 0.11 ± 21% perf-profile.children.cycles-pp.rmqueue
0.09 ± 39% +0.0 0.10 ± 50% -0.0 0.07 ± 75% perf-profile.children.cycles-pp.cpuacct_account_field
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.children.cycles-pp.run_posix_cpu_timers
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.children.cycles-pp.nohz_balance_exit_idle
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.children.cycles-pp.reweight_entity
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.children.cycles-pp.__hrtimer_next_event_base
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.children.cycles-pp.nohz_balancer_kick
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.children.cycles-pp.trigger_load_balance
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.children.cycles-pp.check_cpu_stall
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.children.cycles-pp.perf_event_task_tick
0.09 ± 16% +0.0 0.10 ± 7% +0.0 0.11 ± 22% perf-profile.children.cycles-pp.__alloc_pages
0.09 ± 16% +0.0 0.10 ± 10% +0.0 0.11 ± 21% perf-profile.children.cycles-pp.get_page_from_freelist
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.children.cycles-pp.acct_account_cputime
0.09 ± 18% +0.0 0.10 ± 7% +0.0 0.11 ± 22% perf-profile.children.cycles-pp.__folio_alloc
0.01 ±282% +0.0 0.02 ±142% -0.0 0.00 perf-profile.children.cycles-pp.rcu_core
0.32 ± 19% +0.0 0.34 ± 45% +0.0 0.33 ± 32% perf-profile.children.cycles-pp.account_user_time
0.12 ± 95% +0.0 0.14 ± 6% -0.0 0.11 ± 16% perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
0.09 ± 18% +0.0 0.11 ± 9% +0.0 0.11 ± 22% perf-profile.children.cycles-pp.alloc_misplaced_dst_page
0.06 ± 18% +0.0 0.08 ± 69% +0.0 0.07 ± 41% perf-profile.children.cycles-pp.rcu_pending
0.00 +0.0 0.02 ±141% +0.0 0.00 perf-profile.children.cycles-pp.set_tlb_ubc_flush_pending
0.00 +0.0 0.02 ±141% +0.0 0.00 perf-profile.children.cycles-pp.folio_lock_anon_vma_read
0.00 +0.0 0.02 ±141% +0.0 0.01 ±282% perf-profile.children.cycles-pp.folio_get_anon_vma
0.06 ± 18% +0.0 0.08 ± 9% +0.0 0.06 ± 19% perf-profile.children.cycles-pp.mt_find
0.21 ± 17% +0.0 0.23 ± 8% -0.0 0.21 ± 18% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.06 ± 16% +0.0 0.08 ± 8% +0.0 0.08 ± 21% perf-profile.children.cycles-pp.free_unref_page
0.06 ± 18% +0.0 0.08 ± 11% +0.0 0.06 ± 20% perf-profile.children.cycles-pp.find_vma
0.11 ± 16% +0.0 0.12 ± 66% +0.0 0.13 ± 29% perf-profile.children.cycles-pp.__cgroup_account_cputime_field
0.01 ±282% +0.0 0.03 ±102% -0.0 0.00 perf-profile.children.cycles-pp.lapic_next_deadline
0.03 ± 71% +0.0 0.06 ± 8% +0.0 0.05 ± 39% perf-profile.children.cycles-pp.free_pcppages_bulk
0.02 ±209% +0.0 0.04 ±103% -0.0 0.02 ±142% perf-profile.children.cycles-pp.update_cfs_group
0.01 ±282% +0.0 0.03 ±105% -0.0 0.00 perf-profile.children.cycles-pp.hrtimer_update_next_event
0.05 ± 43% +0.0 0.08 ± 61% -0.0 0.05 ± 57% perf-profile.children.cycles-pp.update_irq_load_avg
0.00 +0.0 0.02 ± 99% +0.0 0.00 perf-profile.children.cycles-pp.__perf_sw_event
0.08 ± 15% +0.0 0.10 ± 10% +0.0 0.10 ± 21% perf-profile.children.cycles-pp.__list_del_entry_valid
0.09 ± 47% +0.0 0.12 ± 70% -0.0 0.08 ± 43% perf-profile.children.cycles-pp.hrtimer_active
0.01 ±282% +0.0 0.03 ±106% -0.0 0.00 perf-profile.children.cycles-pp.update_min_vruntime
0.08 ± 18% +0.0 0.11 ± 68% +0.0 0.09 ± 26% perf-profile.children.cycles-pp.rcu_sched_clock_irq
0.07 ± 35% +0.0 0.10 ± 33% +0.0 0.08 ± 26% perf-profile.children.cycles-pp.clockevents_program_event
0.01 ±282% +0.0 0.04 ±110% -0.0 0.00 perf-profile.children.cycles-pp.timerqueue_add
0.04 ± 91% +0.0 0.07 ± 50% +0.0 0.06 ± 38% perf-profile.children.cycles-pp.arch_scale_freq_tick
0.02 ±154% +0.0 0.06 ± 74% +0.0 0.03 ± 92% perf-profile.children.cycles-pp.__do_softirq
0.00 +0.0 0.04 ± 71% +0.0 0.02 ±142% perf-profile.children.cycles-pp.can_change_pte_writable
0.01 ±282% +0.0 0.04 ±107% -0.0 0.00 perf-profile.children.cycles-pp.enqueue_hrtimer
0.00 +0.0 0.04 ± 44% +0.0 0.00 perf-profile.children.cycles-pp.tlb_is_not_lazy
0.00 +0.0 0.04 ± 45% +0.0 0.00 perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
0.15 ± 20% +0.0 0.20 ± 8% -0.0 0.15 ± 21% perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
0.11 ± 25% +0.0 0.16 ± 64% +0.0 0.11 ± 25% perf-profile.children.cycles-pp.update_rq_clock
0.03 ±118% +0.1 0.08 ± 58% +0.0 0.05 ± 59% perf-profile.children.cycles-pp.perf_trace_sched_stat_runtime
0.03 ±127% +0.1 0.09 ± 84% +0.0 0.04 ± 72% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
0.00 +0.1 0.06 ± 9% +0.0 0.02 ±142% perf-profile.children.cycles-pp.folio_migrate_flags
0.03 ±152% +0.1 0.09 ± 68% +0.0 0.04 ± 72% perf-profile.children.cycles-pp.__update_load_avg_se
0.00 +0.1 0.07 ± 8% +0.0 0.00 perf-profile.children.cycles-pp.native_sched_clock
0.05 ± 36% +0.1 0.12 ± 8% +0.1 0.10 ± 18% perf-profile.children.cycles-pp.exit_to_user_mode_loop
0.06 ± 13% +0.1 0.13 ± 8% +0.1 0.11 ± 16% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
0.16 ± 13% +0.1 0.24 ± 10% +0.0 0.18 ± 19% perf-profile.children.cycles-pp.up_read
0.00 +0.1 0.08 ± 10% +0.0 0.00 perf-profile.children.cycles-pp.sched_clock_cpu
0.02 ±141% +0.1 0.10 ± 8% +0.0 0.05 ± 42% perf-profile.children.cycles-pp.uncharge_batch
0.01 ±282% +0.1 0.09 ± 12% +0.0 0.04 ± 75% perf-profile.children.cycles-pp.page_counter_uncharge
0.04 ± 71% +0.1 0.12 ± 8% +0.1 0.10 ± 18% perf-profile.children.cycles-pp.task_work_run
0.00 +0.1 0.09 ± 10% +0.0 0.01 ±282% perf-profile.children.cycles-pp._find_next_bit
0.02 ±141% +0.1 0.10 ± 10% +0.0 0.06 ± 44% perf-profile.children.cycles-pp.__mem_cgroup_uncharge
0.02 ±141% +0.1 0.10 ± 10% +0.0 0.06 ± 44% perf-profile.children.cycles-pp.__folio_put
0.19 ± 17% +0.1 0.28 ± 11% +0.0 0.21 ± 18% perf-profile.children.cycles-pp.down_read_trylock
0.03 ± 90% +0.1 0.12 ± 8% +0.1 0.10 ± 16% perf-profile.children.cycles-pp.change_pte_range
0.03 ± 90% +0.1 0.12 ± 8% +0.1 0.10 ± 18% perf-profile.children.cycles-pp.task_numa_work
0.03 ± 90% +0.1 0.12 ± 8% +0.1 0.10 ± 18% perf-profile.children.cycles-pp.change_prot_numa
0.03 ± 90% +0.1 0.12 ± 8% +0.1 0.10 ± 18% perf-profile.children.cycles-pp.change_protection_range
0.03 ± 90% +0.1 0.12 ± 8% +0.1 0.10 ± 18% perf-profile.children.cycles-pp.change_pmd_range
0.21 ± 19% +0.1 0.31 ± 8% +0.0 0.22 ± 21% perf-profile.children.cycles-pp.folio_batch_move_lru
0.02 ±142% +0.1 0.12 ± 6% +0.0 0.04 ± 72% perf-profile.children.cycles-pp.irqtime_account_irq
0.08 ± 36% +0.1 0.18 ± 24% +0.0 0.09 ± 24% perf-profile.children.cycles-pp.__irq_exit_rcu
0.21 ± 19% +0.1 0.31 ± 8% +0.0 0.22 ± 20% perf-profile.children.cycles-pp.lru_add_drain
0.21 ± 19% +0.1 0.31 ± 8% +0.0 0.22 ± 20% perf-profile.children.cycles-pp.lru_add_drain_cpu
0.03 ± 71% +0.1 0.14 ± 8% +0.0 0.08 ± 25% perf-profile.children.cycles-pp.mem_cgroup_migrate
0.01 ±187% +0.1 0.13 ± 6% +0.1 0.07 ± 26% perf-profile.children.cycles-pp.page_counter_charge
0.17 ± 13% +0.1 0.30 ± 9% +0.1 0.24 ± 19% perf-profile.children.cycles-pp.folio_copy
0.17 ± 14% +0.1 0.30 ± 9% +0.1 0.23 ± 20% perf-profile.children.cycles-pp.copy_page
0.09 ± 7% +0.2 0.24 ± 9% +0.0 0.11 ± 14% perf-profile.children.cycles-pp.sync_regs
0.21 ± 48% +0.2 0.39 ± 65% +0.0 0.22 ± 28% perf-profile.children.cycles-pp.update_load_avg
0.25 ± 39% +0.2 0.43 ± 61% +0.0 0.27 ± 25% perf-profile.children.cycles-pp.update_curr
0.25 ± 12% +0.3 0.51 ± 8% +0.1 0.36 ± 20% perf-profile.children.cycles-pp.migrate_folio_extra
0.25 ± 12% +0.3 0.51 ± 8% +0.1 0.36 ± 20% perf-profile.children.cycles-pp.move_to_new_folio
0.11 ± 20% +0.3 0.40 ± 7% +0.0 0.16 ± 15% perf-profile.children.cycles-pp.native_irq_return_iret
0.06 ± 40% +0.4 0.47 ± 9% +0.1 0.13 ± 23% perf-profile.children.cycles-pp.__default_send_IPI_dest_field
0.00 +0.4 0.44 ± 9% +0.1 0.12 ± 22% perf-profile.children.cycles-pp.native_flush_tlb_local
0.68 ± 45% +0.5 1.16 ± 62% +0.0 0.71 ± 28% perf-profile.children.cycles-pp.task_tick_fair
0.08 ± 16% +0.5 0.62 ± 9% +0.1 0.17 ± 21% perf-profile.children.cycles-pp.default_send_IPI_mask_sequence_phys
0.96 ± 40% +0.6 1.57 ± 60% +0.0 1.00 ± 27% perf-profile.children.cycles-pp.scheduler_tick
1.56 ± 32% +0.7 2.26 ± 55% +0.1 1.64 ± 25% perf-profile.children.cycles-pp.update_process_times
1.58 ± 32% +0.7 2.29 ± 55% +0.1 1.65 ± 25% perf-profile.children.cycles-pp.tick_sched_handle
1.71 ± 31% +0.7 2.42 ± 54% +0.1 1.79 ± 25% perf-profile.children.cycles-pp.tick_sched_timer
1.85 ± 30% +0.7 2.60 ± 52% +0.1 1.94 ± 25% perf-profile.children.cycles-pp.__hrtimer_run_queues
2.09 ± 29% +0.8 2.86 ± 50% +0.1 2.18 ± 24% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
2.06 ± 29% +0.8 2.85 ± 50% +0.1 2.16 ± 24% perf-profile.children.cycles-pp.hrtimer_interrupt
2.48 ± 26% +0.8 3.28 ± 45% +0.1 2.60 ± 22% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
2.19 ± 29% +0.8 2.99 ± 49% +0.1 2.29 ± 24% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.09 ± 17% +1.2 1.32 ± 7% +0.4 0.45 ± 21% perf-profile.children.cycles-pp.flush_tlb_func
0.25 ± 14% +1.6 1.85 ± 9% +0.3 0.55 ± 18% perf-profile.children.cycles-pp.llist_reverse_order
72.83 ± 3% +1.9 74.77 -0.6 72.25 perf-profile.children.cycles-pp.do_access
0.40 ± 15% +2.5 2.86 ± 8% +0.5 0.93 ± 18% perf-profile.children.cycles-pp.llist_add_batch
0.41 ± 14% +3.3 3.76 ± 8% +0.7 1.14 ± 19% perf-profile.children.cycles-pp.__sysvec_call_function
0.41 ± 14% +3.4 3.76 ± 8% +0.7 1.14 ± 19% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
0.43 ± 14% +3.5 3.90 ± 8% +0.7 1.17 ± 19% perf-profile.children.cycles-pp.sysvec_call_function
0.55 ± 12% +4.4 4.95 ± 8% +0.9 1.40 ± 19% perf-profile.children.cycles-pp.asm_sysvec_call_function
3.31 ± 15% +6.6 9.89 ± 7% +0.9 4.19 ± 19% perf-profile.children.cycles-pp.__handle_mm_fault
3.34 ± 15% +6.6 9.95 ± 7% +0.9 4.23 ± 19% perf-profile.children.cycles-pp.handle_mm_fault
3.03 ± 15% +6.7 9.69 ± 7% +1.0 4.03 ± 19% perf-profile.children.cycles-pp.do_numa_page
0.91 ± 15% +6.7 7.59 ± 7% +1.5 2.42 ± 18% perf-profile.children.cycles-pp.smp_call_function_many_cond
0.91 ± 15% +6.7 7.59 ± 7% +1.5 2.42 ± 18% perf-profile.children.cycles-pp.on_each_cpu_cond_mask
3.70 ± 15% +6.8 10.49 ± 7% +0.9 4.64 ± 19% perf-profile.children.cycles-pp.do_user_addr_fault
3.70 ± 15% +6.8 10.50 ± 7% +0.9 4.64 ± 19% perf-profile.children.cycles-pp.exc_page_fault
3.91 ± 14% +6.8 10.76 ± 7% +1.0 4.88 ± 19% perf-profile.children.cycles-pp.asm_exc_page_fault
2.46 ± 15% +7.0 9.46 ± 7% +1.4 3.85 ± 19% perf-profile.children.cycles-pp.migrate_misplaced_page
2.27 ± 15% +7.0 9.28 ± 7% +1.4 3.67 ± 19% perf-profile.children.cycles-pp.migrate_pages_batch
2.27 ± 15% +7.0 9.29 ± 7% +1.4 3.68 ± 19% perf-profile.children.cycles-pp.migrate_pages
0.00 +7.6 7.57 ± 7% +2.4 2.40 ± 18% perf-profile.children.cycles-pp.try_to_unmap_flush
0.00 +7.6 7.57 ± 7% +2.4 2.40 ± 18% perf-profile.children.cycles-pp.arch_tlbbatch_flush
66.95 ± 3% -7.7 59.28 ± 2% -2.0 64.95 perf-profile.self.cycles-pp.do_access
13.38 ± 11% -1.4 12.02 ± 4% +0.3 13.71 perf-profile.self.cycles-pp.nrand48_r
8.81 ± 9% -1.1 7.70 ± 3% +0.1 8.94 ± 2% perf-profile.self.cycles-pp.lrand48_r
1.14 ± 16% -0.9 0.28 ± 9% -0.9 0.28 ± 21% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
4.08 ± 3% -0.3 3.77 -0.0 4.03 perf-profile.self.cycles-pp.do_rw_once
0.06 ±187% -0.1 0.00 -0.1 0.00 perf-profile.self.cycles-pp.irqentry_exit_to_user_mode
0.29 ± 4% -0.0 0.26 -0.0 0.28 ± 2% perf-profile.self.cycles-pp.lrand48_r@plt
0.12 ± 27% -0.0 0.10 ± 53% +0.0 0.13 ± 36% perf-profile.self.cycles-pp.account_user_time
0.02 ±141% -0.0 0.00 +0.0 0.02 ±112% perf-profile.self.cycles-pp.hrtimer_interrupt
0.07 ± 16% -0.0 0.07 ± 47% +0.0 0.08 ± 25% perf-profile.self.cycles-pp.tick_sched_do_timer
0.06 ± 55% -0.0 0.05 ± 46% +0.0 0.06 ± 42% perf-profile.self.cycles-pp.ktime_get_update_offsets_now
0.02 ±111% -0.0 0.02 ±142% +0.0 0.03 ±112% perf-profile.self.cycles-pp.irqtime_account_process_tick
0.01 ±188% -0.0 0.01 ±223% -0.0 0.01 ±282% perf-profile.self.cycles-pp.rmap_walk_anon
0.00 +0.0 0.00 +0.0 0.01 ±282% perf-profile.self.cycles-pp.__free_one_page
0.06 ± 42% +0.0 0.07 ± 46% +0.0 0.07 ± 43% perf-profile.self.cycles-pp.update_process_times
0.09 ± 39% +0.0 0.10 ± 50% -0.0 0.07 ± 75% perf-profile.self.cycles-pp.cpuacct_account_field
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.self.cycles-pp.set_tlb_ubc_flush_pending
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.self.cycles-pp.__irq_exit_rcu
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.self.cycles-pp.perf_event_task_tick
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.self.cycles-pp.run_posix_cpu_timers
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.self.cycles-pp.nohz_balance_exit_idle
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.self.cycles-pp.reweight_entity
0.00 +0.0 0.01 ±223% +0.0 0.01 ±187% perf-profile.self.cycles-pp.can_change_pte_writable
0.06 ± 14% +0.0 0.07 ± 11% -0.0 0.04 ± 72% perf-profile.self.cycles-pp.mt_find
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.self.cycles-pp.trigger_load_balance
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.self.cycles-pp.check_cpu_stall
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.self.cycles-pp.timerqueue_add
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.self.cycles-pp.acct_account_cputime
0.08 ± 17% +0.0 0.09 ± 13% +0.0 0.08 ± 21% perf-profile.self.cycles-pp.page_vma_mapped_walk
0.11 ± 17% +0.0 0.13 ± 15% +0.0 0.12 ± 20% perf-profile.self.cycles-pp.__handle_mm_fault
0.01 ±282% +0.0 0.02 ± 99% +0.0 0.02 ±112% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.10 ± 16% +0.0 0.12 ± 65% +0.0 0.12 ± 29% perf-profile.self.cycles-pp.__cgroup_account_cputime_field
0.01 ±282% +0.0 0.03 ±102% -0.0 0.00 perf-profile.self.cycles-pp.lapic_next_deadline
0.01 ±282% +0.0 0.03 ±150% +0.0 0.02 ±112% perf-profile.self.cycles-pp.rcu_pending
0.02 ±209% +0.0 0.04 ±103% -0.0 0.02 ±142% perf-profile.self.cycles-pp.update_cfs_group
0.08 ± 47% +0.0 0.10 ± 68% -0.0 0.07 ± 45% perf-profile.self.cycles-pp.hrtimer_active
0.05 ± 43% +0.0 0.08 ± 61% -0.0 0.05 ± 57% perf-profile.self.cycles-pp.update_irq_load_avg
0.04 ± 94% +0.0 0.06 ± 48% +0.0 0.05 ± 56% perf-profile.self.cycles-pp.ktime_get
0.07 ± 16% +0.0 0.10 ± 10% +0.0 0.10 ± 21% perf-profile.self.cycles-pp.__list_del_entry_valid
0.01 ±282% +0.0 0.03 ±106% -0.0 0.00 perf-profile.self.cycles-pp.update_min_vruntime
0.04 ± 91% +0.0 0.07 ± 50% +0.0 0.06 ± 38% perf-profile.self.cycles-pp.arch_scale_freq_tick
0.00 +0.0 0.03 ± 70% +0.0 0.00 perf-profile.self.cycles-pp.default_send_IPI_mask_sequence_phys
0.01 ±282% +0.0 0.04 ± 75% +0.0 0.02 ±112% perf-profile.self.cycles-pp.__hrtimer_run_queues
0.06 ± 49% +0.0 0.10 ± 65% -0.0 0.06 ± 56% perf-profile.self.cycles-pp.scheduler_tick
0.03 ±113% +0.0 0.07 ± 83% +0.0 0.04 ± 71% perf-profile.self.cycles-pp.update_rq_clock
0.00 +0.0 0.04 ± 44% +0.0 0.01 ±187% perf-profile.self.cycles-pp.folio_migrate_flags
0.09 ± 14% +0.0 0.14 ± 20% +0.0 0.10 ± 16% perf-profile.self.cycles-pp._raw_spin_lock
0.02 ±191% +0.0 0.06 ± 86% +0.0 0.03 ± 90% perf-profile.self.cycles-pp.__update_load_avg_se
0.03 ±118% +0.0 0.08 ± 57% +0.0 0.05 ± 59% perf-profile.self.cycles-pp.perf_trace_sched_stat_runtime
0.02 ±111% +0.1 0.08 ± 10% +0.0 0.06 ± 15% perf-profile.self.cycles-pp.change_pte_range
0.15 ± 14% +0.1 0.20 ± 10% +0.0 0.17 ± 21% perf-profile.self.cycles-pp.up_read
0.00 +0.1 0.05 ± 8% +0.0 0.01 ±188% perf-profile.self.cycles-pp.try_to_migrate_one
0.19 ± 16% +0.1 0.24 ± 11% +0.0 0.20 ± 19% perf-profile.self.cycles-pp.down_read_trylock
0.03 ±151% +0.1 0.09 ± 84% +0.0 0.04 ± 72% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
0.00 +0.1 0.07 ± 8% +0.0 0.00 perf-profile.self.cycles-pp._find_next_bit
0.00 +0.1 0.07 ± 8% +0.0 0.00 perf-profile.self.cycles-pp.native_sched_clock
0.00 +0.1 0.07 ± 12% +0.0 0.03 ±113% perf-profile.self.cycles-pp.page_counter_uncharge
0.09 ± 41% +0.1 0.16 ± 69% +0.0 0.09 ± 42% perf-profile.self.cycles-pp.task_tick_fair
0.11 ± 49% +0.1 0.19 ± 74% -0.0 0.11 ± 29% perf-profile.self.cycles-pp.update_load_avg
0.01 ±282% +0.1 0.11 ± 8% +0.1 0.06 ± 43% perf-profile.self.cycles-pp.page_counter_charge
0.16 ± 15% +0.1 0.27 ± 9% +0.1 0.22 ± 21% perf-profile.self.cycles-pp.copy_page
0.16 ± 41% +0.1 0.28 ± 65% +0.0 0.18 ± 25% perf-profile.self.cycles-pp.update_curr
0.09 ± 7% +0.2 0.24 ± 9% +0.0 0.11 ± 14% perf-profile.self.cycles-pp.sync_regs
0.11 ± 20% +0.3 0.39 ± 8% +0.0 0.15 ± 15% perf-profile.self.cycles-pp.native_irq_return_iret
0.06 ± 40% +0.4 0.47 ± 9% +0.1 0.13 ± 23% perf-profile.self.cycles-pp.__default_send_IPI_dest_field
0.00 +0.4 0.44 ± 10% +0.1 0.11 ± 19% perf-profile.self.cycles-pp.native_flush_tlb_local
0.07 ± 15% +0.5 0.62 ± 7% +0.1 0.16 ± 18% perf-profile.self.cycles-pp.__flush_smp_call_function_queue
0.06 ± 16% +0.8 0.88 ± 7% +0.3 0.33 ± 21% perf-profile.self.cycles-pp.flush_tlb_func
0.25 ± 14% +1.6 1.85 ± 9% +0.3 0.55 ± 18% perf-profile.self.cycles-pp.llist_reverse_order
0.35 ± 15% +2.1 2.40 ± 8% +0.4 0.76 ± 18% perf-profile.self.cycles-pp.llist_add_batch
0.37 ± 17% +3.1 3.49 ± 7% +0.7 1.10 ± 18% perf-profile.self.cycles-pp.smp_call_function_many_cond


> Best Regards,
> Huang, Ying
>
> -------------------------------------8<------------------------------------
> From 1ac61967b54bbdc1ca20af16f9dfb2507a4d4811 Mon Sep 17 00:00:00 2001
> From: Huang Ying <ying.huang@xxxxxxxxx>
> Date: Mon, 20 Mar 2023 15:48:39 +0800
> Subject: [PATCH] dbg, rmap: avoid flushing TLB in batch if PTE is inaccessible
>
> Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
> ---
>  mm/rmap.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 8632e02661ac..3c7c43642d7c 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1582,7 +1582,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>                                  */
>                                 pteval = ptep_get_and_clear(mm, address, pvmw.pte);
>  
> -                               set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
> +                               if (pte_accessible(mm, pteval))
> +                                       set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
>                         } else {
>                                 pteval = ptep_clear_flush(vma, address, pvmw.pte);
>                         }
> @@ -1963,7 +1964,8 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>                                  */
>                                 pteval = ptep_get_and_clear(mm, address, pvmw.pte);
>  
> -                               set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
> +                               if (pte_accessible(mm, pteval))
> +                                       set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
>                         } else {
>                                 pteval = ptep_clear_flush(vma, address, pvmw.pte);
>                         }
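
For anyone following the thread, the check added by the debug patch can be
modeled in user space roughly as below. This is only a sketch under stated
assumptions, not the kernel code: the `model_*` names and flag values are
hypothetical, and the accessibility test merely mirrors the usual x86 rule
(a present PTE, or a PROT_NONE PTE while a TLB flush is pending on the mm).
It shows why a NUMA-hinting (PROT_NONE) PTE, as hit via do_numa_page in this
workload, would no longer mark the batch as needing a flush.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for PTE flag bits; the values are illustrative only. */
#define MODEL_PAGE_PRESENT  (1u << 0)
#define MODEL_PAGE_PROTNONE (1u << 1)
#define MODEL_PAGE_DIRTY    (1u << 2)

/* Models the one mm field this check depends on. */
struct model_mm {
    int tlb_flush_pending;
};

/* Models the per-task unmap batch state (flush_required / writable). */
struct model_ubc {
    bool flush_required;
    bool writable;
};

/* Assumed x86-like rule: can the hardware have this PTE cached in a TLB? */
static bool model_pte_accessible(const struct model_mm *mm, unsigned int pte)
{
    if (pte & MODEL_PAGE_PRESENT)
        return true;
    if ((pte & MODEL_PAGE_PROTNONE) && mm->tlb_flush_pending > 0)
        return true;
    return false;
}

/* With the debug patch: only an accessible PTE marks the batch for flushing. */
static void model_unmap_pte(const struct model_mm *mm, struct model_ubc *ubc,
                            unsigned int pte)
{
    if (!model_pte_accessible(mm, pte))
        return;                 /* nothing can be cached, skip the batched flush */

    ubc->flush_required = true;
    if (pte & MODEL_PAGE_DIRTY)
        ubc->writable = true;
}
```

In this model, unmapping a PROT_NONE PTE with no flush pending leaves
`flush_required` unset, while a present PTE still sets it, which would be
consistent with the reduced (but not eliminated) try_to_unmap_flush cost in
the profile above.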