[linus:master] [mm] 6bef4c2f97: stress-ng.mlockmany.ops_per_sec 5.2% improvement
From: kernel test robot
Date: Tue Jun 10 2025 - 10:42:44 EST
Hello,
kernel test robot noticed a 5.2% improvement of stress-ng.mlockmany.ops_per_sec on:
commit: 6bef4c2f97221f3b595d08c8656eb5845ef80fe9 ("mm: move lesser used vma_area_struct members into the last cacheline")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
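The commit under test groups rarely-touched vm_area_struct members behind the hot fields so the lookup/fault fast paths stay within the first cacheline. As a rough illustration of that general technique only (a hypothetical userspace sketch, not the kernel's actual vm_area_struct layout; struct and field names are made up), hot members can be kept in the first 64 bytes and the layout pinned with a compile-time check:

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64

/*
 * Hypothetical example only -- not the kernel's vm_area_struct.
 * Hot members (touched on every lookup/fault) come first; cold
 * members are pushed past the first cacheline.
 */
struct example_area {
	uint64_t start;		/* hot */
	uint64_t end;		/* hot */
	uint64_t flags;		/* hot */
	void *tree_node;	/* hot */

	/* pad so the cold members begin on the next cacheline */
	char pad[CACHELINE - 3 * sizeof(uint64_t) - sizeof(void *)];

	void *file;		/* cold: setup/teardown only */
	uint64_t pgoff;		/* cold */
	void *private_data;	/* cold */
};

/* Compile-time check that the cold part really starts at byte 64. */
static_assert(offsetof(struct example_area, file) == CACHELINE,
	      "cold members must start in the second cacheline");

int main(void)
{
	return 0;
}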
testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:
nr_threads: 100%
testtime: 60s
test: mlockmany
cpufreq_governor: performance
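For context on what the workload exercises: roughly speaking, stress-ng's mlockmany stressor forks children that mlock() and munlock() large numbers of mappings, which is why the profile data below is dominated by mmap_lock and page-population paths (see the do_mlock/__mm_populate and dup_mmap perf-sched entries). A minimal, hypothetical sketch of such a loop (not stress-ng's actual implementation; the iteration count and mapping size are arbitrary):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	const size_t len = 64 * 1024;	/* arbitrary 64 KiB mapping */

	for (int i = 0; i < 1000; i++) {
		void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		memset(p, 0, len);		/* fault the pages in */
		if (mlock(p, len) == 0)		/* may fail under RLIMIT_MEMLOCK */
			munlock(p, len);
		munmap(p, len);
	}
	return 0;
}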
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250610/202506102254.13cda0af-lkp@xxxxxxxxx
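If the standard lkp-tests flow applies (an assumption; the job.yaml usually attached to these reports is not part of this excerpt), reproduction typically follows the usual 0-Day steps:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml                   # install dependencies for the attached job file
        sudo bin/lkp split-job --compatible job.yaml    # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file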
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/mlockmany/stress-ng/60s
commit:
f35ab95ca0 ("mm: replace vm_lock and detached flag with a reference count")
6bef4c2f97 ("mm: move lesser used vma_area_struct members into the last cacheline")
       f35ab95ca0af7a27              6bef4c2f97221f3b595d08c8656
       ----------------              ---------------------------
  old value ± %stddev     %change     new value ± %stddev     metric
0.66 ± 5% -0.1 0.57 ± 9% mpstat.cpu.all.soft%
27183 +1.9% 27708 vmstat.system.cs
264643 +5.2% 278326 stress-ng.mlockmany.ops
4406 +5.2% 4634 stress-ng.mlockmany.ops_per_sec
314509 +4.9% 329874 stress-ng.time.voluntary_context_switches
343582 -3.7% 330742 ± 2% proc-vmstat.nr_active_anon
454064 -2.7% 441886 proc-vmstat.nr_anon_pages
54743 -3.5% 52828 proc-vmstat.nr_slab_unreclaimable
343583 -3.7% 330741 ± 2% proc-vmstat.nr_zone_active_anon
1.99 ± 8% -14.0% 1.72 ± 12% sched_debug.cfs_rq:/.h_nr_queued.stddev
1.98 ± 8% -13.9% 1.71 ± 12% sched_debug.cfs_rq:/.h_nr_runnable.stddev
0.00 ± 18% -24.8% 0.00 ± 20% sched_debug.cpu.next_balance.stddev
1.99 ± 8% -13.8% 1.72 ± 12% sched_debug.cpu.nr_running.stddev
0.25 +0.0 0.25 perf-stat.i.branch-miss-rate%
21663531 +1.7% 22033919 perf-stat.i.branch-misses
27855 +1.8% 28352 perf-stat.i.context-switches
0.25 +0.0 0.25 perf-stat.overall.branch-miss-rate%
21319615 +1.7% 21691011 perf-stat.ps.branch-misses
27388 +1.7% 27866 perf-stat.ps.context-switches
19.64 ± 7% -18.7% 15.97 ± 11% perf-sched.sch_delay.avg.ms.__cond_resched.__vmalloc_area_node.__vmalloc_node_range_noprof.alloc_thread_stack_node.dup_task_struct
11.34 ± 8% -13.5% 9.80 ± 6% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.__mm_populate.do_mlock.__x64_sys_mlock
17.11 ± 4% -8.2% 15.70 ± 5% perf-sched.sch_delay.avg.ms.__cond_resched.mlock_pte_range.walk_pmd_range.isra.0
10.51 ± 10% +35.6% 14.26 ± 15% perf-sched.sch_delay.avg.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
52.76 ± 22% -31.2% 36.28 ± 18% perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
50.19 ± 7% -26.9% 36.68 ± 45% perf-sched.wait_and_delay.avg.ms.__cond_resched.__vmalloc_area_node.__vmalloc_node_range_noprof.alloc_thread_stack_node.dup_task_struct
23.36 ± 9% -14.2% 20.03 ± 6% perf-sched.wait_and_delay.avg.ms.__cond_resched.down_read.__mm_populate.do_mlock.__x64_sys_mlock
51.05 ± 10% -34.3% 33.53 ± 45% perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.prepare_creds.copy_creds.copy_process
245.67 ± 6% -47.6% 128.83 ± 4% perf-sched.wait_and_delay.count.__cond_resched.copy_page_range.dup_mmap.dup_mm.constprop
286.83 ± 7% -21.0% 226.67 ± 5% perf-sched.wait_and_delay.count.__cond_resched.down_write.anon_vma_clone.anon_vma_fork.dup_mmap
120.67 ± 9% +32.6% 160.00 ± 8% perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.anon_vma_fork
225.41 ± 31% -33.7% 149.44 ± 7% perf-sched.wait_and_delay.max.ms.__cond_resched.copy_page_range.dup_mmap.dup_mm.constprop
77.77 ± 73% +79.0% 139.22 ± 15% perf-sched.wait_and_delay.max.ms.__cond_resched.uprobe_start_dup_mmap.dup_mm.constprop.0
12.02 ± 11% -14.9% 10.23 ± 6% perf-sched.wait_time.avg.ms.__cond_resched.down_read.__mm_populate.do_mlock.__x64_sys_mlock
31.78 ± 18% -31.9% 21.63 ± 11% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.prepare_creds.copy_creds.copy_process
16.57 ± 5% -9.3% 15.03 ± 5% perf-sched.wait_time.avg.ms.__cond_resched.mlock_pte_range.walk_pmd_range.isra.0
25.21 ± 7% +12.4% 28.34 ± 6% perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock_killable.pcpu_alloc_noprof.mm_init.dup_mm
24.68 ± 29% +39.0% 34.31 ± 15% perf-sched.wait_time.avg.ms.__cond_resched.uprobe_start_dup_mmap.dup_mm.constprop.0
207.48 ± 35% -32.5% 140.02 ± 6% perf-sched.wait_time.max.ms.__cond_resched.copy_page_range.dup_mmap.dup_mm.constprop
70.62 ± 41% +75.6% 124.03 ± 15% perf-sched.wait_time.max.ms.__cond_resched.uprobe_start_dup_mmap.dup_mm.constprop.0
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki