Re: [PATCH] random: add chacha8_block and swtich the rng to it
From: kernel test robot
Date: Wed May 08 2024 - 03:42:15 EST
Hello,
kernel test robot noticed an 80.5% improvement of stress-ng.getrandom.ops_per_sec on:
commit: 470a8ed1624a45a74176a786e28fac3234c71424 ("[PATCH] random: add chacha8_block and swtich the rng to it")
url: https://github.com/intel-lab-lkp/linux/commits/Aaron-Toponce/random-add-chacha8_block-and-swtich-the-rng-to-it/20240430-130757
base: https://git.kernel.org/cgit/linux/kernel/git/herbert/cryptodev-2.6.git master
patch link: https://lore.kernel.org/all/20240429134942.2873253-1-aaron.toponce@xxxxxxxxx/
patch subject: [PATCH] random: add chacha8_block and swtich the rng to it
testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:
	nr_threads: 100%
	testtime: 60s
	test: getrandom
	cpufreq_governor: performance
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240508/202405081501.e1c083b0-oliver.sang@xxxxxxxxx
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/getrandom/stress-ng/60s
commit:
ed265f7fd9 ("crypto: x86/aes-gcm - simplify GCM hash subkey derivation")
470a8ed162 ("random: add chacha8_block and swtich the rng to it")
ed265f7fd9a635d7 470a8ed1624a45a74176a786e28
---------------- ---------------------------
%stddev %change %stddev
\ | \
1.793e+09 +80.7% 3.239e+09 stress-ng.getrandom.getrandom_bits_per_sec
1.054e+08 +80.5% 1.901e+08 stress-ng.getrandom.ops
1755950 +80.5% 3168792 stress-ng.getrandom.ops_per_sec
13.18 +74.9% 23.05 stress-ng.time.user_time
1.088e+10 +52.5% 1.66e+10 perf-stat.i.branch-instructions
0.29 ± 8% -0.1 0.20 ± 7% perf-stat.i.branch-miss-rate%
0.57 +7.2% 0.61 perf-stat.i.cpi
3.411e+11 -6.7% 3.182e+11 perf-stat.i.instructions
1.75 -6.7% 1.63 perf-stat.i.ipc
0.29 ± 8% -0.1 0.20 ± 7% perf-stat.overall.branch-miss-rate%
0.57 +7.2% 0.61 perf-stat.overall.cpi
1.75 -6.7% 1.64 perf-stat.overall.ipc
1.07e+10 +52.6% 1.633e+10 perf-stat.ps.branch-instructions
3.355e+11 -6.7% 3.13e+11 perf-stat.ps.instructions
2.049e+13 -6.3% 1.919e+13 perf-stat.total.instructions
74.33 -18.9 55.41 perf-profile.calltrace.cycles-pp.chacha_permute.chacha_block_generic.get_random_bytes_user.__x64_sys_getrandom.do_syscall_64
83.70 -10.8 72.88 perf-profile.calltrace.cycles-pp.chacha_block_generic.get_random_bytes_user.__x64_sys_getrandom.do_syscall_64.entry_SYSCALL_64_after_hwframe
97.41 -1.3 96.15 perf-profile.calltrace.cycles-pp.get_random_bytes_user.__x64_sys_getrandom.do_syscall_64.entry_SYSCALL_64_after_hwframe.getrandom
98.10 -0.7 97.41 perf-profile.calltrace.cycles-pp.__x64_sys_getrandom.do_syscall_64.entry_SYSCALL_64_after_hwframe.getrandom
98.19 -0.6 97.55 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.getrandom
98.23 -0.6 97.61 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.getrandom
98.43 -0.4 97.99 perf-profile.calltrace.cycles-pp.getrandom
1.30 -0.2 1.14 perf-profile.calltrace.cycles-pp.chacha_block_generic.crng_fast_key_erasure.crng_make_state.get_random_bytes_user.__x64_sys_getrandom
1.56 +0.0 1.58 perf-profile.calltrace.cycles-pp.crng_fast_key_erasure.crng_make_state.get_random_bytes_user.__x64_sys_getrandom.do_syscall_64
1.62 +0.1 1.69 perf-profile.calltrace.cycles-pp.crng_make_state.get_random_bytes_user.__x64_sys_getrandom.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.05 +0.2 1.26 perf-profile.calltrace.cycles-pp.get_random_bytes_user.__x64_sys_getrandom.do_syscall_64.entry_SYSCALL_64_after_hwframe.getentropy
1.07 +0.2 1.30 perf-profile.calltrace.cycles-pp.__x64_sys_getrandom.do_syscall_64.entry_SYSCALL_64_after_hwframe.getentropy
1.13 +0.3 1.40 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.getentropy
1.16 +0.3 1.45 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.getentropy
1.31 +0.4 1.73 perf-profile.calltrace.cycles-pp.getentropy
11.88 +9.5 21.40 perf-profile.calltrace.cycles-pp._copy_to_iter.get_random_bytes_user.__x64_sys_getrandom.do_syscall_64.entry_SYSCALL_64_after_hwframe
75.73 -19.0 56.70 perf-profile.children.cycles-pp.chacha_permute
85.45 -11.4 74.03 perf-profile.children.cycles-pp.chacha_block_generic
99.14 -0.5 98.63 perf-profile.children.cycles-pp.get_random_bytes_user
99.20 -0.5 98.73 perf-profile.children.cycles-pp.__x64_sys_getrandom
98.52 -0.4 98.13 perf-profile.children.cycles-pp.getrandom
99.45 -0.4 99.07 perf-profile.children.cycles-pp.do_syscall_64
99.49 -0.3 99.14 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.44 ± 4% -0.0 0.40 ± 6% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.46 ± 4% -0.0 0.42 ± 6% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.42 ± 4% -0.0 0.38 ± 6% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.42 ± 4% -0.0 0.38 ± 6% perf-profile.children.cycles-pp.hrtimer_interrupt
0.24 ± 7% -0.0 0.20 ± 5% perf-profile.children.cycles-pp.tick_nohz_handler
0.25 ± 7% -0.0 0.21 ± 6% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.12 ± 4% -0.0 0.11 ± 7% perf-profile.children.cycles-pp.scheduler_tick
1.56 +0.0 1.60 perf-profile.children.cycles-pp.crng_fast_key_erasure
0.03 ± 70% +0.0 0.08 perf-profile.children.cycles-pp.stress_getrandom
0.09 +0.1 0.14 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.00 +0.1 0.07 perf-profile.children.cycles-pp.__memcpy
1.62 +0.1 1.70 perf-profile.children.cycles-pp.crng_make_state
0.00 +0.1 0.08 ± 6% perf-profile.children.cycles-pp.syscall_return_via_sysret
0.13 ± 3% +0.1 0.25 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
0.17 +0.1 0.31 ± 2% perf-profile.children.cycles-pp.entry_SYSCALL_64
1.37 +0.5 1.83 perf-profile.children.cycles-pp.getentropy
12.20 +9.8 21.97 perf-profile.children.cycles-pp._copy_to_iter
75.17 -19.1 56.06 perf-profile.self.cycles-pp.chacha_permute
0.05 +0.0 0.08 ± 5% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.05 ± 7% +0.0 0.09 perf-profile.self.cycles-pp.crng_make_state
0.07 +0.1 0.13 ± 3% perf-profile.self.cycles-pp.do_syscall_64
0.06 ± 6% +0.1 0.12 perf-profile.self.cycles-pp.getentropy
0.00 +0.1 0.06 ± 6% perf-profile.self.cycles-pp.__memcpy
0.00 +0.1 0.06 ± 7% perf-profile.self.cycles-pp.stress_getrandom
0.00 +0.1 0.06 ± 7% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.00 +0.1 0.07 ± 7% perf-profile.self.cycles-pp.__x64_sys_getrandom
0.09 ± 5% +0.1 0.17 ± 2% perf-profile.self.cycles-pp.getrandom
0.00 +0.1 0.08 ± 6% perf-profile.self.cycles-pp.entry_SYSCALL_64
0.00 +0.1 0.08 ± 6% perf-profile.self.cycles-pp.syscall_return_via_sysret
0.13 +0.1 0.24 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
0.23 ± 2% +0.2 0.39 perf-profile.self.cycles-pp.crng_fast_key_erasure
1.81 +1.4 3.23 perf-profile.self.cycles-pp.get_random_bytes_user
9.46 +7.4 16.86 perf-profile.self.cycles-pp.chacha_block_generic
11.93 +9.6 21.49 perf-profile.self.cycles-pp._copy_to_iter
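A note on reading the middle column: for raw counters it appears to be a relative change in percent, while for the perf-profile rows (which are already percentages of cycles) it appears to be an absolute difference in percentage points. The minimal Python sketch below recomputes two entries from values copied out of the table above. The helper names pct_change and point_delta are illustrative only and are not part of lkp-tests.

```python
def pct_change(base: float, patched: float) -> float:
    """Relative change from base to patched, in percent."""
    return (patched - base) / base * 100.0


def point_delta(base_pct: float, patched_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return patched_pct - base_pct


# stress-ng.getrandom.ops_per_sec: 1755950 (base) vs 3168792 (patched)
print(f"ops_per_sec: {pct_change(1755950, 3168792):+.1f}%")  # +80.5%

# perf-profile.calltrace.cycles-pp.chacha_permute...: 74.33 vs 55.41
print(f"chacha_permute: {point_delta(74.33, 55.41):+.1f}")  # -18.9
```

Read this way, the share of cycles spent in chacha_permute drops by about 19 points while the share in _copy_to_iter rises, which is consistent with the cipher work shrinking relative to the fixed cost of copying bytes out to user space.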
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki