Re: [smb3] 5efdd9122e: filebench.sum_operations/s -50.0% regression

From: Yin Fengwei
Date: Thu Mar 23 2023 - 03:48:14 EST


On 3/15/23 22:26, Steve French wrote:
Can you verify what this perf looks like with "closetime=0" and "closetime=1"

Are there differences in /proc/fs/cifs/Stats when you run the same
steps with "closetimeo=1" vs. the more recent default (5 seconds)?
The dump of /proc/fs/cifs/Stats are attached. You can tell which
closetimeo is used from the file name. Thanks.


Regards
Yin, Fengwei


On Wed, Mar 15, 2023 at 2:46 AM Yin, Fengwei <fengwei.yin@xxxxxxxxx> wrote:



On 8/26/2022 10:41 AM, kernel test robot wrote:
Greeting,

FYI, we noticed a -50.0% regression of filebench.sum_operations/s due to commit:


commit: 5efdd9122eff772eae2feae9f0fc0ec02d4846a3 ("smb3: allow deferred close timeout to be configurable")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: filebench
on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz (Cascade Lake) with 128G memory
with following parameters:

disk: 1HDD
fs: ext4
fs2: cifs
test: filemicro_delete.f
cpufreq_governor: performance
ucode: 0x5003302
Please note, we still could see this regresion on v6.3-rc2. And the regression is related with
the commit: 5efdd9122eff772eae2feae9f0fc0ec02d4846a3. This commit changed the default timeout
value from 1s to 5s. If change the timeout back to 1s as following:

diff --git a/fs/cifs/fs_context.c b/fs/cifs/fs_context.c
index 6d13f8207e96..6b930fb0c4bd 100644
--- a/fs/cifs/fs_context.c
+++ b/fs/cifs/fs_context.c
@@ -1537,7 +1537,7 @@ int smb3_init_fs_context(struct fs_context *fc)

ctx->acregmax = CIFS_DEF_ACTIMEO;
ctx->acdirmax = CIFS_DEF_ACTIMEO;
- ctx->closetimeo = SMB3_DEF_DCLOSETIMEO;
+ ctx->closetimeo = CIFS_DEF_ACTIMEO;

/* Most clients set timeout to 0, allows server to use its default */
ctx->handle_timeout = 0; /* See MS-SMB2 spec section 2.2.14.2.12 */

The regression is gone:
dcb45fd7f501f864 v6.3-rc2 32715af441411a5a266606be08f
---------------- --------------------------- ---------------------------
fail:runs %reproduction fail:runs %reproduction fail:runs
| | | | |
:25 0% :3 33% 1:3 last_state.booting
:25 0% :3 33% 1:3 last_state.is_incomplete_run
%stddev %change %stddev %change %stddev
\ | \ | \
515.95 -50.0% 257.98 -0.0% 515.92 filebench.sum_operations/s
5.04 ± 7% +833.7% 47.09 ± 2% -2.9% 4.90 ± 2% filebench.sum_time_ms/op
10438 -100.0% 0.33 ±141% -100.0% 0.50 ±100% filebench.time.major_page_faults
167575 -4.1% 160660 -4.4% 160140 filebench.time.maximum_resident_set_size
7138 ± 11% +141.4% 17235 ± 3% +147.6% 17677 filebench.time.minor_page_faults
28.68 ± 9% +199.9% 86.00 ± 7% -2.4% 28.00 filebench.time.percent_of_cpu_this_job_got
2453485 ± 54% -63.0% 907380 -66.2% 830273 ± 6% cpuidle..usage
0.61 ± 38% +0.8 1.41 ± 3% +0.2 0.80 ± 4% mpstat.cpu.all.sys%
142984 ± 13% -45.6% 77725 -47.5% 75106 vmstat.system.in
34.23 ± 7% +27.9% 43.79 +27.8% 43.74 boot-time.boot
17.09 ± 11% +66.0% 28.38 +65.5% 28.28 boot-time.dhcp
2661 ± 7% +37.5% 3659 +37.2% 3651 boot-time.idle
104737 ±185% -87.8% 12762 ± 10% -89.8% 10631 ± 4% turbostat.C1


32715af441411a5a266606be08f is v6.3-rc2 with the change to restore the timeout to 1s. Thanks.

Regards
Yin, Fengwei




=========================================================================================
compiler/cpufreq_governor/disk/fs2/fs/kconfig/rootfs/tbox_group/test/testcase/ucode:
gcc-11/performance/1HDD/cifs/ext4/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp9/filemicro_delete.f/filebench/0x5003302

commit:
dcb45fd7f5 ("cifs: Do not use tcon->cfid directly, use the cfid we get from open_cached_dir")
5efdd9122e ("smb3: allow deferred close timeout to be configurable")

dcb45fd7f501f864 5efdd9122eff772eae2feae9f0f
---------------- ---------------------------
%stddev %change %stddev
\ | \
515.95 -50.0% 257.98 filebench.sum_operations/s
4.81 ± 2% +1038.4% 54.78 ± 6% filebench.sum_time_ms/op
29.00 ± 8% +212.1% 90.50 ± 3% filebench.time.percent_of_cpu_this_job_got
24629 +2.7% 25297 filebench.time.voluntary_context_switches
7.685e+08 +19.3% 9.169e+08 ± 4% cpuidle..time
2.53 ± 6% -20.6% 2.00 ± 3% iostat.cpu.iowait
1506141 ± 8% +22.4% 1843256 ± 3% turbostat.IRQ
2.00 -50.0% 1.00 vmstat.procs.b
21969 ± 2% -9.5% 19885 ± 2% vmstat.system.cs
3.06 ± 7% -0.7 2.35 ± 4% mpstat.cpu.all.iowait%
0.79 ± 5% +0.5 1.27 ± 2% mpstat.cpu.all.sys%
0.89 ± 3% -0.1 0.79 ± 3% mpstat.cpu.all.usr%
34.55 ± 14% -34.8% 22.51 ± 27% sched_debug.cfs_rq:/.removed.runnable_avg.avg
119.64 ± 3% -20.0% 95.69 ± 17% sched_debug.cfs_rq:/.removed.runnable_avg.stddev
34.55 ± 14% -34.8% 22.51 ± 27% sched_debug.cfs_rq:/.removed.util_avg.avg
119.64 ± 3% -20.0% 95.69 ± 17% sched_debug.cfs_rq:/.removed.util_avg.stddev
5249 +15.8% 6076 meminfo.Active
3866 ± 2% +17.7% 4552 meminfo.Active(anon)
1382 ± 4% +10.3% 1524 ± 4% meminfo.Active(file)
69791 ± 14% +39.8% 97553 ± 6% meminfo.AnonHugePages
72709 ± 2% +12.5% 81779 ± 3% meminfo.Inactive(file)
23219 +13.5% 26352 ± 3% meminfo.KernelStack
966.50 ± 2% +17.7% 1137 proc-vmstat.nr_active_anon
74302 +6.3% 78977 ± 2% proc-vmstat.nr_anon_pages
81133 +6.0% 85973 proc-vmstat.nr_inactive_anon
18172 ± 2% +12.5% 20442 ± 3% proc-vmstat.nr_inactive_file
23213 +13.5% 26348 ± 3% proc-vmstat.nr_kernel_stack
17983 +2.3% 18400 proc-vmstat.nr_mapped
7446 ± 2% +5.5% 7853 ± 3% proc-vmstat.nr_shmem
26888 +1.6% 27306 proc-vmstat.nr_slab_reclaimable
47220 +3.4% 48803 proc-vmstat.nr_slab_unreclaimable
966.50 ± 2% +17.7% 1137 proc-vmstat.nr_zone_active_anon
81133 +6.0% 85973 proc-vmstat.nr_zone_inactive_anon
18172 ± 2% +12.5% 20442 ± 3% proc-vmstat.nr_zone_inactive_file
361460 +2.5% 370454 proc-vmstat.numa_hit
946.67 +18.6% 1122 proc-vmstat.pgactivate
361562 +2.5% 370553 proc-vmstat.pgalloc_normal
187906 +4.7% 196761 proc-vmstat.pgfault
8189 +2.5% 8395 proc-vmstat.pgreuse
1.097e+09 +15.5% 1.267e+09 ± 7% perf-stat.i.branch-instructions
39079265 ± 6% -20.9% 30915354 ± 4% perf-stat.i.branch-misses
5093263 ± 4% -23.7% 3884752 ± 9% perf-stat.i.cache-misses
29213 -18.7% 23764 ± 5% perf-stat.i.context-switches
7.666e+09 ± 4% +5.7% 8.106e+09 ± 2% perf-stat.i.cpu-cycles
1877 ± 15% +75.1% 3287 ± 12% perf-stat.i.cycles-between-cache-misses
1735450 ± 3% -12.9% 1512060 ± 3% perf-stat.i.iTLB-load-misses
2898 ± 3% +34.4% 3895 ± 7% perf-stat.i.instructions-per-iTLB-miss
1493 -20.3% 1190 ± 7% perf-stat.i.major-faults
0.09 ± 3% +5.8% 0.09 ± 2% perf-stat.i.metric.GHz
48.47 ± 11% +8.4 56.83 ± 7% perf-stat.i.node-store-miss-rate%
283426 ± 4% -21.6% 222190 ± 10% perf-stat.i.node-stores
3.57 ± 7% -1.1 2.44 ± 6% perf-stat.overall.branch-miss-rate%
1508 ± 3% +39.8% 2108 ± 9% perf-stat.overall.cycles-between-cache-misses
3022 ± 3% +23.6% 3736 ± 5% perf-stat.overall.instructions-per-iTLB-miss
9.585e+08 +18.8% 1.138e+09 ± 6% perf-stat.ps.branch-instructions
34151514 ± 6% -18.8% 27725316 ± 4% perf-stat.ps.branch-misses
4450329 ± 5% -21.7% 3486409 ± 9% perf-stat.ps.cache-misses
25524 -16.4% 21333 ± 4% perf-stat.ps.context-switches
77139 +2.5% 79105 perf-stat.ps.cpu-clock
6.704e+09 ± 4% +8.7% 7.287e+09 perf-stat.ps.cpu-cycles
1.06e+09 +11.3% 1.18e+09 ± 5% perf-stat.ps.dTLB-loads
1517349 ± 3% -10.5% 1357716 ± 2% perf-stat.ps.iTLB-load-misses
4.582e+09 +10.8% 5.075e+09 ± 6% perf-stat.ps.instructions
1296 -18.1% 1061 ± 6% perf-stat.ps.major-faults
247613 ± 4% -19.5% 199283 ± 9% perf-stat.ps.node-stores
77139 +2.5% 79105 perf-stat.ps.task-clock
3.697e+10 +35.3% 5.003e+10 perf-stat.total.instructions
8.51 ± 91% -6.9 1.59 ±223% perf-profile.calltrace.cycles-pp.getdents64
8.34 ± 83% -6.7 1.67 ±141% perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
6.25 ±107% -6.2 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
6.25 ±107% -6.2 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
6.25 ±107% -6.2 0.00 perf-profile.calltrace.cycles-pp.open64
7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.getdents64
7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
7.63 ± 84% -6.0 1.59 ±223% perf-profile.calltrace.cycles-pp.iterate_dir.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.getdents64
6.26 ±115% -5.4 0.88 ±223% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
6.26 ±115% -5.4 0.88 ±223% perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.do_group_exit.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare
6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop
6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
6.26 ±115% -4.6 1.67 ±141% perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist.__cmd_record
4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist
4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.perf_mmap__read_head.perf_mmap__push
4.57 ± 73% -3.8 0.76 ±223% perf-profile.calltrace.cycles-pp.perf_mmap__read_head.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record
3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.mmput.exit_mm.do_exit
3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.exit_mmap.mmput.exit_mm
3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap.mmput
3.38 ±103% -3.4 0.00 perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap
8.51 ± 91% -6.9 1.59 ±223% perf-profile.children.cycles-pp.getdents64
8.34 ± 83% -6.7 1.67 ±141% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
8.34 ± 83% -6.7 1.67 ±141% perf-profile.children.cycles-pp.exit_to_user_mode_loop
8.34 ± 83% -6.7 1.67 ±141% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
6.25 ±107% -6.2 0.00 perf-profile.children.cycles-pp.open64
7.63 ± 84% -6.0 1.59 ±223% perf-profile.children.cycles-pp.__x64_sys_getdents64
7.63 ± 84% -6.0 1.59 ±223% perf-profile.children.cycles-pp.iterate_dir
6.26 ±115% -4.6 1.67 ±141% perf-profile.children.cycles-pp.arch_do_signal_or_restart
6.26 ±115% -4.6 1.67 ±141% perf-profile.children.cycles-pp.get_signal
4.57 ± 73% -3.8 0.76 ±223% perf-profile.children.cycles-pp.perf_mmap__read_head
3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.unmap_vmas
3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.unmap_page_range
3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.zap_pmd_range
3.38 ±103% -3.4 0.00 perf-profile.children.cycles-pp.zap_pte_range
8.54 ± 43% +8.6 17.19 ± 38% perf-profile.children.cycles-pp.asm_exc_page_fault


If you fix the issue, kindly add following tag
Reported-by: kernel test robot <yujie.liu@xxxxxxxxx>


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


#regzbot introduced: 5efdd9122e





Resources in use
CIFS Session: 1
Share (unique mount targets): 2
SMB Request/Response Buffer: 1 Pool size: 5
SMB Small Req/Resp Buffer: 1 Pool size: 30
Total Large 177 Small 59083 Allocations
Operations (MIDs): 0

0 session 0 share reconnects
Total vfs operations: 21664 maximum at one time: 278

Max requests in flight: 294
Total time spent processing by command. Time units are jiffies (1000 per second)
SMB3 CMD Number Total Time Fastest Slowest
-------- ------ ---------- ------- -------
0 1 25 25 25
1 2 1 0 1
2 0 0 0 0
3 2 1 0 1
4 0 0 0 0
5 11538 29831 0 38
6 11538 59530 0 38
7 0 0 0 0
8 0 0 0 0
9 5000 28108 0 37
10 0 0 0 0
11 2 0 0 0
12 0 0 0 0
13 0 0 0 0
14 187 85 0 2
15 0 0 0 0
16 1297 921 0 37
17 62 26 0 12
18 0 0 0 0

1) \\localhost\fs
SMBs: 29624
Bytes read: 0 Bytes written: 81920000
Open files: 0 total (local), 0 open on server
TreeConnects: 1 total 0 failed
TreeDisconnects: 0 total 0 failed
Creates: 11538 total 0 failed
Closes: 11538 total 0 failed
Flushes: 0 total 0 failed
Reads: 0 total 0 failed
Writes: 5000 total 0 failed
Locks: 0 total 0 failed
IOCTLs: 1 total 1 failed
QueryDirectories: 187 total 0 failed
ChangeNotifies: 0 total 0 failed
QueryInfos: 1297 total 0 failed
SetInfos: 62 total 0 failed
OplockBreaks: 0 sent 0 failed
Resources in use
CIFS Session: 1
Share (unique mount targets): 2
SMB Request/Response Buffer: 1 Pool size: 5
SMB Small Req/Resp Buffer: 1 Pool size: 30
Total Large 177 Small 57235 Allocations
Operations (MIDs): 0

0 session 0 share reconnects
Total vfs operations: 21356 maximum at one time: 1143

Max requests in flight: 1143
Total time spent processing by command. Time units are jiffies (1000 per second)
SMB3 CMD Number Total Time Fastest Slowest
-------- ------ ---------- ------- -------
0 1 25 25 25
1 2 1 0 1
2 0 0 0 0
3 2 0 0 0
4 0 0 0 0
5 11230 27180 0 50
6 11230 106921 0 66
7 0 0 0 0
8 0 0 0 0
9 5000 25106 0 41
10 0 0 0 0
11 2 0 0 0
12 0 0 0 0
13 0 0 0 0
14 187 67 0 2
15 0 0 0 0
16 989 617 0 10
17 62 9 0 1
18 0 0 0 0

1) \\localhost\fs
SMBs: 28700
Bytes read: 0 Bytes written: 81920000
Open files: 0 total (local), 0 open on server
TreeConnects: 1 total 0 failed
TreeDisconnects: 0 total 0 failed
Creates: 11230 total 0 failed
Closes: 11230 total 0 failed
Flushes: 0 total 0 failed
Reads: 0 total 0 failed
Writes: 5000 total 0 failed
Locks: 0 total 0 failed
IOCTLs: 1 total 1 failed
QueryDirectories: 187 total 0 failed
ChangeNotifies: 0 total 0 failed
QueryInfos: 989 total 0 failed
SetInfos: 62 total 0 failed
OplockBreaks: 0 sent 0 failed