trinity test fanotify cause hungtasks on kernel 4.13

From: Gu Zheng
Date: Thu Jul 27 2017 - 05:56:21 EST


Hi Eric Paris,
When we used trinity to test the fanotify interfaces, it caused many hung tasks.
The kernel is built with CONFIG_FANOTIFY_ACCESS_PERMISSIONS=y.
The reproducer shell script is simple:
#!/bin/bash

while true
do
    ./trinity -c fanotify_init -l off -C 2 -X > /dev/null 2>&1 &
    sleep 1
    ./trinity -c fanotify_mark -l off -C 2 -X > /dev/null 2>&1 &
    sleep 10
done
We found that the trinity processes enter the D state very quickly.
We checked the kernel stacks of the affected PIDs:
[root@localhost ~]# ps -aux | grep D
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 977 0.0 0.0 207992 7904 ? Ss 15:23 0:00 /usr/bin/abrt-watch-log -F BUG: WARNING: at WARNING: CPU: INFO: possible recursive locking detected ernel BUG at list_del corruption list_add corruption do_IRQ: stack overflow: ear stack overflow (cur: eneral protection fault nable to handle kernel ouble fault: RTNL: assertion failed eek! page_mapcount(page) went negative! adness at NETDEV WATCHDOG ysctl table check failed : nobody cared IRQ handler type mismatch Machine Check Exception: Machine check events logged divide error: bounds: coprocessor segment overrun: invalid TSS: segment not present: invalid opcode: alignment check: stack segment: fpu exception: simd exception: iret exception: /var/log/messages -- /usr/bin/abrt-dump-oops -xtD
root 997 0.0 0.0 203360 3188 ? Ssl 15:23 0:00 /usr/sbin/gssproxy -D
root 1549 0.0 0.0 82552 6012 ? Ss 15:23 0:00 /usr/sbin/sshd -D
root 2807 3.5 0.2 59740 35416 pts/0 DL 15:24 0:00 ./trinity -c fanotify_init -l off -C 2 -X
root 2809 3.1 0.2 53712 35332 pts/0 DL 15:24 0:00 ./trinity -c fanotify_mark -l off -C 2 -X
root 2915 0.0 0.0 136948 1776 pts/0 D 15:24 0:00 ps ax
root 2919 0.0 0.0 112656 2100 pts/1 S+ 15:24 0:00 grep --color=auto D
[root@localhost ~]# cat /proc/2807/stack
[<ffffffff95287551>] fanotify_handle_event+0x2a1/0x2f0
[<ffffffff95283c13>] fsnotify+0x2d3/0x4f0
[<ffffffff952f3a89>] security_file_open+0x89/0x90
[<ffffffff95239819>] do_dentry_open+0x139/0x330
[<ffffffff9523ad9f>] vfs_open+0x4f/0x70
[<ffffffff9524c428>] path_openat+0x548/0x1350
[<ffffffff9524ea51>] do_filp_open+0x91/0x100
[<ffffffff9523b174>] do_sys_open+0x124/0x210
[<ffffffff9523b27e>] SyS_open+0x1e/0x20
[<ffffffff95003857>] do_syscall_64+0x67/0x150
[<ffffffff95741de7>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff

[root@localhost ~]# cat /proc/2915/stack
[<ffffffff95287551>] fanotify_handle_event+0x2a1/0x2f0
[<ffffffff95283c13>] fsnotify+0x2d3/0x4f0
[<ffffffff952f3a89>] security_file_open+0x89/0x90
[<ffffffff95239819>] do_dentry_open+0x139/0x330
[<ffffffff9523ad9f>] vfs_open+0x4f/0x70
[<ffffffff9524c428>] path_openat+0x548/0x1350
[<ffffffff9524ea51>] do_filp_open+0x91/0x100
[<ffffffff9523b174>] do_sys_open+0x124/0x210
[<ffffffff9523b27e>] SyS_open+0x1e/0x20
[<ffffffff95003857>] do_syscall_64+0x67/0x150
[<ffffffff95741de7>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff
[root@localhost ~]# cat /proc/2809/stack
[<ffffffff95287551>] fanotify_handle_event+0x2a1/0x2f0
[<ffffffff95283c13>] fsnotify+0x2d3/0x4f0
[<ffffffff952f3a89>] security_file_open+0x89/0x90
[<ffffffff95239819>] do_dentry_open+0x139/0x330
[<ffffffff9523ad9f>] vfs_open+0x4f/0x70
[<ffffffff9524c428>] path_openat+0x548/0x1350
[<ffffffff9524ea51>] do_filp_open+0x91/0x100
[<ffffffff9523b174>] do_sys_open+0x124/0x210
[<ffffffff9523b27e>] SyS_open+0x1e/0x20
[<ffffffff95003857>] do_syscall_64+0x67/0x150
[<ffffffff95741de7>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff
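
Note that "ps -aux | grep D" also matches any command line that contains a capital D (sshd -D, gssproxy -D, the abrt watcher), as the listing shows. A minimal sketch of a tighter check, assuming procps ps and awk are available; the function names d_state_pids and dump_d_state_stacks are made up for illustration and are not part of any tool:

```shell
#!/bin/bash

# Print the PIDs whose STAT column starts with "D" (uninterruptible sleep).
# Expects "PID STAT COMMAND" lines on stdin.
d_state_pids() {
    awk '$2 ~ /^D/ { print $1 }'
}

# Dump the kernel stack of every D-state task.
# Reading /proc/PID/stack normally requires root.
dump_d_state_stacks() {
    local pid
    # Separate -o options with empty headers suppress the header line.
    for pid in $(ps -e -o pid= -o stat= -o comm= | d_state_pids); do
        echo "== PID $pid =="
        cat "/proc/$pid/stack" 2>/dev/null || echo "(stack not readable)"
    done
}
```

Running dump_d_state_stacks as root while the reproducer is active should list only the blocked tasks, so unrelated daemons started with a -D option no longer clutter the output.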

All of these PIDs are waiting for a response in fanotify_handle_event->fanotify_get_response,
but the monitor cannot reply to anything, either because of a permission problem or because the monitor has been killed.
After that, every other task that uses fanotify or calls synchronize_srcu gets stuck as well.

If we disable CONFIG_FANOTIFY_ACCESS_PERMISSIONS,
memory is consumed quickly, because the fsnotify_mark_srcu read lock is always held.

If we add a timeout to the wait for the response, safety cannot be guaranteed.

Do you have any ideas?
Thanks.