Re: INFO: rcu detected stall in ext4_write_checks

From: Theodore Ts'o
Date: Wed Jun 26 2019 - 17:04:15 EST


The reproducer causes similar rcu stalls when using xfs:

RSP: 0018:ffffaae8c0953c58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000288 RBX: 000000000000b05a RCX: ffffaae8c0953d50
RDX: 000000000000001c RSI: 000000000000001c RDI: ffffdcec41772800
RBP: ffffdcec41772800 R08: 00000015143ceae3 R09: 0000000000000001
R10: 0000000000000000 R11: ffffffff96863980 R12: ffff88d179ac7d80
R13: ffff88d174837ca0 R14: 0000000000000288 R15: 000000000000001c
generic_file_buffered_read+0x2c1/0x8b0
xfs_file_buffered_aio_read+0x5f/0x140
xfs_file_read_iter+0x6e/0xd0
generic_file_splice_read+0x110/0x1d0
splice_direct_to_actor+0xd5/0x230
? pipe_to_sendpage+0x90/0x90
do_splice_direct+0x9f/0xd0
do_sendfile+0x1d4/0x3a0
__se_sys_sendfile64+0x58/0xc0
do_syscall_64+0x50/0x1b0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f8038387469

and btrfs

[ 42.671321] RSP: 0018:ffff960dc0963b90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[ 42.673592] RAX: 0000000000000001 RBX: ffff89eb3628ab30 RCX: 00000000ffffffff
[ 42.675775] RDX: ffff89eb3628a380 RSI: 00000000ffffffff RDI: ffffffff9ec63980
[ 42.677851] RBP: ffff89eb3628aaf8 R08: 0000000000000001 R09: 0000000000000001
[ 42.680028] R10: 0000000000000000 R11: ffffffff9ec63980 R12: ffff89eb3628a380
[ 42.682213] R13: 0000000000000246 R14: 0000000000000001 R15: ffffffff9ec63980
[ 42.684509] xas_descend+0xed/0x120
[ 42.685682] xas_load+0x39/0x50
[ 42.686691] find_get_entry+0xa0/0x330
[ 42.687885] pagecache_get_page+0x30/0x2d0
[ 42.689190] generic_file_buffered_read+0xee/0x8b0
[ 42.690708] generic_file_splice_read+0x110/0x1d0
[ 42.692374] splice_direct_to_actor+0xd5/0x230
[ 42.693868] ? pipe_to_sendpage+0x90/0x90
[ 42.695180] do_splice_direct+0x9f/0xd0
[ 42.696407] do_sendfile+0x1d4/0x3a0
[ 42.697551] __se_sys_sendfile64+0x58/0xc0
[ 42.698854] do_syscall_64+0x50/0x1b0
[ 42.700021] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 42.701619] RIP: 0033:0x7f21e8d5f469

So this is probably a generic vfs issue (probably in sendfile). Next
steps is probably for someone to do a bisection search covering
changes to the fs/ and mm/ directories. This should exclude bogus
changes in the networking layer, although it does seem that there is
something very badly wrong which is breaking bisectability, if you
can't even scp to the system under test for certain commit ranges. :-( :-( :-(

- Ted