Re: INFO: task can't die in shrink_inactive_list (2)
From: Alex Shi
Date: Tue Nov 24 2020 - 09:38:21 EST
On 2020/11/24 at 8:00 PM, Alex Shi wrote:
>>> syzbot found the following issue on:
>>>
>>> HEAD commit: 03430750 Add linux-next specific files for 20201116
>>> git tree: linux-next
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=13f80e5e500000
>>> kernel config: https://syzkaller.appspot.com/x/.config?x=a1c4c3f27041fdb8
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e5a33e700b1dd0da20a2
>>> compiler: gcc (GCC) 10.1.0-syz 20200507
>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12f7bc5a500000
>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=10934cf2500000
> CC Peter Zijlstra.
>
> I found that next-20200821 had a very similar oops to this one:
> https://groups.google.com/g/syzkaller-upstream-moderation/c/S0pyqK1dZv8/m/dxMoEhGdAQAJ
> So does this mean the bug has existed for a long time, since 5.9-rc1?
>
> The reproducer triggers it intermittently on an x86 VM with 2 CPUs and 1600M of memory.
> It reproduces the hang on both kernels, but with a different kernel stack on each.
>
> Maybe the system is just too busy? I will try older kernels with the reproducer.
The 5.8 kernel also sometimes fails this test on my 2-CPU VM guest with 2G of memory (log below).
Any comments on this issue?
Thanks
Alex
[ 5875.750929][ T946] INFO: task repro:31866 blocked for more than 143 seconds.
[ 5875.751618][ T946] Not tainted 5.8.0 #6
[ 5875.752046][ T946] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables th.
[ 5875.752845][ T946] repro D12088 31866 1 0x80004086
[ 5875.753436][ T946] Call Trace:
[ 5875.753747][ T946] __schedule+0x394/0x950
[ 5875.774033][ T946] ? __mutex_lock+0x46f/0x9c0
[ 5875.774481][ T946] ? blkdev_put+0x18/0x120
[ 5875.774894][ T946] schedule+0x37/0xe0
[ 5875.775260][ T946] schedule_preempt_disabled+0xf/0x20
[ 5875.775753][ T946] __mutex_lock+0x474/0x9c0
[ 5875.776174][ T946] ? lock_acquire+0xa7/0x390
[ 5875.776602][ T946] ? locks_remove_file+0x1e7/0x2d0
[ 5875.777079][ T946] ? blkdev_put+0x18/0x120
[ 5875.777485][ T946] blkdev_put+0x18/0x120
[ 5875.777880][ T946] blkdev_close+0x1f/0x30
[ 5875.778281][ T946] __fput+0xf0/0x260
[ 5875.778639][ T946] task_work_run+0x68/0xb0
[ 5875.779054][ T946] do_exit+0x3df/0xce0
[ 5875.779430][ T946] ? get_signal+0x11d/0xca0
[ 5875.779846][ T946] do_group_exit+0x42/0xb0
[ 5875.780261][ T946] get_signal+0x16a/0xca0
[ 5875.780662][ T946] ? handle_mm_fault+0xc8f/0x19c0
[ 5875.781134][ T946] do_signal+0x2b/0x8e0
[ 5875.781521][ T946] ? trace_hardirqs_off+0xe/0xf0
[ 5875.781989][ T946] __prepare_exit_to_usermode+0xef/0x1f0
[ 5875.782512][ T946] ? asm_exc_page_fault+0x8/0x30
[ 5875.782979][ T946] prepare_exit_to_usermode+0x5/0x30
[ 5875.783461][ T946] asm_exc_page_fault+0x1e/0x30
[ 5875.783909][ T946] RIP: 0033:0x428dd7
[ 5875.794899][ T946] Code: Bad RIP value.
[ 5875.795290][ T946] RSP: 002b:00007f37c99e0d78 EFLAGS: 00010202
[ 5875.795858][ T946] RAX: 0000000020000080 RBX: 0000000000000000 RCX: 0000000076656f
[ 5875.796588][ T946] RDX: 000000000000000c RSI: 00000000004b2370 RDI: 00000000200000
[ 5875.797326][ T946] RBP: 00007f37c99e0da0 R08: 00007f37c99e1700 R09: 00007f37c99e10
[ 5875.798063][ T946] R10: 00007f37c99e19d0 R11: 0000000000000202 R12: 00000000000000
[ 5875.798802][ T946] R13: 0000000000021000 R14: 0000000000000000 R15: 00007f37c99e10
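
When reading a trace like the one above, it can help to drop the frames marked "?", which the unwinder found on the stack but does not consider part of the reliable call chain. What follows is only a minimal Python sketch, assuming the "[ time][ Tnnn] symbol+0xoff/0xsize" line format shown in this log; the names FRAME_RE and reliable_frames are made up for illustration and are not part of any kernel tool.

```python
import re

# Matches one call-trace frame as printed in the log above, for example
# "[ 5875.777485][ T946] blkdev_put+0x18/0x120" or
# "[ 5875.776174][ T946] ? lock_acquire+0xa7/0x390".
# The line format is an assumption taken from this log only.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\[\s*T\d+\]\s+"
    r"(?P<guess>\?\s+)?"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+\s*$"
)


def reliable_frames(log_text):
    """Return the symbols of frames not marked '?', innermost first."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m and not m.group("guess"):
            frames.append(m.group("sym"))
    return frames


# The first frames of the trace above, copied verbatim.
sample = """\
[ 5875.753747][ T946] __schedule+0x394/0x950
[ 5875.774033][ T946] ? __mutex_lock+0x46f/0x9c0
[ 5875.774481][ T946] ? blkdev_put+0x18/0x120
[ 5875.774894][ T946] schedule+0x37/0xe0
[ 5875.775260][ T946] schedule_preempt_disabled+0xf/0x20
[ 5875.775753][ T946] __mutex_lock+0x474/0x9c0
[ 5875.777485][ T946] blkdev_put+0x18/0x120
[ 5875.777880][ T946] blkdev_close+0x1f/0x30
"""

if __name__ == "__main__":
    print(" <- ".join(reliable_frames(sample)))
```

Applied to the full trace, the remaining frames show the task sleeping in __mutex_lock, called from blkdev_put via blkdev_close and __fput, while it is on the do_exit path.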