Re: [syzbot] possible deadlock in del_gendisk

From: Steven Rostedt
Date: Tue Apr 13 2021 - 14:40:21 EST


On Tue, 13 Apr 2021 20:24:00 +0200
Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:

> On Tue, Apr 13, 2021 at 7:43 PM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> >
> > On Tue, 13 Apr 2021 13:41:47 -0400
> > Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> >
> > > As the below splats look like it has nothing to do with this patch, and
> > > this patch will add a WARN() if there's broken logic somewhere, I bet the
> > > bisect got confused (if it is automated and does a panic_on_warning),
> > > because it will panic for broken code that his patch detects.
> > >
> > > That is, the bisect was confused because it was triggering on two different
> > > issues. One that triggered the reported splat below, and another that this
> > > commit detects and warns on.
> >
> > Is it possible to update the the bisect to make sure that if it is failing
> > on warnings, to make sure the warnings are somewhat related, before decided
> > that its the same bug?
>
> It does not seem to be feasible, bugs manifest differently in both
> space and time. Also even if we somehow conclude the crash we see is
> different, it says nothing about the original bug. For more data see:
> https://groups.google.com/g/syzkaller/c/sR8aAXaWEF4/m/tTWYRgvmAwAJ

Sure, but if you trigger a lockdep bug, lockdep bugs are usually very
similar in all instances. It usually will include the same locks, or at
least be the same type of lockdep bug (where some types are related).

If the bisect saw an issue before and at this commit, I would think it
didn't trigger a lockdep bug at all, and simply triggered a warning.

In fact, according to:

https://syzkaller.appspot.com/x/report.txt?x=15a7e77ed00000

That's exactly what it did.

That is, if the original bug is a lockdep warning, you should only be
interested in lockdep warnings (at a minimum). The final bug here had:

------------[ cut here ]------------
raw_local_irq_restore() called with IRQs enabled
WARNING: CPU: 0 PID: 8777 at kernel/locking/irqflag-debug.c:9 warn_bogus_irq_restore kernel/locking/irqflag-debug.c:9 [inline]
WARNING: CPU: 0 PID: 8777 at kernel/locking/irqflag-debug.c:9 warn_bogus_irq_restore+0x1d/0x20 kernel/locking/irqflag-debug.c:7
Modules linked in:
CPU: 0 PID: 8777 Comm: syz-executor.1 Not tainted 5.11.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:warn_bogus_irq_restore kernel/locking/irqflag-debug.c:9 [inline]
RIP: 0010:warn_bogus_irq_restore+0x1d/0x20 kernel/locking/irqflag-debug.c:7
Code: 51 00 e9 3f fe ff ff cc cc cc cc cc cc 80 3d e0 b4 ce 0a 00 74 01 c3 48 c7 c7 60 f5 8a 88 c6 05 cf b4 ce 0a 01 e8 17 01 a4 06 <0f> 0b c3 48 c7 c0 a0 46 4d 8e 53 48 89 fb 48 ba 00 00 00 00 00 fc
RSP: 0018:ffffc900017bf9f8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8881477b2040 RCX: 0000000000000000
RDX: 0000000000000002 RSI: 0000000000000004 RDI: fffff520002f7f31
RBP: 0000000000000246 R08: 0000000000000001 R09: ffff8880b9e2015b
R10: ffffed10173c402b R11: 0000000000000001 R12: 0000000000000003
R13: ffffed1028ef6408 R14: 0000000000000001 R15: ffff8880b9e359c0
FS: 00000000017b8400(0000) GS:ffff8880b9e00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000017c1848 CR3: 0000000010920000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
kvm_wait arch/x86/kernel/kvm.c:860 [inline]
kvm_wait+0xc3/0xe0 arch/x86/kernel/kvm.c:837
pv_wait arch/x86/include/asm/paravirt.h:564 [inline]
pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:470 [inline]
__pv_queued_spin_lock_slowpath+0x8b8/0xb40 kernel/locking/qspinlock.c:508
pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:554 [inline]
queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline]
queued_spin_lock include/asm-generic/qspinlock.h:85 [inline]
do_raw_spin_lock+0x200/0x2b0 kernel/locking/spinlock_debug.c:113
spin_lock include/linux/spinlock.h:354 [inline]
ext4_lock_group fs/ext4/ext4.h:3379 [inline]
__ext4_new_inode+0x2da2/0x44d0 fs/ext4/ialloc.c:1187
ext4_mkdir+0x298/0x910 fs/ext4/namei.c:2793
vfs_mkdir+0x413/0x660 fs/namei.c:3652
do_mkdirat+0x1eb/0x250 fs/namei.c:3675
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x465567
Code: 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 53 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe00ad69f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000053
RAX: ffffffffffffffda RBX: 00007ffe00ad6a90 RCX: 0000000000465567
RDX: 0000000000000000 RSI: 00000000000001ff RDI: 00007ffe00ad6a90
RBP: 00007ffe00ad6a6c R08: 0000000000000000 R09: 0000000000000006
R10: 00007ffe00ad6794 R11: 0000000000000202 R12: 0000000000000032
R13: 000000000006549a R14: 0000000000000002 R15: 00007ffe00ad6ad0



Which shows a WARN was triggered, and totally unrelated to what you were
bisecting.

Just matching types of warnings (lockdep to lockdep, or WARN_ON to WARN_ON)
would help reduce the number of bogus bisects you are having.

-- Steve