Re: [syzbot] kernel BUG in assertfail

From: Dmitry Vyukov
Date: Mon May 31 2021 - 05:09:33 EST


On Mon, May 31, 2021 at 10:57 AM Nikolay Borisov <nborisov@xxxxxxxx> wrote:
> On 31.05.21 г. 11:55, Dmitry Vyukov wrote:
> > On Mon, May 31, 2021 at 10:44 AM 'Nikolay Borisov' via syzkaller-bugs
> > <syzkaller-bugs@xxxxxxxxxxxxxxxx> wrote:
> >> On 31.05.21 г. 10:53, syzbot wrote:
> >>> Hello,
> >>>
> >>> syzbot found the following issue on:
> >>>
> >>> HEAD commit: 1434a312 Merge branch 'for-5.13-fixes' of git://git.kernel..
> >>> git tree: upstream
> >>> console output: https://syzkaller.appspot.com/x/log.txt?x=162843f3d00000
> >>> kernel config: https://syzkaller.appspot.com/x/.config?x=9f3da44a01882e99
> >>> dashboard link: https://syzkaller.appspot.com/bug?extid=a6bf271c02e4fe66b4e4
> >>>
> >>> Unfortunately, I don't have any reproducer for this issue yet.
> >>>
> >>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> >>> Reported-by: syzbot+a6bf271c02e4fe66b4e4@xxxxxxxxxxxxxxxxxxxxxxxxx
> >>>
> >>> assertion failed: !memcmp(fs_info->fs_devices->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE), in fs/btrfs/disk-io.c:3282
> >>
> >> This means a device contains a btrfs filesystem which has a different
> >> FSID in its superblock than the fsid which all devices part of the same
> >> fs_devices should have. This can happen in 2 ways - memory corruption
> >> where either of the ->fsid member are corrupted or if there was a crash
> >> while a filesystem's fsid was being changed. We need more context about
> >> what the test did?
> >
> > Hi Nikolay,
> >
> > From a semantic point of view we can consider that it just mounts /dev/random.
> > If syzbot comes up with a reproducer it will post it, but you seem to
> > already figure out what happened, so I assume you can write a unit
> > test for this.
> >
>
> Well no, under normal circumstances this shouldn't trigger. So if syzbot
> is doing something stupid as mounting /dev/random then I don't see a
> problem here. The assert is there to catch inconsistencies during normal
> operation which doesn't seem to be the case here.


Does this mean that CONFIG_BTRFS_ASSERT needs to be disabled in any testing?
What is it intended for? Or it can only be enabled when mounting known
good images? But then I assume even btrfs unit tests mount some
invalid images, so it would mean it can't be used even during unit
testing?

Looking at the output of "grep ASSERT fs/btrfs/*.c" it looks like most
of these actually check for something that "must never happen". E.g.
some lists/pointers are empty/non-empty in particular states. And
"must never happen" checks are for testing scenarios...

Taking this particular FSID mismatch assert, should such corrupted
images be mounted for end users? Should users be notified? Currently
they are mounted and users are not notified, what is the purpose of
this assertion?

Perhaps CONFIG_BTRFS_ASSERT needs to be split into "must never happen"
checks that are enabled during testing and normal if's with pr_err for
user notifications?