Re: [syzbot] kernel BUG in assertfail

From: Nikolay Borisov
Date: Mon May 31 2021 - 05:27:14 EST




On 31.05.21 г. 12:09, Dmitry Vyukov wrote:
> On Mon, May 31, 2021 at 10:57 AM Nikolay Borisov <nborisov@xxxxxxxx> wrote:
>> On 31.05.21 г. 11:55, Dmitry Vyukov wrote:
>>> On Mon, May 31, 2021 at 10:44 AM 'Nikolay Borisov' via syzkaller-bugs
>>> <syzkaller-bugs@xxxxxxxxxxxxxxxx> wrote:
>>>> On 31.05.21 г. 10:53, syzbot wrote:
>>>>> Hello,
>>>>>
>>>>> syzbot found the following issue on:
>>>>>
>>>>> HEAD commit: 1434a312 Merge branch 'for-5.13-fixes' of git://git.kernel..
>>>>> git tree: upstream
>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=162843f3d00000
>>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=9f3da44a01882e99
>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=a6bf271c02e4fe66b4e4
>>>>>
>>>>> Unfortunately, I don't have any reproducer for this issue yet.
>>>>>
>>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>>>> Reported-by: syzbot+a6bf271c02e4fe66b4e4@xxxxxxxxxxxxxxxxxxxxxxxxx
>>>>>
>>>>> assertion failed: !memcmp(fs_info->fs_devices->fsid, fs_info->super_copy->fsid, BTRFS_FSID_SIZE), in fs/btrfs/disk-io.c:3282
>>>>
>>>> This means a device contains a btrfs filesystem whose superblock has a
>>>> different FSID than the fsid shared by all devices in the same
>>>> fs_devices. This can happen in 2 ways: memory corruption, where either
>>>> of the ->fsid members is corrupted, or a crash while a filesystem's
>>>> fsid was being changed. We need more context about what the test did.
>>>
>>> Hi Nikolay,
>>>
>>> From a semantic point of view we can consider that it just mounts /dev/random.
>>> If syzbot comes up with a reproducer it will post it, but you seem to
>>> have already figured out what happened, so I assume you can write a
>>> unit test for this.
>>>
>>
>> Well no, under normal circumstances this shouldn't trigger. So if syzbot
>> is doing something as stupid as mounting /dev/random then I don't see a
>> problem here. The assert is there to catch inconsistencies during normal
>> operation which doesn't seem to be the case here.
>
>
> Does this mean that CONFIG_BTRFS_ASSERT needs to be disabled in any testing?
> What is it intended for? Or can it only be enabled when mounting known
> good images? But then I assume even btrfs unit tests mount some
> invalid images, so it would mean it can't be used even during unit
> testing?
>
> Looking at the output of "grep ASSERT fs/btrfs/*.c" it looks like most
> of these actually check for something that "must never happen". E.g.
> some lists/pointers are empty/non-empty in particular states. And
> "must never happen" checks are for testing scenarios...
>
> Taking this particular FSID mismatch assert, should such corrupted
> images be mounted for end users? Should users be notified? Currently
> they are mounted and users are not notified, so what is the purpose of
> this assertion?
>
> Perhaps CONFIG_BTRFS_ASSERT needs to be split into "must never happen"
> checks that are enabled during testing and normal if's with pr_err for
> user notifications?
>

After going through the code you've convinced me. I just sent a patch
turning the 2 debugging asserts into full-fledged checks in
validate_super. So now the correct behavior is to prevent mounting of
such images. How can I force syzbot to retest with the given patch applied?