Re: [syzbot] [xfs?] WARNING: Reset corrupted AGFL on AG NUM. NUM blocks leaked. Please unmount and run xfs_repair.

From: Eric Biggers
Date: Wed Jun 21 2023 - 03:54:28 EST


Hi Dave,

On Wed, Jun 21, 2023 at 05:07:15PM +1000, 'Dave Chinner' via syzkaller-bugs wrote:
> On Tue, Jun 20, 2023 at 07:10:19PM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: 40f71e7cd3c6 Merge tag 'net-6.4-rc7' of git://git.kernel.o..
> > git tree: upstream
> > console+strace: https://syzkaller.appspot.com/x/log.txt?x=158b99d3280000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=7ff8f87c7ab0e04e
> > dashboard link: https://syzkaller.appspot.com/bug?extid=9d0b0d54a8bd799f6ae4
> > compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16ab4537280000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=148326ef280000
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/2dc89d5fee38/disk-40f71e7c.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/0ced5a475218/vmlinux-40f71e7c.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/d543a4f69684/bzImage-40f71e7c.xz
> > mounted in repro: https://storage.googleapis.com/syzbot-assets/e2012b787a31/mount_0.gz
> >
> > The issue was bisected to:
> >
> > commit e0a8de7da35e5b22b44fa1013ccc0716e17b0c14
> > Author: Dave Chinner <dchinner@xxxxxxxxxx>
> > Date: Mon Jun 5 04:48:15 2023 +0000
> >
> > xfs: fix agf/agfl verification on v4 filesystems
> >
> > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=10bb665b280000
> > final oops: https://syzkaller.appspot.com/x/report.txt?x=12bb665b280000
> > console output: https://syzkaller.appspot.com/x/log.txt?x=14bb665b280000
>
> WTAF?
>
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+9d0b0d54a8bd799f6ae4@xxxxxxxxxxxxxxxxxxxxxxxxx
> > Fixes: e0a8de7da35e ("xfs: fix agf/agfl verification on v4 filesystems")
> >
> > XFS (loop0): WARNING: Reset corrupted AGFL on AG 0. 4 blocks leaked. Please unmount and run xfs_repair.
> > XFS (loop0): Internal error !ino_ok at line 213 of file fs/xfs/libxfs/xfs_dir2.c. Caller xfs_dir_ino_validate+0x2c/0x90 fs/xfs/libxfs/xfs_dir2.c:220
> > CPU: 1 PID: 46 Comm: kworker/u4:3 Not tainted 6.4.0-rc6-syzkaller-00195-g40f71e7cd3c6 #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
> > Workqueue: xfs_iwalk-4998 xfs_pwork_work
> > Call Trace:
> > <TASK>
> > __dump_stack lib/dump_stack.c:88 [inline]
> > dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
> > xfs_error_report fs/xfs/xfs_error.c:384 [inline]
> > xfs_corruption_error+0x11d/0x170 fs/xfs/xfs_error.c:401
> > xfs_dir_ino_validate+0x5f/0x90 fs/xfs/libxfs/xfs_dir2.c:213
> > xfs_dir2_sf_verify+0x487/0x990 fs/xfs/libxfs/xfs_dir2_sf.c:779
> > xfs_ifork_verify_local_data fs/xfs/libxfs/xfs_inode_fork.c:706 [inline]
> > xfs_iformat_data_fork+0x4bf/0x6d0 fs/xfs/libxfs/xfs_inode_fork.c:256
> > xfs_inode_from_disk+0xbbf/0x1070 fs/xfs/libxfs/xfs_inode_buf.c:245
> > xfs_iget_cache_miss fs/xfs/xfs_icache.c:639 [inline]
> > xfs_iget+0xf08/0x3050 fs/xfs/xfs_icache.c:777
> > xfs_qm_dqusage_adjust+0x228/0x670 fs/xfs/xfs_qm.c:1157
> > xfs_iwalk_ag_recs+0x486/0x7c0 fs/xfs/xfs_iwalk.c:220
> > xfs_iwalk_run_callbacks+0x25b/0x490 fs/xfs/xfs_iwalk.c:376
> > xfs_iwalk_ag+0xad6/0xbd0 fs/xfs/xfs_iwalk.c:482
> > xfs_iwalk_ag_work+0xfb/0x1b0 fs/xfs/xfs_iwalk.c:624
> > xfs_pwork_work+0x7c/0x190 fs/xfs/xfs_pwork.c:47
> > process_one_work+0x8a0/0x10e0 kernel/workqueue.c:2405
> > worker_thread+0xa63/0x1210 kernel/workqueue.c:2552
> > kthread+0x2b8/0x350 kernel/kthread.c:379
> > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
> > </TASK>
> > XFS (loop0): Corruption detected. Unmount and run xfs_repair
> > XFS (loop0): Invalid inode number 0x24
> > XFS (loop0): Metadata corruption detected at xfs_dir2_sf_verify+0x767/0x990 fs/xfs/libxfs/xfs_dir2_sf.c:774, inode 0x23 data fork
> > XFS (loop0): Unmount and run xfs_repair
> > XFS (loop0): First 32 bytes of corrupted metadata buffer:
> > 00000000: 02 00 00 00 00 20 05 00 30 66 69 6c 65 30 01 00 ..... ..0file0..
>
> syzbot corrupted a v4 filesystem.
>
> Syzbot corrupted the superblock, XFS detected and corrected that.
>
> Syzbot corrupted the AGI. XFS detected that.
>
> Syzbot corrupted the AGF and AGFL. XFS detected and corrected that,
> allowing operations to continue.
>
> Syzbot also corrupted a directory inode. XFS detected that and
> warned about it.
>
> Test finished.
>
> At no point did the kernel crash, oops, do anything bad like a UAF
> or OOB read. All XFS did was catch the corruptions, fix some of them
> so it could continue operating, and warn the user that they need to
> unmount and run repair.
>
> So exactly what is syzbot complaining about here? There's no kernel
> issue here at all.
>
> Also, I cannot tell syzbot "don't ever report this as a bug again",
> so the syzbot developers are going to have to triage and fix this
> syzbot problem themselves so it doesn't keep getting reported to
> us...

I think the problem here was that XFS logged a message beginning with
"WARNING:", followed by a stack trace. In the log that looks like a warning
generated by the WARN_ON() macro, which is meant for reporting recoverable
kernel bugs. A program parsing the log cannot reliably tell the two apart, which
is why include/asm-generic/bug.h contains the following comment:

* Do not include "BUG"/"WARNING" in format strings manually to make these
* conditions distinguishable from kernel issues.

If you have a constructive suggestion for how programs that parse the kernel
log can reliably identify real warnings without being confused by cases like
this, I'm sure it would be appreciated. It would need to be documented, and
the guidance in bug.h could then be removed. But until then, the above is
the current guidance.
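
To illustrate the ambiguity, here is a rough Python sketch of two ways a log
parser might look for warnings. Both patterns and function names are
hypothetical and are not syzkaller's actual matching logic. The "strict"
pattern assumes that a genuine WARN_ON() header has the shape
"WARNING: CPU: N PID: N at <location>", and the sample WARN_ON() line is
made up for the example; only the XFS line is taken from the report.

```python
import re

# Hypothetical heuristics for illustration only; NOT syzkaller's real logic.

# Naive approach: any line mentioning "WARNING:" is treated as a kernel warning.
NAIVE_WARNING = re.compile(r"\bWARNING:")

# Stricter approach: assume a genuine WARN_ON() header names the CPU, the PID
# and a source location. This shape is an assumption, not a documented format.
STRICT_WARNING = re.compile(r"WARNING: CPU: \d+ PID: \d+ at \S+")


def naive_is_kernel_warning(line: str) -> bool:
    """Return True if the line contains the bare "WARNING:" token."""
    return bool(NAIVE_WARNING.search(line))


def strict_is_kernel_warning(line: str) -> bool:
    """Return True only if the line matches the assumed WARN_ON() header shape."""
    return bool(STRICT_WARNING.search(line))


# Line taken from the syzbot report quoted above.
XFS_LINE = (
    "XFS (loop0): WARNING: Reset corrupted AGFL on AG 0. 4 blocks leaked. "
    "Please unmount and run xfs_repair."
)

# Made-up example of a WARN_ON() header; the path and symbol are placeholders.
WARN_ON_LINE = "WARNING: CPU: 1 PID: 46 at fs/example.c:123 example_func+0x10/0x20"

if __name__ == "__main__":
    for name, line in (("xfs", XFS_LINE), ("warn_on", WARN_ON_LINE)):
        print(name, naive_is_kernel_warning(line), strict_is_kernel_warning(line))
```

The naive check flags the XFS message as a kernel warning, while the stricter
one does not. The stricter one, however, relies on an output format that
nothing documents as stable, which is why documented guidance is needed either
way.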

- Eric