Re: [6.2][regression] after commit 947a629988f191807d2d22ba63ae18259bb645c5 btrfs volume periodical forced switch to readonly after a lot of disk writes

From: Qu Wenruo
Date: Mon Dec 26 2022 - 07:12:02 EST




On 2022/12/26 19:14, Mikhail Gavrilov wrote:
On Mon, Dec 26, 2022 at 1:47 PM Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:


With the check output, it's indeed a runtime error.
(At least no corruption to your fs)

And it can be some call paths not properly initializing the level to check.

Here is the new debug patch.
It should be applied without any previous debug patch.


yeah, gotcha trace

[ 2561.764141] BTRFS error (device nvme1n1p3): level verify failed on
logical 45170688 mirror 1 wanted 0 found 1
[ 2561.764220] BTRFS error (device nvme1n1p3): read time tree block
corruption detected on logical 45170688 mirror 1
[ 2561.764224] BTRFS error (device nvme1n1p3): check owner_root=0
transid=0 first_key=(0 0 0) has_first_key=0 level=0

Damn it, some check structure is completely uninitialized.

[ 2561.764242] BTRFS error (device nvme1n1p3): read failed, check
owner_root=259 transid=650936 has_first_key=1 first_key=(2221294 1 0)
level=1

While at the caller, the structure is properly passed in.

So there is something wrong between the endio function and the check.

I have created the v2 version patch to debug, please apply without any previous debug patch.

Meanwhile this really looks like a race, thus I'm not 100% sure if my debug patch would reduce the possibility to reproduce.

Your bisect should be the determining evidence, for the worst case we can revert the offending patch.

Thank you very much for all of the testing, it really helps a lot.
Qu

[ 2561.764342] CPU: 8 PID: 727 Comm: systemd-udevd Tainted: G W
L 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-debug2+
#3
[ 2561.764348] Hardware name: ASUSTeK COMPUTER INC. ROG Strix
G513QY_G513QY/G513QY, BIOS G513QY.320 09/07/2022
[ 2561.764352] Call Trace:
[ 2561.764355] <TASK>
[ 2561.764360] dump_stack_lvl+0x5b/0x77
[ 2561.764369] read_extent_buffer_pages.cold+0x5a/0x9d
[ 2561.764382] ? __pfx_end_bio_extent_readpage+0x10/0x10
[ 2561.764390] ? folio_unlock+0x1d/0x50
[ 2561.764400] btrfs_read_extent_buffer+0x98/0x150
[ 2561.764410] read_tree_block+0x43/0xa0
[ 2561.764417] read_block_for_search+0x218/0x320
[ 2561.764432] btrfs_search_slot+0x351/0xd30
[ 2561.764450] btrfs_lookup_xattr+0x7b/0xe0
[ 2561.764460] btrfs_getxattr+0x55/0x110
[ 2561.764470] __vfs_getxattr+0x7e/0xb0
[ 2561.764480] get_vfs_caps_from_disk+0x7a/0x220
[ 2561.764492] audit_copy_inode+0x95/0xd0
[ 2561.764501] filename_lookup+0x16b/0x1c0
[ 2561.764525] ? _raw_spin_unlock+0x29/0x50
[ 2561.764534] vfs_statx+0x8e/0x150
[ 2561.764545] vfs_fstatat+0x51/0x70
[ 2561.764553] __do_sys_newfstatat+0x2e/0x50
[ 2561.764571] do_syscall_64+0x5b/0x80
[ 2561.764576] ? do_syscall_64+0x67/0x80
[ 2561.764585] ? up_read+0x17/0x20
[ 2561.764593] ? lock_is_held_type+0xe8/0x140
[ 2561.764601] ? asm_exc_page_fault+0x22/0x30
[ 2561.764607] ? lockdep_hardirqs_on+0x7d/0x100
[ 2561.764613] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 2561.764618] RIP: 0033:0x7f7b1b91e05e
[ 2561.764639] Code: 48 89 f2 b9 00 01 00 00 48 89 fe bf 9c ff ff ff
e9 07 00 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 41 89 ca b8 06 01 00
00 0f 05 <3d> 00 f0 ff ff 77 0b 31 c0 c3 0f 1f 84 00 00 00 00 00 48 8b
15 71
[ 2561.764643] RSP: 002b:00007fffdc4062c8 EFLAGS: 00000246 ORIG_RAX:
0000000000000106
[ 2561.764649] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7b1b91e05e
[ 2561.764653] RDX: 00007fffdc406320 RSI: 000055efe7d04bc0 RDI: 00000000ffffff9c
[ 2561.764656] RBP: 000055efe7e79668 R08: 00000000000000f0 R09: 000055efe7ea85b0
[ 2561.764659] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffdc4064b0
[ 2561.764663] R13: 00007fffdc406320 R14: 00007fffdc4064a8 R15: 00007fffdc406308
[ 2561.764680] </TASK>
[ 2561.764684] BTRFS error (device nvme1n1p3): read failed, check
owner_root=259 transid=650936 has_first_key=1 first_key=(2221294 1 0)
level=1

Of course full logs are also attached.
Thanks.