Re: [6.2][regression] after commit 947a629988f191807d2d22ba63ae18259bb645c5 btrfs volume periodical forced switch to readonly after a lot of disk writes

From: Qu Wenruo
Date: Wed Dec 28 2022 - 18:31:35 EST




On 2022/12/28 22:12, Mikhail Gavrilov wrote:
On Wed, Dec 28, 2022 at 6:08 AM Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:

From the very first dmesg with calltrack, it already shows the
submit_one_bio() is called from submit_extent_page(), which means cases
cross stripe boundary, and has no parent_check populated at all.

And since you're using RAID0 on two NVMEs, it matches the symptom, while
most tests done here are using single device (DUP and SINGLE), thus no
stripe boundary cases at all.
(In fact it should still be possible to trigger on SINGLE, but way too
hard to trigger)

With proper root cause found, this version should mostly handle the
regression correctly.

This version should mostly be the formal one I'd later send to the
mailing list.

I can not thank you more for all the testing you have provided, it not
only pinned down the bug, but also proves I'm a total idiot...

I have already written over 1.6Tb of data to disk and there are no
hints of errors.
For me, this is a sign that the problem has been fixed.
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@xxxxxxxxx>

❯ dmesg | grep -i btrfs
[ 0.000000] Linux version
6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
(mikhail@mikhail-laptop) (gcc (GCC) 12.2.1 20221121 (Red Hat
12.2.1-4), GNU ld version 2.39-6.fc38) #7 SMP PREEMPT_DYNAMIC Wed Dec
28 10:00:39 +05 2022
[ 0.000000] Command line:
BOOT_IMAGE=(hd0,gpt1)/@root/boot/vmlinuz-6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
root=UUID=40e0b5d2-df54-46e0-b6f4-2f868296271d ro
rootflags=subvol=@root
resume=UUID=db79988f-6b70-4b52-84f5-3e505471c85e log_buf_len=16M
sysrq_always_enabled=1 nmi_watchdog=1
amdgpu.lockup_timeout=-1,-1,-1,-1
[ 0.154567] Kernel command line:
BOOT_IMAGE=(hd0,gpt1)/@root/boot/vmlinuz-6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
root=UUID=40e0b5d2-df54-46e0-b6f4-2f868296271d ro
rootflags=subvol=@root
resume=UUID=db79988f-6b70-4b52-84f5-3e505471c85e log_buf_len=16M
sysrq_always_enabled=1 nmi_watchdog=1
amdgpu.lockup_timeout=-1,-1,-1,-1
[ 0.154654] Unknown kernel command line parameters
"BOOT_IMAGE=(hd0,gpt1)/@root/boot/vmlinuz-6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+",
will be passed to user space.
[ 4.496766] usb usb2: Manufacturer: Linux
6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+ xhci-hcd
[ 4.498963] usb usb1: Manufacturer: Linux
6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+ xhci-hcd
[ 4.500665] usb usb3: Manufacturer: Linux
6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+ xhci-hcd
[ 4.501851] usb usb4: Manufacturer: Linux
6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+ xhci-hcd
[ 4.735212] Btrfs loaded, crc32c=crc32c-generic, assert=on,
zoned=yes, fsverity=yes
[ 5.223368]
BOOT_IMAGE=(hd0,gpt1)/@root/boot/vmlinuz-6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
[ 6.923453] BTRFS: device label fedora_localhost-live devid 2
transid 652981 /dev/nvme1n1p1 scanned by systemd-udevd (448)
[ 6.974412] BTRFS: device label fedora_localhost-live devid 1
transid 652981 /dev/nvme0n1p3 scanned by systemd-udevd (484)
[ 11.113437] CPU: 15 PID: 478 Comm: systemd-udevd Tainted: G
L 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
#7
[ 11.221359] CPU: 15 PID: 478 Comm: systemd-udevd Tainted: G
W L 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
#7
[ 13.731015] BTRFS info (device nvme0n1p3): using crc32c
(crc32c-intel) checksum algorithm
[ 13.731147] BTRFS info (device nvme0n1p3): using free space tree
[ 14.328439] BTRFS info (device nvme0n1p3): enabling ssd optimizations
[ 14.328469] BTRFS info (device nvme0n1p3): auto enabling async discard
[ 16.592713] BTRFS info (device nvme0n1p3: state M): use zstd
compression, level 1
[11691.071176] CPU: 11 PID: 2068 Comm: gnome-shell Tainted: G W
L 6.2.0-rc1-1b929c02afd37871d5afb9d498426f83432e71c2-btrfs-fix+
#7


<OFFTOPIC>
As I mentioned at the first message I also have a computer where the
btrfs partition is located on a slow HDD.
When I update the container (podman pull), the system becomes
unresposible for half an hour, which is how long it takes to update
the container.
I do not expect any super-speed from the HDD, I just would like to do
something else with this computer. Yes, at least watching videos on
youtube. Is there anything that can be done here or is there nothing
that we can do?

Are you using qgroup? If so it may be worthy trying disabling qgroup.

But for newer kernel, qgroup hang should only happen when dropping large snapshot, I don't know if podman pull would cause older snapshots to be deleted...

Thanks,
Qu

[46944.301588] INFO: task btrfs-transacti:1184 blocked for more than
122 seconds.
[46944.301825] Tainted: G W L ------- ---
6.2.0-0.rc1.14.fc38.x86_64+debug #1
[46944.301829] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[46944.301832] task:btrfs-transacti state:D stack:12000 pid:1184
ppid:2 flags:0x00004000
[46944.301840] Call Trace:
[46944.301843] <TASK>
[46944.301851] __schedule+0x50c/0x1780
[46944.301863] ? _raw_spin_unlock_irqrestore+0x30/0x60
[46944.301876] schedule+0x5d/0xe0
[46944.301881] wait_current_trans+0x110/0x170
[46944.301888] ? __pfx_autoremove_wake_function+0x10/0x10
[46944.301895] start_transaction+0x36c/0x680
[46944.301904] transaction_kthread+0xb6/0x1b0
[46944.301912] ? __pfx_transaction_kthread+0x10/0x10
[46944.301916] kthread+0xf5/0x120
[46944.301920] ? __pfx_kthread+0x10/0x10
[46944.301926] ret_from_fork+0x2c/0x50
[46944.301941] </TASK>


I attached a full kernel log from this machine.
I can start a separate thread if it makes sense.
Sorry for oftop.
</OFFTOPIC>

--
Best Regards,
Mike Gavrilov.