Re: [6.2][regression] after commit 947a629988f191807d2d22ba63ae18259bb645c5 btrfs volume periodical forced switch to readonly after a lot of disk writes

From: Qu Wenruo
Date: Sun Dec 25 2022 - 19:35:14 EST




On 2022/12/26 05:32, Mikhail Gavrilov wrote:
Hi,
It is curious but it happens only on machine which have BTRFS volume
combined from two high speed nvme (pcie 4) SSD in RAID 0. On machines
with BTRFS volume from one HDD the bug does not appear.

To bisect the problematic commit, I had to sweat a lot. At each step,
I downloaded the 150 GB game "Assassin's Creed Valhalla" 4 times and
deleted it. For make sure that the commit previous to
947a629988f191807d2d22ba63ae18259bb645c5 is definitely not affected by
the bug, I downloaded this game 10 times, which should have provided
more than 1.5 Tb of data writing to the btrfs volume.

Here is result of my bisection:
947a629988f191807d2d22ba63ae18259bb645c5 is the first bad commit
commit 947a629988f191807d2d22ba63ae18259bb645c5
Author: Qu Wenruo <wqu@xxxxxxxx>
Date: Wed Sep 14 13:32:51 2022 +0800

btrfs: move tree block parentness check into validate_extent_buffer()

[...]
Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
Signed-off-by: David Sterba <dsterba@xxxxxxxx>

fs/btrfs/disk-io.c | 73 ++++++++++++++++++++++++++++++++++++++--------------
fs/btrfs/extent_io.c | 18 ++++++++++---
fs/btrfs/extent_io.h | 5 ++--
fs/btrfs/volumes.h | 25 +++++++++++++++---
4 files changed, 93 insertions(+), 28 deletions(-)

Before going to readonly, the preceding line in kernel log display a message:
[ 1908.029663] BTRFS: error (device nvme0n1p3: state A) in
btrfs_run_delayed_refs:2147: errno=-5 IO failure

I also attached a full kernel log.

Thanks a lot for the full kernel log.

It indeed shows something is wrong in the run_one_delayed_ref().
But surprisingly, if there is something wrong, I'd expect more output from btrfs, as normally if one tree block failed to pass whatever the checks, it should cause an error message at least.

Since you can reproduce the bug (although I don't think this is easy to reproduce), mind to apply the extra debug patch and then try to reproduce?

(Part of the patch would go upstreamed soon)

Another thing is, mind to run "btrfs check --readonly" on the fs?
I don't believe that's the case, but just in case.

Thanks,
QuFrom 41cf78e56ed2a3391499469855b906d74d03eeca Mon Sep 17 00:00:00 2001
Message-Id: <41cf78e56ed2a3391499469855b906d74d03eeca.1672013469.git.wqu@xxxxxxxx>
From: Qu Wenruo <wqu@xxxxxxxx>
Date: Mon, 26 Dec 2022 08:10:27 +0800
Subject: [PATCH] btrfs: add extra debug for run_one_delayed_ref()

Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
---
fs/btrfs/disk-io.c | 4 ++++
fs/btrfs/extent-tree.c | 22 ++++++++++++++++++----
2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8aa7d79a341a..c67c688d568e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -530,6 +530,10 @@ static int validate_extent_buffer(struct extent_buffer *eb,
}

if (found_level != check->level) {
+ btrfs_err_rl(eb->fs_info,
+ "level check failed on logical %llu mirror %u wanted %u found %u",
+ eb->start, eb->read_mirror, check->level,
+ found_level);
ret = -EIO;
goto out;
}
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index eaa1fb2850d7..fe1f31319d58 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1462,8 +1462,13 @@ static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
ret = insert_inline_extent_backref(trans, path, bytenr, num_bytes,
parent, root_objectid, owner,
offset, refs_to_add, extent_op);
- if ((ret < 0 && ret != -EAGAIN) || !ret)
+ if ((ret < 0 && ret != -EAGAIN) || !ret) {
+ if (ret < 0 && ret != -EAGAIN)
+ btrfs_err(trans->fs_info,
+ "failed to insert inline extent backref: %d",
+ ret);
goto out;
+ }

/*
* Ok we had -EAGAIN which means we didn't have space to insert and
@@ -1954,8 +1959,8 @@ static int btrfs_run_delayed_refs_for_head(struct btrfs_trans_handle *trans,
if (ret) {
unselect_delayed_ref_head(delayed_refs, locked_ref);
btrfs_put_delayed_ref(ref);
- btrfs_debug(fs_info, "run_one_delayed_ref returned %d",
- ret);
+ btrfs_err(fs_info, "run_one_delayed_ref returned %d",
+ ret);
return ret;
}

@@ -4642,8 +4647,12 @@ static int alloc_reserved_extent(struct btrfs_trans_handle *trans, u64 bytenr,
int ret;

ret = remove_from_free_space_tree(trans, bytenr, num_bytes);
- if (ret)
+ if (ret) {
+ btrfs_err(fs_info,
+ "failed to remove bytenr %llu num_bytes %llu from free space tree: %d",
+ bytenr, num_bytes, ret);
return ret;
+ }

ret = btrfs_update_block_group(trans, bytenr, num_bytes, true);
if (ret) {
@@ -4686,6 +4695,9 @@ static int alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
extent_root = btrfs_extent_root(fs_info, ins->objectid);
ret = btrfs_insert_empty_item(trans, extent_root, path, ins, size);
if (ret) {
+ btrfs_err(fs_info,
+ "failed to insert empty item for extent root: %d",
+ ret);
btrfs_free_path(path);
return ret;
}
@@ -4759,6 +4771,8 @@ static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
size);
if (ret) {
btrfs_free_path(path);
+ btrfs_err(fs_info, "failed to insert empty item into extent root: %d",
+ ret);
return ret;
}

--
2.39.0