Re: [PATCH] btrfs: fix rw device counting in __btrfs_free_extra_devids

From: Anand Jain
Date: Sun Jul 25 2021 - 09:49:47 EST




On 22/07/2021 01:59, David Sterba wrote:
On Thu, Jul 15, 2021 at 06:34:03PM +0800, Desmond Cheong Zhi Xi wrote:
Syzbot reports a warning in close_fs_devices that happens because
fs_devices->rw_devices is not 0 after calling btrfs_close_one_device
on each device.

This happens when a writeable device is removed in
__btrfs_free_extra_devids, but the rw device count is not decremented
accordingly. So when close_fs_devices is called, the removed device is
still counted and we get an off by 1 error.

Here is one call trace that was observed:
btrfs_mount_root():
btrfs_scan_one_device():
device_list_add(); <---------------- device added
btrfs_open_devices():
open_fs_devices():
btrfs_open_one_device(); <-------- rw device count ++
btrfs_fill_super():
open_ctree():
btrfs_free_extra_devids():
__btrfs_free_extra_devids(); <--- device removed
fail_tree_roots:
btrfs_close_devices():
close_fs_devices(); <------- rw device count off by 1

Fixes: cf89af146b7e ("btrfs: dev-replace: fail mount if we don't have replace item with target device")

What this patch did in the last hunk was the rw_devices decrement, but
conditional:

@@ -1080,9 +1071,6 @@ static void __btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices,
if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
list_del_init(&device->dev_alloc_list);
clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);



- if (!test_bit(BTRFS_DEV_STATE_REPLACE_TGT,
- &device->dev_state))

This condition was wrong.
The 1st roll of this patch which is here [1], has the details of why. As shown below -

[1]
https://patchwork.kernel.org/project/linux-btrfs/patch/b3a0a629df98bd044a1fd5c4964f381ff6e7aa05.1600777827.git.anand.jain@xxxxxxxxxx/#23640775

----
rw_devices is incremented in btrfs_open_one_device() for all write-able
devices except for devid == BTRFS_DEV_REPLACE_DEVID.
But while we clean up the extra devices in __btrfs_free_extra_devids()
we used the BTRFS_DEV_STATE_REPLACE_TGT flag isn't set because
there isn't the replace-item. So rw_devices went below zero.
----


- fs_devices->rw_devices--;
}
list_del_init(&device->dev_list);
fs_devices->num_devices--;
---


@@ -1078,6 +1078,7 @@ static void __btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices,
if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
list_del_init(&device->dev_alloc_list);
clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
+ fs_devices->rw_devices--;
}
list_del_init(&device->dev_list);
fs_devices->num_devices--;

So should it be reinstated in the original form?

No. The reason is the same as above.
Only the rw_devices decrement line has to be restored.

The rest of
cf89af146b7e handles unexpected device replace item during mount.

Adding the decrement is correct, but right now I'm not sure about the
corner case when teh devcie has the BTRFS_DEV_STATE_REPLACE_TGT bit set.

BTRFS_DEV_STATE_REPLACE_TGT is set (on BTRFS_DEV_REPLACE_DEVID) for two reasons when we call replace through ioctl or during mount upon finding a replace-device item.

The state machine of the device bits and counters is not trivial so
fixing it one way or the other could lead to further syzbot reports if
we don't understand the issue.

I agree. Also, a good idea to convert this sysbot test into an xfstests case.

Thanks, Anand