Re: [PATCH] btrfs: send: fix send failure of a subcase of orphan inodes

From: David Sterba
Date: Tue Oct 18 2022 - 09:34:59 EST


On Sun, Oct 16, 2022 at 11:33:46PM +0800, bingjingc wrote:
> From: BingJing Chang <bingjingc@xxxxxxxxxxxx>
>
> Commit 9ed0a72e5b35 ("btrfs: send: fix failures when processing inodes with
> no links") tries to fix all incremental send cases of orphan inodes the
> send operation will meet. However, there's still a bug causing the corner
> subcase fails with a ENOENT error.
>
> Here's shortened steps of that subcase:
>
> $ btrfs subvolume create vol
> $ touch vol/foo
>
> $ btrfs subvolume snapshot -r vol snap1
> $ btrfs subvolume snapshot -r vol snap2
>
> # Turn the second snapshot to RW mode and delete the file while
> # holding an open file descriptor on it
> $ btrfs property set snap2 ro false
> $ exec 73<snap2/foo
> $ rm snap2/foo
>
> # Set the second snapshot back to RO mode and do an incremental send
> # with an unusal reverse order
> $ btrfs property set snap2 ro true
> $ btrfs send -p snap2 snap1 > /dev/null
> At subvol snap1
> ERROR: send ioctl failed with -2: No such file or directory
>
> It's subcase 3 of BTRFS_COMPARE_TREE_CHANGED in the commit 9ed0a72e5b35
> ("btrfs: send: fix failures when processing inodes with no links"). And
> it's not a common case. We still have not met it in the real world.
> Theoretically, this case can happen in a batch cascading snapshot backup.
> In cascading backups, the receive operation in the middle may cause orphan
> inodes to appear because of the open file descriptors on the snapshot files
> during receiving. And if we don't do the batch snapshot backups in their
> creation order, then we can have an inode, which is an orphan in the parent
> snapshot but refers to a file in the send snapshot. Since an orphan inode
> has no paths, the send operation will fail with a ENOENT error if it
> tries to generate a path for it.
>
> In that patch, this subcase will be treated as an inode with a new
> generation. However, when the routine tries to delete the old paths in
> the parent snapshot, the function process_all_refs() doesn't check whether
> there are paths recorded or not before it calls the function
> process_recorded_refs(). And the function process_recorded_refs() try
> to get the first path in the parent snapshot in the beginning. Since it has
> no paths in the parent snapshot, the send operation fails.
>
> To fix this, we can easily put a link count check to avoid entering the
> deletion routine like what we do a link count check to avoid creating a
> new one. Moreover, we can assume that the function process_all_refs()
> can always collect references to process because we know it has a
> positive link count.
>
> Signed-off-by: BingJing Chang <bingjingc@xxxxxxxxxxxx>

Added to misc-next, thanks.