Re: loop subsystem corrupted after mounting multiple btrfs sub-volumes

From: Austin S. Hemmelgarn
Date: Fri Feb 26 2016 - 07:35:23 EST


Added linux-btrfs as this should be documented there as a known issue until it gets fixed (although I have no idea which side is the issue).
On 2016-02-25 14:22, Stanislav Brabec wrote:
While writing a test suite for util-linux[1], I experienced a a strange
behavior of loop device:

When two loop devices refer to the same file, and two btrfs mounts are
called on them, the second mount changes loop device of the first,
already mounted sub-volume. (Note that the current implementation of
util-linux mount -oloop works exactly in this way, and it allocates new
loop device for each mount command, so this bug can be easily
reproduced without losetup, just using "mount -oloop" or fstab.)
I'm not 100% certain, but I think this is a interaction between how BTRFS handles multiple mounts of the same filesystem on a given system and how mount handles loop mounts. AFAIUI, all instances of a given BTRFS filesystem being mounted on a given system are internally identical to bind mounts of a hidden mount of that filesystem. This is what allows both manual mounting of sub-volumes, and multiple mounting of the FS in general.

/proc/self/mountinfo after first btrfs loop mount:

107 59 0:59 /d0/dd0/ddd0/s1/d1/dd1/ddd1/s2 /mnt/1 rw,relatime shared:45 - btrfs /dev/loop0 rw,space_cache,subvolid=257,subvol=/d0/dd0/ddd0/s1/d1/dd1/ddd1/s2

This line changes after second first btrfs loop to:

07 59 0:59 /d0/dd0/ddd0/s1/d1/dd1/ddd1/s2 /mnt/1 rw,relatime shared:45 - btrfs /dev/loop1 rw,space_cache,subvolid=257,subvol=/d0/dd0/ddd0/s1/d1/dd1/ddd1/s2

See the change of /dev/loop0 to /dev/loop1!

It is apparently not only proc file change, but it also causes a
corruption of loop device subsystem, as I observed severe problems
on the affected system later:

- mount(2) returning 0 but doing nothing.

- mount(8) entering an infinite loop while searching for free loop
device.
This seems odd that it would cause such a degree of inconsistency in the kernel itself. My guess though is that mount(8) sees that you're trying to mount a file and unconditionally tries to bind it to a loop device without checking any in-use loop devices to see if it's already bound to them, and then when it calls mount(2), this ends up somehow confusing the BTRFS driver (probably because you've now mounted two filesystems with effectively identical super-blocks, BTRFS already has issues if multiple filesystems have the same UUID, and I have no idea how it might react to filesystems that appear identical but are on separate devices).


Here is a main reproducer:

=====================
#!/bin/sh
# Prepare the environment:
/btrfs.sh
mkdir -p /mnt/1 /mnt/2
losetup /dev/loop0 /btrfs.img
# Verify that nothing is mounted:
cat /proc/self/mountinfo | grep /mnt
mount /dev/loop0 /mnt/1
echo "One file system should be mounted now."
cat /proc/self/mountinfo | grep /mnt
# Create another loop.
losetup /dev/loop1 /btrfs.img
echo "Going to mount second one."
mount -osubvol=/ /dev/loop1 /mnt/2 2>&1
echo "Two file system should be mounted now."
cat /proc/self/mountinfo | grep /mnt
echo "Strange. First mount changed its loop device!"
umount /mnt/2
echo "And now check, whether it remains changed after umount."
cat /proc/self/mountinfo | grep /mnt
umount /mnt/1
losetup -d /dev/loop1
losetup -d /dev/loop0
rmdir /mnt/1 /mnt/2
=====================

And here is its output:

One file system should be mounted now.
107 59 0:59 /d0/dd0/ddd0/s1/d1/dd1/ddd1/s2 /mnt/1 rw,relatime shared:45 - btrfs /dev/loop0 rw,space_cache,subvolid=257,subvol=/d0/dd0/ddd0/s1/d1/dd1/ddd1/s2
Going to mount second one.
Two file system should be mounted now.
107 59 0:59 /d0/dd0/ddd0/s1/d1/dd1/ddd1/s2 /mnt/1 rw,relatime shared:45 - btrfs /dev/loop1 rw,space_cache,subvolid=257,subvol=/d0/dd0/ddd0/s1/d1/dd1/ddd1/s2
108 59 0:59 / /mnt/2 rw,relatime shared:47 - btrfs /dev/loop1 rw,space_cache,subvolid=5,subvol=/
Strange. First mount changed its loop device!
And now check, whether it remains changed after umount.
107 59 0:59 /d0/dd0/ddd0/s1/d1/dd1/ddd1/s2 /mnt/1 rw,relatime shared:45 - btrfs /dev/loop1 rw,space_cache,subvolid=257,subvol=/d0/dd0/ddd0/s1/d1/dd1/ddd1/s2

It was actually reproduced on linux-4.4.1 on openSUSE Tumbleweed.


Test image creator:

===== /btrfs.sh =====
#!/bin/sh
truncate -s 42M /btrfs.img
mkfs.btrfs -f -d single -m single /btrfs.img >/dev/null
mount -o loop /btrfs.img /mnt
pushd . >/dev/null
cd /mnt
mkdir -p d0/dd0/ddd0
cd ./d0/dd0/ddd0
touch file{1..5}
btrfs subvol create s1 >/dev/null
cd ./s1
touch file{1..5}
mkdir bind-point
mkdir -p d1/dd1/ddd1
cd ./d1/dd1/ddd1
btrfs subvol create s2 >/dev/null
DEFAULT_SUBVOLID=$(btrfs inspect rootid s2)
btrfs subvol set-default $DEFAULT_SUBVOLID . >/dev/null
NON_DEFAULT_SUBVOLID=$(btrfs subvol list /mnt |
while read dummy id rest ; do if test $id = $DEFAULT_SUBVOLID ; then
continue ; fi ; echo $id ; done)
cd ../../../..
mkdir -p d2/dd2/ddd2
cd ./d2/dd2/ddd2
btrfs subvol create s3 >/dev/null
mkdir -p s3/bind-mnt
popd >/dev/null
NON_DEFAULT_SUBVOL=d0/dd0/ddd0/d2/dd2/ddd2/s3
umount /mnt
=====================

[1] http://marc.info/?l=util-linux-ng&m=145590643206663&w=2