Re: Linux 4.0.6 breaks my disk/lvm/filesystem setup

From: David Herrmann
Date: Thu Jun 25 2015 - 08:02:11 EST


Hi

On Thu, Jun 25, 2015 at 9:05 AM, Christian Hesse <list@xxxxxxxx> wrote:
> Hello everybody,
>
> I kind of nailed the issue. Adding CC to Greg for kdbus and to Herbert for
> the bad commit. Details are below.
>
> Christian Hesse <list@xxxxxxxx> on Tue, 2015/06/23 10:14:
>> with Linux 4.0.5 everything was perfectly fine, Linux 4.0.6 breaks the setup
>> on one of my systems: Only three of my logical volumes are available,
>> systemd reports:
>>
>> lvm2-pvscan@254:3.service: State 'stop-sigterm' timed out. Killing.
>>
>> Followed by a lot of failed dependencies. The setup looks like this:
>>
>> NAME                 MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
>> sda                      8:0  0 953,9G  0 disk
>> |-sda1                   8:1  0     1M  0 part
>> |-sda2                   8:2  0   256M  0 part  /boot/efi
>> |-sda3                   8:3  0   7,8G  0 part
>> | |-vg-iso             254:0  0     4G  0 lvm   /srv/iso
>> | |-vg-persist         254:1  0     2G  0 lvm   /srv/iso/persist
>> | `-vg-boot            254:2  0   128M  0 lvm   /boot
>> |-sda4                   8:4  0 913,9G  0 part
>> | `-cvg                254:3  0 913,9G  0 crypt
>> |   |-cvg-swap         254:4  0     4G  0 lvm   [SWAP]
>> |   |-cvg-root         254:5  0    40G  0 lvm   /
>> |   |-cvg-log          254:6  0     1G  0 lvm   /var/log
>> |   |-cvg-home         254:7  0   500G  0 lvm   /home
>> |   |-cvg-vbox_win7    254:8  0    32G  0 lvm
>> |   |-cvg-vbox_win8    254:9  0    32G  0 lvm
>> |   |-cvg-git         254:10  0    12G  0 lvm   /srv/git
>> |   `-cvg-chroots     254:11  0    16G  0 lvm   /var/lib/archbuild
>> `-sda5                   8:5  0    32G  0 part
>>
>> Another system is just fine, the only difference is a logical volume with
>> btrfs (cvg-chroots). Possibly the btrfs fixes are involved?
>
> I am running Linux 4.0.x with kdbus from Greg's char-misc tree kdbus branch
> [0] merged, last commit is b69af624a0 ("kdbus: optimize if statements in
> kdbus_conn_disconnect()"). Everything works fine with Linux 4.0.5 but breaks
> with 4.0.6 on one of my systems.
>
> I bisected the problem and found this to be the bad commit [1]:
>
> From cf8befcc1a5538b035d478424efcc2d50e66928e Mon Sep 17 00:00:00 2001
> From: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
> Date: Sat, 16 May 2015 21:16:28 +0800
> Subject: netlink: Disable insertions/removals during rehash
>
> [ Upstream commit: Not applicable ]
>
> The current rhashtable rehash code is buggy and can't deal with
> parallel insertions/removals without corrupting the hash table.
>
> This patch disables it by partially reverting
> c5adde9468b0714a051eac7f9666f23eb10b61f7 ("netlink: eliminate
> nl_sk_hash_lock").
>
> I can fix my system by booting with kdbus=0 to disable kdbus or by reverting
> this single commit. Looks like something deadlocks... Any idea?

Greg's kdbus tree does not work on 4.0. How exactly did you do the
back-merge? At a minimum, you need to revert these patches to make it
work on 4.0 (in this order):
  kdbus: no need to ref current->mm
  kdbus: use rcu to access exe file in metadata
  kdbus: pool: use __vfs_read()
Furthermore, we don't support kdbus on 4.0. So if this does not happen
on 4.1, I'd recommend staying with 4.1. It'd still be interesting to
see whether the netlink-locking back-port is indeed broken.
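
As a rough sketch of the revert step above: the script below looks up
each patch by its subject line and builds the matching "git revert"
commands in the order listed. The helper names find_commit and
revert_plan are hypothetical, not part of any kdbus tooling. It assumes
it is run inside the merged tree and that the newest commit matching
each subject is the one to revert (an earlier revert commit quoting the
same subject would also match, so check the hashes before running
anything).

```python
import subprocess

# Subjects quoted from the message above, in the order the reverts
# should be applied.
SUBJECTS = [
    "kdbus: no need to ref current->mm",
    "kdbus: use rcu to access exe file in metadata",
    "kdbus: pool: use __vfs_read()",
]


def find_commit(subject, repo="."):
    """Return the hash of the newest commit whose message contains
    `subject` as a literal string, or None if nothing matches."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "-n", "1", "--format=%H",
         "--fixed-strings", "--grep=" + subject],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return out or None


def revert_plan(subjects, lookup):
    """Build one `git revert` argument list per subject, preserving
    the given order. `lookup` maps a subject to a commit hash or None."""
    plan = []
    for subject in subjects:
        commit = lookup(subject)
        if commit is None:
            raise LookupError("no commit found for: " + subject)
        plan.append(["git", "revert", "--no-edit", commit])
    return plan
```

Calling revert_plan(SUBJECTS, find_commit) in the merged tree returns
the three argument lists without running them, so the plan can be
inspected first and then passed to subprocess.run one by one.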

Regardless: It is highly unlikely that the netlink commit and kdbus
are in any way related. Either kdbus triggers some uncommon user-space
path, or you have a borked kdbus-merge.

Thanks
David