Re: Linux 4.0.6 breaks my disk/lvm/filesystem setup

From: Christian Hesse
Date: Thu Jun 25 2015 - 03:05:16 EST


Hello everybody,

I kind of nailed the issue. Adding CC to Greg for kdbus and to Herbert for
the bad commit. Details are below.

Christian Hesse <list@xxxxxxxx> on Tue, 2015/06/23 10:14:
> with Linux 4.0.5 everything was perfectly fine, Linux 4.0.6 breaks the setup
> on one of my systems: Only three of my logical volumes are available,
> systemd reports:
>
> lvm2-pvscan@254:3.service: State 'stop-sigterm' timed out. Killing.
>
> Followed by a lot of failed dependencies. The setup looks like this:
>
> NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
> sda 8:0 0 953,9G 0 disk
> |-sda1 8:1 0 1M 0 part
> |-sda2 8:2 0 256M 0 part /boot/efi
> |-sda3 8:3 0 7,8G 0 part
> | |-vg-iso 254:0 0 4G 0 lvm /srv/iso
> | |-vg-persist 254:1 0 2G 0 lvm /srv/iso/persist
> | `-vg-boot 254:2 0 128M 0 lvm /boot
> |-sda4 8:4 0 913,9G 0 part
> | `-cvg 254:3 0 913,9G 0 crypt
> | |-cvg-swap 254:4 0 4G 0 lvm [SWAP]
> | |-cvg-root 254:5 0 40G 0 lvm /
> | |-cvg-log 254:6 0 1G 0 lvm /var/log
> | |-cvg-home 254:7 0 500G 0 lvm /home
> | |-cvg-vbox_win7 254:8 0 32G 0 lvm
> | |-cvg-vbox_win8 254:9 0 32G 0 lvm
> | |-cvg-git 254:10 0 12G 0 lvm /srv/git
> | `-cvg-chroots 254:11 0 16G 0 lvm /var/lib/archbuild
> `-sda5 8:5 0 32G 0 part
>
> Another system is just fine, the only difference is a logical volume with
> btrfs (cvg-chroots). Possibly the btrfs fixes are involved?

I am running Linux 4.0.x with the kdbus branch of Greg's char-misc tree [0]
merged; the last commit is b69af624a0 ("kdbus: optimize if statements in
kdbus_conn_disconnect()"). Everything works fine with Linux 4.0.5 but breaks
with 4.0.6 on one of my systems.

I bisected the problem and found this to be the bad commit [1]:

From cf8befcc1a5538b035d478424efcc2d50e66928e Mon Sep 17 00:00:00 2001
From: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
Date: Sat, 16 May 2015 21:16:28 +0800
Subject: netlink: Disable insertions/removals during rehash

[ Upstream commit: Not applicable ]

The current rhashtable rehash code is buggy and can't deal with
parallel insertions/removals without corrupting the hash table.

This patch disables it by partially reverting
c5adde9468b0714a051eac7f9666f23eb10b61f7 ("netlink: eliminate
nl_sk_hash_lock").

I can fix my system either by booting with kdbus=0 to disable kdbus or by
reverting this single commit. It looks like something deadlocks... Any idea?
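
For anyone trying to reproduce this, it helps to confirm that the kdbus=0
workaround was actually active on a given boot. Below is a minimal sketch that
scans a kernel command line string for that parameter. The helper name
kdbus_disabled is hypothetical, and treating the last kdbus= occurrence as the
effective one is an assumption, not something taken from the kdbus code.

```python
def kdbus_disabled(cmdline: str) -> bool:
    """Return True if the given kernel command line carries kdbus=0.

    Hypothetical helper for illustration only. Assumption: if kdbus=
    appears more than once, the last occurrence is the effective one.
    """
    value = None
    for token in cmdline.split():
        if token.startswith("kdbus="):
            # Keep overwriting so the last kdbus= token is the one compared.
            value = token.split("=", 1)[1]
    return value == "0"
```

On a running system the string to pass in would be the contents of
/proc/cmdline, e.g. kdbus_disabled(open("/proc/cmdline").read()).
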

[0] https://git.kernel.org/cgit/linux/kernel/git/gregkh/char-misc.git/?h=kdbus
[1] https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=cf8befcc
--
main(a){char*c=/* Schoene Gruesse */"B?IJj;MEH"
"CX:;",b;for(a/* Chris get my mail address: */=0;b=c[a++];)
putchar(b-1/(/* gcc -o sig sig.c && ./sig */b/42*2-3)*42);}
