Re: [REGRESSION] 4.11-rc: systemd doesn't see most devices

From: Greg KH
Date: Tue Apr 11 2017 - 11:38:55 EST


On Tue, Apr 11, 2017 at 11:00:40AM -0400, Theodore Ts'o wrote:
> There is a frustrating regression in 4.11 that I've been trying to
> track down. The symptoms are that a large number of systemd devices
> don't show up. So instead of "systemctl | grep .device | wc -l"
> listing some 50+ lines which look like this:
>
> sys-devices-pci0000:00-0000:00:14.0-usb1-1\x2d7-1\x2d7:1.0-bluetooth-hci0.device loaded active plugged /sys/devices/pci0000:00/0000:00:14.0/usb1/1-7/1-7:1.0/bluetooth/hci0
>
> I only get 5-10 lines of devices. This is problematic because it
> means that the wifi firmware is not automatically loaded. More
> annoyingly, because the device mapper systemd devices are missing:
>
> sys-devices-virtual-block-dm\x2d0.device loaded active plugged /sys/devices/virtual/block/dm-0
>
> ... the boot hangs for 90 seconds because it can't fsck devices that
> systemd doesn't think exists yet. (I'm using LVM on top of an
> encrypted block device, and it doesn't think the dm-crypt device is
> created, although given that the root file system is an LVM volume,
> obviously LVM and the LUKS setup had worked just fine --- and people
> wonder why some folks hate systemd. :-)
>
> The failure past v4.10-5879-gcaa59428971d starts becoming
> flaky, so sometimes I have to reboot three times or more before the
> failure shows up. This is why the bisect has been taking so long, and
> so while I'm *fairly* certain that the failure is somewhere in the
> staging branch merge, it's possible that one of the earlier "git
> bisect good"'s are in error. I have been trying multiple reboots
> before concluding that a bisection point is "good" but this takes a
> huge amount of time, since having GRUB unlock an encrypted LVM volume
> takes a long time, and I have to type the decryption password twice at
> each boot.
>
> The end of the bisection doesn't make any sense, and so at this point
> I've given up, and am posting this to LKML with Linus and Greg cc'ed,
> in the hopes that someone else has seen this, or understands what sort
> of failure would cause systemd to not think various devices are
> present and/or finished initializing. I'm using a Debian testing
> distribution, and it would be really good to figure out what the ?!@#
> is going on, since if 4.11 releases with this, I suspect a lot of
> people will be affected. Unfortunately, while it's not particularly
> reliable deep into the bisection, at -rc3 or -rc5 it's **damned**
> reproducible. I know how to work around the systemd brain damage for
> now by using rc.local, futzing with the dependencies, loading
> the Wifi module by hand, and (sometimes) living without Audio, but
> this requires a lot of hacking, and it's not, shall we say, a
> particularly nice user experience. :-(
>
> - Ted
>
> P.S. I've also attached the output of "systemctl | grep devices" so you
> can see what happens in a good and bad case, in case that helps.

I haven't seen this at all, nor heard of it. As systemctl only gets
what udev reports to it, have you tried using 'udevadm' to monitor your
devices when you plug them in, to ensure it is really seeing them?
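
To know which devices to look at with udevadm, it may help to diff the
device units from a good boot against a bad one. A minimal sketch,
assuming both listings were saved as plain text (for example the output
of "systemctl | grep .device"); the function names are made up for
illustration:

```python
def device_units(listing):
    """Collect unit names ending in .device from saved systemctl output."""
    units = set()
    for line in listing.splitlines():
        # The unit name is normally the first field; check the second one
        # too in case a status marker was printed in front of it.
        for field in line.split()[:2]:
            if field.endswith(".device"):
                units.add(field)
                break
    return units


def missing_units(good, bad):
    """Return the device units present in the good boot but not the bad one."""
    return sorted(device_units(good) - device_units(bad))
```

Calling missing_units() on the contents of the two saved files gives the
list of units to chase, each of which maps back to a /sys path that can
be handed to udevadm.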

Any changes in dmesg between a working and a non-working kernel?

The staging tree failure seems really odd, as I doubt you are even
running any of the staging drivers, and they are all self-contained.
Do you have any enabled in your build? Android code, perhaps, by
accident?

Mess around with udevadm and see if that provides any clues.
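
One thing worth checking in the udev database: as far as I know, systemd
only creates .device units for devices that carry the "systemd" udev
tag, and treats a device with SYSTEMD_READY=0 as not plugged. A rough
sketch that reads a saved "udevadm info --export-db" dump and reports
how each device would look from that side; the function names and the
three labels are assumptions for illustration, not anything systemd
prints:

```python
def parse_export_db(dump):
    """Split a saved `udevadm info --export-db` dump into per-device dicts."""
    devices = []
    # Records are separated by blank lines; each line is "<letter>: <value>".
    for record in dump.strip().split("\n\n"):
        dev = {"path": None, "props": {}}
        for line in record.splitlines():
            prefix, _, value = line.partition(": ")
            if prefix == "P":  # device path under /sys
                dev["path"] = value
            elif prefix == "E" and "=" in value:  # udev property
                key, _, val = value.partition("=")
                dev["props"][key] = val
        if dev["path"]:
            devices.append(dev)
    return devices


def systemd_view(dev):
    """Classify a parsed device as 'untagged', 'not-ready' or 'ready'."""
    tags = dev["props"].get("TAGS", "").strip(":").split(":")
    if "systemd" not in tags:
        return "untagged"
    if dev["props"].get("SYSTEMD_READY") == "0":
        return "not-ready"
    return "ready"
```

If dm-0 shows up as "untagged" or "not-ready" on the bad kernel but
"ready" on the good one, the problem is in what udev records for the
device rather than in systemd itself.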

thanks,

greg k-h