Re: hid-related 5.2-rc1 boot hang

From: Hans de Goede
Date: Tue Jun 04 2019 - 06:54:16 EST


Hi,

On 04-06-19 12:08, Benjamin Tissoires wrote:
On Tue, Jun 4, 2019 at 9:51 AM Benjamin Tissoires
<benjamin.tissoires@xxxxxxxxxx> wrote:

On Mon, Jun 3, 2019 at 4:17 PM Hans de Goede <hdegoede@xxxxxxxxxx> wrote:

Hi,

On 03-06-19 15:55, Benjamin Tissoires wrote:
On Mon, Jun 3, 2019 at 11:51 AM Hans de Goede <hdegoede@xxxxxxxxxx> wrote:

Hi Again,

On 03-06-19 11:11, Hans de Goede wrote:
<snip>

not sure about the rest of logitech issues yet) next week.

The main problem seems to be the request_module patches. Although I also

Can't we use request_module_nowait() instead, and set a reasonable
timeout that we detect only once to check if userspace is compatible:

In pseudo-code:
    if (!request_module_checked) {
        request_module_nowait(name);
        use_request_module = wait_event_timeout(wq,
                first_module_loaded, 10 seconds in jiffies);
        request_module_checked = true;
    } else if (use_request_module) {
        request_module(name);
    }
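
For anyone who wants to play with that control flow outside the kernel,
here is a minimal userspace sketch of the same idea. It is only a model:
the callbacks stand in for request_module_nowait(), wait_event_timeout()
(which returns 0 on timeout and the remaining time otherwise) and
request_module(). The names load_driver, loader_ops, loader_state and
the stub helpers are made up for this illustration and are not kernel
API.

```c
#include <assert.h>
#include <stdbool.h>

/* Callbacks standing in for the kernel facilities in the pseudo-code. */
struct loader_ops {
    void (*load_nowait)(const char *name);   /* models request_module_nowait() */
    long (*wait_first_loaded)(long timeout); /* models wait_event_timeout():
                                                0 on timeout, else time left */
    void (*load_sync)(const char *name);     /* models request_module() */
};

struct loader_state {
    bool checked;  /* request_module_checked in the pseudo-code */
    bool use_sync; /* use_request_module in the pseudo-code */
};

/* Probe once with a timeout, then remember whether sync loading works. */
void load_driver(struct loader_state *st, const struct loader_ops *ops,
                 const char *name, long timeout)
{
    assert(st && ops);
    if (!st->checked) {
        ops->load_nowait(name);
        st->use_sync = ops->wait_first_loaded(timeout) != 0;
        st->checked = true;
    } else if (st->use_sync) {
        ops->load_sync(name);
    }
    /* else: the first probe timed out, so no in-kernel request is made */
}

/* Hypothetical stubs so the sketch can be exercised in userspace. */
static int nowait_calls, sync_calls;
static long fake_wait_result; /* what the fake wait should return */

static void stub_nowait(const char *name) { (void)name; nowait_calls++; }
static long stub_wait(long timeout) { (void)timeout; return fake_wait_result; }
static void stub_sync(const char *name) { (void)name; sync_calls++; }
```

With fake_wait_result set to 0 the first call probes and every later
call does nothing; with a non-zero value later calls go through the
synchronous path.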

Well, looking at the just-attached dmesg, the modprobe
triggered by udev from userspace succeeds in about
0.5 seconds, so it seems that the modprobe hang happens
when called from within the kernel rather than from
userspace.

What I do not know is whether the hang is inside userspace, or
whether it happens when modprobe calls back into the kernel.
If it happens when modprobe calls back into the kernel,
then other modprobes (done from udev) will likely hang too,
since I think only 1 modprobe can happen at a time.

I really wish we knew what distinguished working systems
from non working systems :|

I cannot find a common denominator, other than that the systems
are not running Fedora. So far we have reports from both Ubuntu 16.04
and Tumbleweed, so software-version-wise these 2 are wide apart.

I am trying to reproduce the lock locally, and installed openSUSE
Tumbleweed in a VM. When forwarding a Unifying receiver to the VM, I
do not see the lock with either my vanilla compiled kernel or the rpm
found in http://download.opensuse.org/repositories/Kernel:/HEAD/standard/x86_64/

Next step is to install Tumbleweed on bare metal, but I do not see how
this could introduce a difference (maybe USB2 vs 3).

Making progress here.

The difference between Ubuntu/Tumbleweed and Fedora: on the former,
usbhid is shipped as a module, while in Fedora usbhid is built into
the kernel.

If I rmmod hid_* and usbhid, then modprobe usbhid, the command hangs
for 3 minutes.
If usbhid is already loaded, inserting a receiver loads the external
modules immediately.

So my assumption is that when the device gets detected at boot, usbhid
gets loaded by the kernel event, which in turn attempts to call
__request_module, but that modprobe can't be fulfilled because it is
waiting for the initial usbhid modprobe to finish.
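
To make that assumption concrete, here is a small userspace model of it
as a wait-for graph. It only encodes the guess above and is not how the
kernel tracks any of this: the actor names, the edges (in particular the
last one, from the second modprobe back to the first) and the function
stuck_in_circular_wait are made up for illustration. The model also
ignores that the observed hang ends after about 3 minutes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical actors in the suspected circular wait. */
enum actor {
    MODPROBE_USBHID,     /* modprobe triggered by the kernel event at boot */
    USBHID_INIT,         /* usbhid init/probe running inside that modprobe */
    REQUEST_MODULE_CALL, /* synchronous __request_module() from the probe */
    MODPROBE_HID_DRIVER, /* second modprobe, spawned for the HID driver */
    NR_ACTORS
};

#define WAITS_FOR_NOBODY (-1)

/*
 * Each actor waits for at most one other actor. Follow the chain from
 * 'start': if it has not ended after NR_ACTORS hops, it must have
 * revisited an actor, so 'start' is stuck behind a circular wait.
 */
bool stuck_in_circular_wait(const int waits_for[NR_ACTORS], int start)
{
    int cur = start;

    assert(start >= 0 && start < NR_ACTORS);
    for (int hops = 0; hops < NR_ACTORS; hops++) {
        cur = waits_for[cur];
        if (cur == WAITS_FOR_NOBODY)
            return false;
    }
    return true;
}

/* usbhid shipped as a module: the probe runs inside the first modprobe. */
void scenario_usbhid_modular(int waits_for[NR_ACTORS])
{
    waits_for[MODPROBE_USBHID] = USBHID_INIT;
    waits_for[USBHID_INIT] = REQUEST_MODULE_CALL;
    waits_for[REQUEST_MODULE_CALL] = MODPROBE_HID_DRIVER;
    waits_for[MODPROBE_HID_DRIVER] = MODPROBE_USBHID; /* the assumed edge */
}

/* usbhid built in or already loaded: there is no outer modprobe. */
void scenario_usbhid_builtin(int waits_for[NR_ACTORS])
{
    waits_for[MODPROBE_USBHID] = WAITS_FOR_NOBODY;
    waits_for[USBHID_INIT] = REQUEST_MODULE_CALL;
    waits_for[REQUEST_MODULE_CALL] = MODPROBE_HID_DRIVER;
    waits_for[MODPROBE_HID_DRIVER] = WAITS_FOR_NOBODY;
}
```

In the modular scenario the chain starting at USBHID_INIT loops back on
itself; in the built-in scenario it ends at the second modprobe, which
would match the receiver being handled immediately when usbhid is
already loaded.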

Still don't know how to solve that, but I thought I should share.

Hmm, we may be hitting the scenario described in the big comment
around line 3500 of kernel/module.c.

But I'm not sure that is what is happening here.

Maybe you can put a WARN_ON(1) in request_module and look at the
backtrace? That may help to figure out what is going on, or
alternatively it might help to find some way to detect this and,
if it happens, skip the request_module...

Regards,

Hans