Re: deadlock in debugfs synchronize_srcu() when unplugging USB

From: Greg Kroah-Hartman
Date: Mon Oct 16 2017 - 03:51:00 EST


On Thu, Oct 12, 2017 at 04:01:48PM -0400, Tyler Hall wrote:
> Hi,
>
> I have a reproducible scenario wherein removing a USB device while
> reading /sys/kernel/debug/usb/devices causes a deadlock. This should
> not be specific to any USB device. Any USB device removal that causes
> a call to debugfs_remove() has inverted lock ordering with respect to
> the read() of debug/usb/devices.
>
> e.g.
> read thread: srcu_read_lock(&debugfs_srcu);
> -- usb unplug --
> remove thread: mutex_lock(&usb_bus_idr_lock);
> remove thread: synchronize_srcu(&debugfs_srcu); <- blocked
> read thread: mutex_lock(&usb_bus_idr_lock); <- blocked
> read thread: srcu_read_unlock(&debugfs_srcu, ...);
>
> This seems to be another flavor of what Johannes Berg reported:
> deadlock in synchronize_srcu() in debugfs?
> https://lkml.org/lkml/2017/3/23/415
>
> I applied this patch set from Nicolai Stange and can no longer
> reproduce the hang.
> [RFC PATCH v2 0/9] debugfs: per-file removal protection
> https://lkml.org/lkml/2017/5/3/292
>
> As patch 2/9 in the series indicates, commit 49d200deaa68 ("debugfs:
> prevent access to removed files' private data") is where this was
> first introduced, and it is reproducible on v4.14-rc4.
>
> How should we move forward with the resolution of this debugfs change?
> It seems to me that the USB locking is reasonable but the debugfs
> global srcu is overly restrictive. This could lead to unexpected lock
> inversion any time a driver shares a mutex between its debugfs read
> and removal paths.
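
For reference, the interleaving quoted above can be replayed in a small
userspace model. This is only a sketch under stated assumptions: it is
not kernel code, every model_* name and the thread ids are hypothetical,
and the real srcu_read_lock(), synchronize_srcu() and mutex_lock() are
reduced to bookkeeping that records which thread would have to wait on
which. It only shows that the four steps from the report close a
wait-for cycle between the read thread and the remove thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code: ids for the two paths in the report. */
enum { NOBODY = -1, READ_THREAD = 0, REMOVE_THREAD = 1, NTHREADS = 2 };

struct model {
	int srcu_readers[NTHREADS]; /* read-side nesting depth per thread */
	int mutex_owner;            /* holder of the usb_bus_idr_lock stand-in */
	int waits_on[NTHREADS];     /* thread each thread waits on, or NOBODY */
};

static void model_init(struct model *m)
{
	for (int t = 0; t < NTHREADS; t++) {
		m->srcu_readers[t] = 0;
		m->waits_on[t] = NOBODY;
	}
	m->mutex_owner = NOBODY;
}

/* Stand-in for srcu_read_lock(): entering a read-side section never blocks. */
static void model_srcu_read_lock(struct model *m, int t)
{
	assert(t >= 0 && t < NTHREADS);
	m->srcu_readers[t]++;
}

/* Stand-in for mutex_lock(): true if acquired, false if t would block. */
static bool model_mutex_lock(struct model *m, int t)
{
	assert(t >= 0 && t < NTHREADS);
	if (m->mutex_owner == NOBODY) {
		m->mutex_owner = t;
		return true;
	}
	m->waits_on[t] = m->mutex_owner;
	return false;
}

/* Stand-in for synchronize_srcu(): t would block while any other thread
 * is still inside a read-side section. */
static bool model_synchronize_srcu(struct model *m, int t)
{
	assert(t >= 0 && t < NTHREADS);
	for (int other = 0; other < NTHREADS; other++) {
		if (other != t && m->srcu_readers[other] > 0) {
			m->waits_on[t] = other;
			return false;
		}
	}
	return true;
}

/* A thread is deadlocked if following waits_on edges leads back to it. */
static bool model_deadlocked(const struct model *m, int t)
{
	int cur = m->waits_on[t];

	for (int hops = 0; hops < NTHREADS && cur != NOBODY; hops++) {
		if (cur == t)
			return true;
		cur = m->waits_on[cur];
	}
	return false;
}

/* Replays the four steps from the report; true if they end in a deadlock. */
bool replay_reported_interleaving(void)
{
	struct model m;

	model_init(&m);
	model_srcu_read_lock(&m, READ_THREAD);            /* read of usb/devices */
	(void)model_mutex_lock(&m, REMOVE_THREAD);        /* usb unplug */
	(void)model_synchronize_srcu(&m, REMOVE_THREAD);  /* would block */
	(void)model_mutex_lock(&m, READ_THREAD);          /* would block */
	return model_deadlocked(&m, READ_THREAD);
}
```

Calling replay_reported_interleaving() from a test harness reports the
cycle: the remove thread waits for the reader to leave its read-side
section, while the reader waits for the mutex the remove thread holds.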

As Paul stated, fixing up the patches and sending them in is the best
solution. Can you help out with that?

thanks,

greg k-h