Re: PROBLEM: Long Workqueue delays.

From: Jim Baxter
Date: Tue Aug 18 2020 - 06:55:06 EST


On 17/08/2020 19:47, Alan Stern wrote:
>
> Unplugging a R/W USB drive without unmounting it first is a great way to
> corrupt the data.
>
Thank you; once development is finished we will only mount the USB stick read-only.

>> Using perf I identified that the hub_events workqueue was spending a lot of time in
>> invalidate_partition(). I have included a cut-down version of the captured perf data in
>> [2], which shows the additional functions where the kworker spends most of its time.
>
> invalidate_partition() is part of the block layer, not part of USB. It
> gets called whenever a drive is removed from the system, no matter what
> type of drive it is. You should ask the people involved in that
> subsystem why it takes so long.
>

I included the linux-mm list but missed the filesystem list; I will ask the question
on linux-fsdevel too.

>> I realise that not unmounting the USB stick is not ideal, though I wonder what
>> additional work is done when unplugging the USB stick compared to unmounting it.
>
> Unmounting a drive flushes all the dirty buffers from memory back to the
> drive. Obviously that can't be done if the drive is unplugged first.
>
> As far as the USB subsystem is concerned, exactly the same amount of
> work is done during disconnect regardless of whether or not the drive is
> mounted. (In fact, the USB subsystem doesn't even know whether a drive
> is mounted; that concept is part of the block and filesystem layers.)
>
>> I guess it may be waiting for a timeout when the stick is removed without being unmounted.
>
> That seems very unlikely. When a USB device gets unplugged the system
> realizes it. Any I/O meant for that device is immediately cancelled;
> there are no timeouts.
>
> (Okay, not strictly true; there is a fraction-of-a-second timeout during
> which the system waits to see whether the disconnect was permanent or
> just a temporary glitch. But you're talking about 6-second long
> delays.)
>

Thank you. No, I don't expect that to cause the issue; the delay is very likely
in another subsystem.
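On the dirty-buffer point: while we are still testing with the stick mounted read/write, the dirty data can at least be written back before the stick is pulled. Below is a minimal C sketch, assuming Linux with glibc 2.14 or later (for syncfs(2)); the helper name flush_mounted_fs() is hypothetical and not part of any existing tool. It only writes back dirty data for the filesystem containing the given directory. It does not unmount anything, and whether it changes the time spent in invalidate_partition() is untested.

```c
#define _GNU_SOURCE        /* syncfs() is a GNU extension in glibc */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write back all dirty data for the filesystem
 * that contains the directory `mount_point`, e.g. a mounted USB stick.
 * Returns 0 on success, or -1 with errno set on failure.
 */
int flush_mounted_fs(const char *mount_point)
{
    /* Any open descriptor on the filesystem is enough for syncfs(). */
    int fd = open(mount_point, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    int rc = syncfs(fd);
    int saved_errno = errno;   /* keep close() from hiding a syncfs() error */
    close(fd);
    if (rc < 0)
        errno = saved_errno;
    return rc;
}
```

For example, calling flush_mounted_fs("/media/usb") (a hypothetical mount point) just before unplugging leaves no dirty data for that filesystem at that moment, but any writes issued afterwards are dirty again. Mounting read-only, as planned, avoids the problem entirely.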

Regards,
Jim Baxter


> Alan Stern
>