Re: [PATCH v10 0/5] shut down devices asynchronously

From: David Jeffery
Date: Mon Jul 07 2025 - 11:34:51 EST


On Fri, Jul 4, 2025 at 12:26 PM Sultan Alsawaf <sultan@xxxxxxxxxxxxxxx> wrote:
>
> On Fri, Jul 04, 2025 at 09:45:44AM -0400, David Jeffery wrote:
> > On Thu, Jul 3, 2025 at 12:13 PM Jeremy Allison <jra@xxxxxxxxx> wrote:
> > >
> > > On Thu, Jul 03, 2025 at 01:46:56PM +0200, Christoph Hellwig wrote:
> > > >On Wed, Jun 25, 2025 at 03:18:48PM -0500, Stuart Hayes wrote:
> > > >> Address resource and timing issues when spawning a unique async thread
> > > >> for every device during shutdown:
> > > >> * Make the asynchronous threads able to shut down multiple devices,
> > > >> instead of spawning a unique thread for every device.
> > > >> * Modify core kernel async code with a custom wake function so it
> > > >> doesn't wake up threads waiting to synchronize every time the cookie
> > > >> changes
> > > >
> > > >Given all these thread spawning issues, why can't we just go back
> > > >to the approach that kicks off shutdown asynchronously and then waits
> > > >for it without spawning all these threads?
> > >
> > > It isn't just an nvme issue. Red Hat found the same issue
> > > with SCSI devices.
> > >
> > > My colleague Sultan Alsawaf posted a simpler fix for the
> > > earlier patch here:
> > >
> > > https://lists.infradead.org/pipermail/linux-nvme/2025-January/053666.html
> > >
> > > Maybe this could be explored.
> > >
> >
> > Unfortunately, this approach looks flawed. If I am reading it right,
> > it assumes async shutdown devices do not have dependencies on sync
> > shutdown devices.
>
> It does not make any such assumption. Dependency on a sync device is handled
> through a combination of queue_device_async_shutdown() setting an async device's
> shutdown_after and the synchronous shutdown loop dispatching an "async" shutdown
> for a sync device when it encounters a sync device that has a downstream async
> dependency.
>

Yes, but that is not the case I think fails. This handles a sync
parent having an async child. It does not handle the reverse: a sync
child having an async parent.

For example, take a system with one PCI nvme device. The nvme device,
which is flagged for async shutdown, can have sync shutdown children as
well as a sync shutdown parent. The linked patch pulls the async device
off the shutdown list onto a separate async list, then starts this lone
async device with queue_device_async_shutdown because it is on the
async list. The device is then passed to the async subsystem, which
runs shutdown_one_device_async, where it shuts down immediately because
its shutdown_after value is zero. The patch sets shutdown_after for its
parent, but nothing connects and orders the async device against its
sync children, which will be shut down later from the original
device_shutdown task.
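
To make the ordering concrete, here is a toy userspace model of the
sequence described above. It is only a sketch, not code from the patch
or the kernel: struct toy_dev and toy_count_violations are hypothetical
names, and the model assumes an async shutdown finishes as soon as it
is started, which is the worst case for ordering.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy type; not a kernel structure. */
struct toy_dev {
	int parent;      /* index of the parent in the array, or -1 */
	int is_async;    /* nonzero if flagged for async shutdown */
	int shutdown_at; /* logical timestamp, filled in by the model */
};

/*
 * Model the sequence as described in this mail (an assumption about
 * the patch's behavior, not a copy of it):
 *   1. async devices are pulled off the list and started right away,
 *   2. the remaining sync devices are shut down afterwards, walking
 *      the list in reverse so children come before parents.
 * Returns how many devices were shut down after their own parent.
 */
int toy_count_violations(struct toy_dev *devs, size_t n)
{
	int clock = 0;
	int violations = 0;
	size_t i;

	/* Step 1: async devices start (and here, finish) immediately. */
	for (i = n; i-- > 0;)
		if (devs[i].is_async)
			devs[i].shutdown_at = clock++;

	/* Step 2: sync devices follow, in reverse list order. */
	for (i = n; i-- > 0;)
		if (!devs[i].is_async)
			devs[i].shutdown_at = clock++;

	/* A child must always be shut down before its parent. */
	for (i = 0; i < n; i++) {
		int p = devs[i].parent;

		if (p < 0)
			continue;
		assert((size_t)p < n);
		if (devs[p].shutdown_at < devs[i].shutdown_at)
			violations++;
	}
	return violations;
}
```

With three devices in list order (sync PCI parent, async nvme, sync
child of the nvme), the model reports one violation: the sync child is
shut down after its async parent. With all three devices sync, it
reports none.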

> > Maintaining all the dependencies is the core problem and source of the
> > complexity of the async shutdown patches.
>
> I am acutely aware. Please take a closer look at my patch.
>

I have, and it still looks incomplete to me.

David Jeffery