Re: regression caused by block: freeze the queue earlier in del_gendisk

From: Ming Lei
Date: Mon Sep 12 2022 - 21:56:19 EST


On Mon, Sep 12, 2022 at 09:16:18AM +0200, Christoph Hellwig wrote:
> On Fri, Sep 09, 2022 at 04:24:40PM +0800, Ming Lei wrote:
> > On Wed, Sep 07, 2022 at 09:33:24AM +0200, Christoph Hellwig wrote:
> > > On Thu, Sep 01, 2022 at 03:06:08PM +0800, Ming Lei wrote:
> > > > It is a bit hard to associate the above commit with reported issue.
> > >
> > > So the messages clearly are about something trying to open a device
> > > that went away at the block layer, but somehow does not get removed
> > > in time by udev (which seems to be a userspace bug in CoreOS). But
> > > even with that we really should not hang.
> >
> > Xiao Ni provides one script[1] which can reproduce the issue more or less.
>
> I've run the reproducer 10000 times on current mainline, and while
> it prints one of the autoloading messages per run, I've not actually
> seen any kind of hang.

I can't reproduce the hang either.

What I meant is that a new raid disk can still be added by mdadm after
the imsm container and raid disk have been stopped, with the autoloading
messages printed. I understand this behavior isn't correct, but I am not
familiar enough with raid.

It might be related to the delayed deletion of the gendisk from the wq &
md kobj release handler.

During reboot, if mdadm keeps doing this stupid thing without stopping,
it could cause the hang.

I think the root-cause question is why mdadm tries to open/add new raid
bdevs so crazily during reboot.


Thanks,
Ming