Re: [PROBLEM] nbd requests become stuck when devices watched by inotify emit udev uevent changes

From: Matthew Ruffell
Date: Mon Apr 25 2022 - 18:22:09 EST


Hi Josef,

The pastebin has expired the link, and I can't access your patch.
Seems to default to 1 day deletion.

Could you please create a new paste or send the patch inline in this
email thread?

I am more than happy to try the patch out.

Thank you for your analysis.
Matthew

On Sat, Apr 23, 2022 at 3:24 AM Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
>
> On Fri, Apr 22, 2022 at 1:42 AM Matthew Ruffell
> <matthew.ruffell@xxxxxxxxxxxxx> wrote:
> >
> > Dear maintainers of the nbd subsystem,
> >
> > A user has come across an issue which causes the nbd module to hang after a
> > disconnect where a write has been made to a qemu qcow image file, with qemu-nbd
> > being the server.
> >
>
> Ok there's two problems here, but I want to make sure I have the right
> fix for the hang first. Can you apply this patch
>
> https://paste.centos.org/view/b1a2d01a
>
> and make sure the hang goes away? Once that part is fixed I'll fix
> the IO errors, this is just us racing with systemd while we teardown
> the device and then we're triggering a partition read while the device
> is going down and it's complaining loudly. Before we would
> set_capacity to 0 whenever we disconnected, but that causes problems
> with file systems that may still have the device open. However now we
> only do this if the server does the CLEAR_SOCK ioctl, which clearly
> can race with systemd poking the device, so I need to make it
> set_capacity(0) when the last opener closes the device to prevent this
> style of race.
>
> Let me know if that patch fixes the hang, and then I'll work up
> something for the capacity problem. Thanks,
>
> Josef