Re: [PATCH] PCI: hv: Do not wait forever on a device that has disappeared

From: Lorenzo Pieralisi
Date: Fri May 25 2018 - 08:00:32 EST


On Thu, May 24, 2018 at 11:55:35PM +0000, Dexuan Cui wrote:
> > From: Lorenzo Pieralisi <lorenzo.pieralisi@xxxxxxx>
> > Sent: Thursday, May 24, 2018 05:41
> > On Wed, May 23, 2018 at 09:12:01PM +0000, Dexuan Cui wrote:
> > >
> > > Before the guest finishes the device initialization, the device can be
> > > removed anytime by the host, and after that the host won't respond to
> > > the guest's request, so the guest should be prepared to handle this
> > > case.
> > >
> > > --- a/drivers/pci/host/pci-hyperv.c
> > > +++ b/drivers/pci/host/pci-hyperv.c
> > > @@ -556,6 +556,26 @@ static void put_pcichild(struct hv_pci_dev
> > *hv_pcidev,
> > > static void get_hvpcibus(struct hv_pcibus_device *hv_pcibus);
> > > static void put_hvpcibus(struct hv_pcibus_device *hv_pcibus);
> > >
> > > +/*
> > > + * There is no good way to get notified from vmbus_onoffer_rescind(),
> > > + * so let's use polling here, since this is not a hot path.
> > > + */
> > > +static int wait_for_response(struct hv_device *hdev,
> > > + struct completion *comp)
> > > +{
> > > + while (true) {
> > > + if (hdev->channel->rescind) {
> > > + dev_warn_once(&hdev->device, "The device is gone.\n");
> > > + return -ENODEV;
> > > + }
> > > +
> > > + if (wait_for_completion_timeout(comp, HZ / 10))
> > > + break;
> > > + }
> > > +
> > > + return 0;
> >
> > This is pretty racy, isn't it ? Also, I reckon you should consider the
> > timeout return value as an error condition unless I am completely
> > missing the point of what you are doing.
> >
> > Lorenzo
>
> Actually, this is not racy: we only exit the loop when
> 1) the channel is rescinded
> or
> 2) the channel is not rescinded, and the event is completed.
>
> wait_for_completion_timeout() returns 0 if timed out: in this case,
> we keep spinning in the loop every 0.1 second, testing the 2 conditions.

Yes sorry, you are right, the exit condition is correct, I am waiting for
maintainers ACK to merge it, I need it as soon as possible if you want
this to make it for v4.18.

Thanks,
Lorenzo

> If the chanel is not rescinded, here we should wait for the event
> forever, as the host is supposed to respond to us quickly, and the
> event will be completed accordingly. This is what the current code
> does. But, in case the channel is rescinded, we need to exit the loop
> immediately with an error return value: this is the only change
> made by the patch.
>
> Ideally, we should not use this ugly "polling" method, and the
> rescind-handler, i.e. vmbus_onoffer_rescind(), should notify
> wait_for_response(), but as I mentioned, there is no good way
> to get notified from vmbus_onoffer_rescind(), so I'm proposing
> this "polling" method: it's simple and it can work correctly,
> and this is not a hot path.
>
> Thanks,
> -- Dexuan