RE: [PATCH 0/4] Drivers: hv: Further protection for the rescind path

From: KY Srinivasan
Date: Thu Feb 05 2015 - 10:53:01 EST




> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@xxxxxxxxxx]
> Sent: Thursday, February 5, 2015 2:10 AM
> To: KY Srinivasan
> Cc: devel@xxxxxxxxxxxxxxxxxxxxxx; Haiyang Zhang; linux-
> kernel@xxxxxxxxxxxxxxx; Dexuan Cui; Jason Wang
> Subject: Re: [PATCH 0/4] Drivers: hv: Further protection for the rescind path
>
> KY Srinivasan <kys@xxxxxxxxxxxxx> writes:
>
> >> -----Original Message-----
> >> From: Vitaly Kuznetsov [mailto:vkuznets@xxxxxxxxxx]
> >> Sent: Tuesday, February 3, 2015 9:01 AM
> >> To: KY Srinivasan; devel@xxxxxxxxxxxxxxxxxxxxxx
> >> Cc: Haiyang Zhang; linux-kernel@xxxxxxxxxxxxxxx; Dexuan Cui; Jason
> >> Wang
> >> Subject: [PATCH 0/4] Drivers: hv: Further protection for the rescind
> >> path
> >>
> >> This series is a continuation of the "Drivers: hv: vmbus: serialize
> >> Offer and Rescind offer". I'm trying to address a number of
> >> theoretically possible issues with rescind offer handling. All these
> >> complications come from the fact that a rescind offer results in
> >> vmbus channel being freed and we must ensure nobody still uses it.
> >> Instead of introducing new locks I suggest we switch channels usage to
> the get/put workflow.
> >>
> >> The main part of the series is [PATCH 1/4] which introduces the
> >> workflow for vmbus channels, all other patches fix different corner
> >> cases using this workflow. I'm not sure all such cases are covered
> >> with this series (probably not), but in case protection is required
> >> in some other places it should become relatively easy to add one.
> >>
> >> I did some sanity testing with CONFIG_DEBUG_LOCKDEP=y and nothing
> >> popped out, however, additional testing would be much appreciated.
> >>
> >> K.Y., Haiyang, I'm not sending this series to netdev@ and linux-scsi@
> >> as it is supposed to be applied as a whole, please resend these
> >> patches with your sign-offs when (and if) we're done with reviews.
> Thanks!
> >
> > Vitaly,
> >
> > Thanks for looking into this issue. While today, rescind offer results
> > in the freeing of the channel, I don't think that is required. By not
> > freeing up the channel in the rescind path, we can have a safe way to
> > access the channel and that does not have to involve taking a
> > reference on the channel every time you access it - the get/put
> > workflow in your patch set. As part of the network performance
> > improvement work, I had eliminated all locks in the receive path by setting
> up per-cpu data structures for mapping the relid to channel etc. These set of
> patches introduces locking/atomic operations in performance critical code
> paths to deal with an event that is truly rare - the channel getting rescinded.
>
> It is possible to eliminate all locks/atomic operations from performance
> critical pyth in my patch series by following Dexuan's suggestion - we'll get
> the channel in vmbus_open and put it in vmbus_close (and on processing
> offer/rescind offer) this won't affect performance. I'm in the middle of
> testing this approach.
>
> >
> > All channel messages are handled in a single work context:
> >
> > vmbus_on_msg_dpc() -> vmbus_onmessage_work()-> Various channel
> > messages [offer, rescind etc.]
> >
> > So, the rescind message cannot be processed while we are processing
> > the offer message and since an offer cannot be rescinded before it is
> > offered, offer and rescind are naturally serialized (I think I have
> > patchset in my queue from you that is trying to solve the concurrent
> execution of offer and rescind and looking at the code I cannot see how this
> can occur).
> >
> > As part of handling the rescind message, we will just set the channel
> > state to indicate that the offer is rescinded (we can add the rescind state to
> the channel states already defined and this will be done under the protection
> of the channel lock).
> > The cleanup of the channel and sending of the RELID release message
> > will only be done in the context of the driver as part of driver
> > remove function. I think this should be doable in a way that does not
> penalize the normal path. If it is ok with you, I will try to put together a patch
> along the lines I have described here.
> >
>
> Yes, if we consider rescind event as a very rare event we can avoid freeing
> channels, but if (in some conditions) it happens frequently we'll have
> significant memory leakage.
>
Rescind offer is rare; but I am not suggesting that we would leak memory in this case.
What I am suggesting is that we can place the burden where it should be - do all
the cleanup in vmbus_close() driven by the remove function.

K. Y
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/