Re: [PATCH 0/4] Drivers: hv: Further protection for the rescind path

From: Vitaly Kuznetsov
Date: Thu Feb 05 2015 - 05:14:10 EST


KY Srinivasan <kys@xxxxxxxxxxxxx> writes:

>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuznets@xxxxxxxxxx]
>> Sent: Tuesday, February 3, 2015 9:01 AM
>> To: KY Srinivasan; devel@xxxxxxxxxxxxxxxxxxxxxx
>> Cc: Haiyang Zhang; linux-kernel@xxxxxxxxxxxxxxx; Dexuan Cui; Jason Wang
>> Subject: [PATCH 0/4] Drivers: hv: Further protection for the rescind path
>>
>> This series is a continuation of the "Drivers: hv: vmbus: serialize Offer and
>> Rescind offer". I'm trying to address a number of theoretically possible issues
>> with rescind offer handling. All these complications come from the fact that a
>> rescind offer results in vmbus channel being freed and we must ensure
>> nobody still uses it. Instead of introducing new locks I suggest we switch
>> channels usage to the get/put workflow.
>>
>> The main part of the series is [PATCH 1/4] which introduces the workflow for
>> vmbus channels, all other patches fix different corner cases using this
>> workflow. I'm not sure all such cases are covered with this series (probably
>> not), but in case protection is required in some other places it should become
>> relatively easy to add one.
>>
>> I did some sanity testing with CONFIG_DEBUG_LOCKDEP=y and nothing
>> popped out, however, additional testing would be much appreciated.
>>
>> K.Y., Haiyang, I'm not sending this series to netdev@ and linux-scsi@ as it is
>> supposed to be applied as a whole, please resend these patches with your
>> sign-offs when (and if) we're done with reviews. Thanks!
>
> Vitaly,
>
> Thanks for looking into this issue. While today, rescind offer results in the freeing of the channel, I don't think
> that is required. By not freeing up the channel in the rescind path, we can have a safe way to access the channel and
> that does not have to involve taking a reference on the channel every time you access it - the get/put workflow in your
> patch set. As part of the network performance improvement work, I had eliminated all locks in the receive path by setting
> up per-cpu data structures for mapping the relid to channel etc. This set of patches introduces locking/atomic operations
> in performance critical code paths to deal with an event that is truly
> rare - the channel getting rescinded.

It is possible to eliminate all locks/atomic operations from the
performance critical path in my patch series by following Dexuan's
suggestion: we'll get the channel in vmbus_open and put it in vmbus_close
(and on processing offer/rescind offer), so this won't affect performance.
I'm in the middle of testing this approach.
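
To illustrate the idea, here is a minimal user-space sketch of the get/put
workflow with the reference taken only at open/close time. It is not the
code from the series: struct chan, chan_get(), chan_put() and the other
names are made up for illustration, and the real vmbus_channel layout and
locking are not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vmbus_channel; not the kernel layout. */
struct chan {
	atomic_int refs;	/* usage count; freed when it drops to 0 */
	int opened;
};

static int chans_freed;	/* demo-only counter so the free is observable */

/* Offer: the channel list holds the initial reference. */
static struct chan *chan_alloc(void)
{
	struct chan *c = calloc(1, sizeof(*c));

	if (c)
		atomic_init(&c->refs, 1);
	return c;
}

static void chan_get(struct chan *c)
{
	atomic_fetch_add(&c->refs, 1);
}

static void chan_put(struct chan *c)
{
	int old = atomic_fetch_sub(&c->refs, 1);

	assert(old > 0);
	if (old == 1) {		/* last user is gone, safe to free */
		chans_freed++;
		free(c);
	}
}

/* Slow paths: the only places that touch the counter. */
static void chan_open(struct chan *c)    { chan_get(c); c->opened = 1; }
static void chan_close(struct chan *c)   { c->opened = 0; chan_put(c); }
static void chan_rescind(struct chan *c) { chan_put(c); /* drop list ref */ }

/* Hot path: no get/put here, it relies on the reference from chan_open(). */
static int chan_recv(struct chan *c)
{
	return c->opened;
}
```

With this shape a rescind that arrives while the channel is open only drops
the list's reference; the memory goes away when the driver calls
chan_close(), and the receive path never touches the counter.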

>
> All channel messages are handled in a single work context:
>
> vmbus_on_msg_dpc() -> vmbus_onmessage_work()-> Various channel messages [offer, rescind etc.]
>
> So, the rescind message cannot be processed while we are processing the offer message and since an offer
> cannot be rescinded before it is offered, offer and rescind are naturally serialized (I think I have a patchset in my queue
> from you that is trying to solve the concurrent execution of offer and rescind and looking at the code I cannot see how
> this can occur).
>
> As part of handling the rescind message, we will just set the channel state to indicate that the offer is rescinded (we can add
> the rescind state to the channel states already defined and this will be done under the protection of the channel lock).
> The cleanup of the channel and sending of the RELID release message will only be done in the context of the driver as part of
> driver remove function. I think this should be doable in a way that does not penalize the normal path. If it is ok with you, I will
> try to put together a patch along the lines I have described here.
>

Yes, if we consider a rescind to be a very rare event we can avoid
freeing channels, but if (in some conditions) it happens frequently
we'll have significant memory leakage.
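
For reference, the "mark as rescinded, clean up in the driver's remove
function" scheme described above could look roughly like the user-space
sketch below. All names (struct rchan, rchan_on_rescind(),
rchan_driver_remove(), the counters) are hypothetical, the channel lock is
only indicated by a comment, and sending the RELID release message is
stood in for by a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical channel states; the real enum has more values. */
enum rchan_state { RCHAN_OPEN, RCHAN_RESCINDED };

struct rchan {
	enum rchan_state state;
	unsigned int relid;
};

static int relid_released;	/* stands in for the RELID release message */
static int rchans_freed;	/* demo-only counter */

/* Rescind handler: mark the channel, never free it here. */
static void rchan_on_rescind(struct rchan *c)
{
	/* kernel code would hold the channel lock around this store */
	c->state = RCHAN_RESCINDED;
}

/* Hot path: the pointer stays valid after a rescind, so no refcount. */
static bool rchan_send(struct rchan *c)
{
	return c->state != RCHAN_RESCINDED;
}

/* Driver remove: the only place that releases the relid and frees. */
static void rchan_driver_remove(struct rchan *c)
{
	if (c->state == RCHAN_RESCINDED)
		relid_released++;
	rchans_freed++;
	free(c);
}
```

The normal path pays only for a state check, while the channel memory stays
allocated until the driver's remove function runs.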

We can also free them with something like schedule_delayed_work with
e.g. a 10 second delay after removing them from all lists, so the
probability of hitting a crash will be very low; I seriously doubt we
will ever hit it.

Please let me know which you think is better. In case we follow the 'never
free' or 'delayed free' approach I'll extract and send separately PATCH
2/4 from my series to address the 'losing rescind offer' issue pointed out
by Dexuan.

Thanks,

> Regards,
>
> K. Y
>
>>
>> Vitaly Kuznetsov (4):
>> Drivers: hv: vmbus: implement get/put usage workflow for vmbus
>> channels
>> Drivers: hv: vmbus: do not lose rescind offer on failure in
>> vmbus_process_offer()
>> Drivers: hv: vmbus: protect vmbus_get_outgoing_channel() against
>> channel removal
>> hyperv: netvsc: improve protection against rescind offer
>>
>> drivers/hv/channel_mgmt.c | 75 +++++++++++++++++++++++++++++++++++++--------
>> drivers/hv/connection.c | 7 +++--
>> drivers/hv/hyperv_vmbus.h | 4 +++
>> drivers/net/hyperv/netvsc.c | 10 ++++--
>> drivers/scsi/storvsc_drv.c | 2 ++
>> include/linux/hyperv.h | 13 ++++++++
>> 6 files changed, 95 insertions(+), 16 deletions(-)
>>
>> --
>> 1.9.3

--
Vitaly