RE: [PATCH 0/4] Drivers: hv: Further protection for the rescind path

From: Dexuan Cui
Date: Thu Feb 05 2015 - 07:45:22 EST


> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@xxxxxxxxxx]
> Sent: Thursday, February 5, 2015 18:10 PM
> To: KY Srinivasan
> Cc: devel@xxxxxxxxxxxxxxxxxxxxxx; Haiyang Zhang; linux-kernel@xxxxxxxxxxxxxxx;
> Dexuan Cui; Jason Wang
> Subject: Re: [PATCH 0/4] Drivers: hv: Further protection for the rescind path
>
> KY Srinivasan <kys@xxxxxxxxxxxxx> writes:
>
> >> -----Original Message-----
> >> From: Vitaly Kuznetsov [mailto:vkuznets@xxxxxxxxxx]
> >> Sent: Tuesday, February 3, 2015 9:01 AM
> >> To: KY Srinivasan; devel@xxxxxxxxxxxxxxxxxxxxxx
> >> Cc: Haiyang Zhang; linux-kernel@xxxxxxxxxxxxxxx; Dexuan Cui; Jason Wang
> >> Subject: [PATCH 0/4] Drivers: hv: Further protection for the rescind path
> >>
> >> This series is a continuation of the "Drivers: hv: vmbus: serialize Offer and
> >> Rescind offer". I'm trying to address a number of theoretically possible issues
> >> with rescind offer handling. All these complications come from the fact that a
> >> rescind offer results in vmbus channel being freed and we must ensure
> >> nobody still uses it. Instead of introducing new locks I suggest we switch
> >> channels usage to the get/put workflow.
> >>
> >> The main part of the series is [PATCH 1/4] which introduces the workflow for
> >> vmbus channels, all other patches fix different corner cases using this
> >> workflow. I'm not sure all such cases are covered with this series (probably
> >> not), but in case protection is required in some other places it should become
> >> relatively easy to add one.
> >>
> >> I did some sanity testing with CONFIG_DEBUG_LOCKDEP=y and nothing
> >> popped out, however, additional testing would be much appreciated.
> >>
> >> K.Y., Haiyang, I'm not sending this series to netdev@ and linux-scsi@ as it is
> >> supposed to be applied as a whole, please resend these patches with your
> >> sign-offs when (and if) we're done with reviews. Thanks!
> >
> > Vitaly,
> >
> > Thanks for looking into this issue. While today, rescind offer results in the
> freeing of the channel, I don't think
> > that is required. By not freeing up the channel in the rescind path, we can have
> a safe way to access the channel and
> > that does not have to involve taking a reference on the channel every time you
> access it - the get/put workflow in your
> > patch set. As part of the network performance improvement work, I had
> eliminated all locks in the receive path by setting
> > up per-cpu data structures for mapping the relid to channel etc. These set of
> patches introduces locking/atomic operations
> > in performance critical code paths to deal with an event that is truly
> > rare - the channel getting rescinded.
>
> It is possible to eliminate all locks/atomic operations from performance
> critical pyth in my patch series by following Dexuan's suggestion -
> we'll get the channel in vmbus_open and put it in vmbus_close (and on
> processing offer/rescind offer) this won't affect performance. I'm in
> the middle of testing this approach.
>
> >
> > All channel messages are handled in a single work context:
> > vmbus_on_msg_dpc() -> vmbus_onmessage_work()-> Various channel
> messages [offer, rescind etc.]
This is true.

> >
> > So, the rescind message cannot be processed while we are processing the
> offer message and since an offer
> > cannot be rescinded before it is offered, offer and rescind are naturally
> > serialized (I think I have patchset in my queue
IMO this may not be true.
The cause is:
(I'm using the latest linux-next repo, which includes Vitaly's patch
"Drivers: hv: vmbus: serialize Offer and Rescind offer".)

vmbus_onoffer_rescind() runs in vmbus_connection.work_queue, but
vmbus_process_offer() runs in the per-channel newchannel->controlwq, so the
two functions are not serialized, at least in theory.

As a result, in vmbus_process_offer(): after the new channel is added into
vmbus_connection.chn_list, but before the channel is completely initialized by
us (we need to create a vmbus device and associate the device with the
channel -- this procedure could fail and we goto err_free_chan and free the
channel directly!), vmbus_onoffer_rescind() can see the new channel, but
doesn't know the channel could be freed in another place at the same time.

BTW, when vmbus_process_offer() -> vmbus_device_create() fails, we
goto err_free_chan without removing the new channel from
vmbus_connection.chn_list?

As another result : in vmbus_process_offer(), in the case
vmbus_device_register() fails, we'll run "list_del(&newchannel->listentry) and
unlock vmbus_connection.channel_lock" -- just after these 2 lines, at this time,
vmbus_onoffer_rescind() -> relid2channel() can return NULL, and we'll miss the
rescind message, i.e., we fail to send the CHANNELMSG_RELID_RELEASED
message to the host.

Of course, these corner cases should be rare since the rescind message is really
rare today, but it may not be true in future: we can have a VMsock-like
guest/host communication channel and many dynamic channels can be
relatively frequently created and rescinded.

Please let me know if my understanding is correct.
If yes, I'm wondering if we can remove the per-channel workqueue and run all the
offer/rescind works in the global vmbus_connection.work_queue? This can realize
the thorough serialization. :-)

> > from you that is trying to solve the concurrent execution of offer and rescind
> > and looking at the code I cannot see how this can occur).
> >
> > As part of handling the rescind message, we will just set the channel state to
> indicate that the offer is rescinded (we can add
> > the rescind state to the channel states already defined and this will be done
> under the protection of the channel lock).
> > The cleanup of the channel and sending of the RELID release message will
> only be done in the context of the driver as part of
> > driver remove function. I think this should be doable in a way that does not
> penalize the normal path. If it is ok with you, I will
> > try to put together a patch along the lines I have described here.
> >
>
> Yes, if we consider rescind event as a very rare event we can avoid
> freeing channels, but if (in some conditions) it happens frequently
> we'll have significant memory leakage.
>
> We can also free them with something like schedule_deyalyed_work with
> e.g. 10 second delay after removing it from all lists so probability of
> hitting a crash will me very low, I seriously doubt we will ever hit it.
>
> Please let me know what you think is better. In case we follow 'never
> free' or 'delayed free' approach I'll extract and send separately PATCH
> 2/4 from my series to address 'loosing rescind offer' issue pointed out
> by Dexuan.
>
> Vitaly

IMO it's not a good idea to 'never free' a channel, considering the future
VMsock-like feature.

Thanks,
-- Dexuan

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/