RE: [PATCH] Drivers: hv: vm_bus: Handle vmbus rescind calls after vmbus is suspended

From: Michael Kelley (LINUX)
Date: Tue Jul 05 2022 - 11:42:12 EST


From: Shradha Gupta <shradhagupta@xxxxxxxxxxxxxxxxxxx> Sent: Saturday, July 2, 2022 11:54 PM
>
> Add a flag to indicate that the vmbus is suspended so we should ignore the
> rescind offer message.

>From the code, it appears that the flag causes both offer and rescind offer
messages to be ignored.

> Add a new work_queue for rescind msg, so we could
> drain it in vmbus_suspend processing. This should help avoid any rescind
> offer msg processing if ignore_offer_rescind_msg flag is set to true.
> It was observed in some hibernation related scenario testing that after
> KVP_vmbus suspend, we get another rescind offer message for the vmbus. This

I'm not clear on what "KVP_vmbus suspend" is.

> led to rescind message processing after vm_suspend and we would end up with
> a warning and stack dumps

And what is "vm_suspend"? There is a function vmbus_bus_suspend() -- is that
perhaps the intention here?

But I think the core issue you observed is that after vmbus_bus_suspend() runs,
we might still process a rescind message for a channel that has already
been closed as part of the suspend operation.

>
> Signed-off-by: Shradha Gupta <shradhagupta@xxxxxxxxxxxxxxxxxxx>
> ---
> drivers/hv/connection.c | 11 +++++++++++
> drivers/hv/hyperv_vmbus.h | 7 +++++++
> drivers/hv/vmbus_drv.c | 24 +++++++++++++++++++++++-
> 3 files changed, 41 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 6218bbf6863a..88a0fd8e80c0 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -171,6 +171,14 @@ int vmbus_connect(void)
> goto cleanup;
> }
>
> + vmbus_connection.rescind_work_queue =
> + create_workqueue("hv_vmbus_rescind");
> + if (!vmbus_connection.rescind_work_queue) {
> + ret = -ENOMEM;
> + goto cleanup;
> + }
> + vmbus_connection.ignore_offer_rescind_msg = false;
> +
> vmbus_connection.handle_primary_chan_wq =
> create_workqueue("hv_pri_chan");
> if (!vmbus_connection.handle_primary_chan_wq) {
> @@ -357,6 +365,9 @@ void vmbus_disconnect(void)
> if (vmbus_connection.handle_primary_chan_wq)
> destroy_workqueue(vmbus_connection.handle_primary_chan_wq);
>
> + if (vmbus_connection.rescind_work_queue)
> + destroy_workqueue(vmbus_connection.rescind_work_queue);
> +
> if (vmbus_connection.work_queue)
> destroy_workqueue(vmbus_connection.work_queue);
>
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index 4f5b824b16cf..ff8707284554 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -262,6 +262,13 @@ struct vmbus_connection {
> struct workqueue_struct *handle_primary_chan_wq;
> struct workqueue_struct *handle_sub_chan_wq;
>
> + /*
> + * On suspension of the vmbus, the accumulated rescind message
> + * must be dropped.

As noted regarding the commit message, the flag causes both offer
messages and rescind offer messages to be dropped. Maybe the flag
should be renamed so that it is clear both message types are
dropped?

> + */
> + bool ignore_offer_rescind_msg;
> + struct workqueue_struct *rescind_work_queue;
> +
> /*
> * The number of sub-channels and hv_sock channels that should be
> * cleaned up upon suspend: sub-channels will be re-created upon
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 547ae334e5cd..46bd867f11ba 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -1160,7 +1160,9 @@ void vmbus_on_msg_dpc(unsigned long data)
> * work queue: the RESCIND handler can not start to
> * run before the OFFER handler finishes.
> */
> - schedule_work(&ctx->work);
> + if (vmbus_connection.ignore_offer_rescind_msg)
> + break;
> + queue_work(vmbus_connection.rescind_work_queue, &ctx->work);
> break;
>
> case CHANNELMSG_OFFERCHANNEL:
> @@ -1186,6 +1188,8 @@ void vmbus_on_msg_dpc(unsigned long data)
> * to the CPUs which will execute the offer & rescind
> * works by the time these works will start execution.
> */
> + if (vmbus_connection.ignore_offer_rescind_msg)
> + break;
> atomic_inc(&vmbus_connection.offer_in_progress);
> fallthrough;
>
> @@ -2446,8 +2450,20 @@ static int vmbus_acpi_add(struct acpi_device *device)
> #ifdef CONFIG_PM_SLEEP
> static int vmbus_bus_suspend(struct device *dev)
> {
> + struct hv_per_cpu_context *hv_cpu = per_cpu_ptr(
> + hv_context.cpu_context, VMBUS_CONNECT_CPU);
> struct vmbus_channel *channel, *sc;
>
> + tasklet_disable(&hv_cpu->msg_dpc);
> + vmbus_connection.ignore_offer_rescind_msg = true;
> + tasklet_enable(&hv_cpu->msg_dpc);
> +
> + /* Drain all the workqueues as we are in suspend */
> + drain_workqueue(vmbus_connection.rescind_work_queue);
> + drain_workqueue(vmbus_connection.work_queue);
> + drain_workqueue(vmbus_connection.handle_primary_chan_wq);
> + drain_workqueue(vmbus_connection.handle_sub_chan_wq);
> +

The above looks good to me. The ordering is very important, and you have
the order correct. Once the tasklet is re-enabled, there may be messages
queued up that will be processed. But the flag is now set, so those messages
will not result in any work being queued to work_queue or rescind_work_queue.
Note that a memory barrier should be used after setting the flag to true, but
tasklet_enable() provides that memory barrier, which might be worth a comment.
Then the work_queue and rescind_work_queues are drained. The process
of draining these work queues could put entries into the handle_primary_chan_wq
or handle_sub_chan_wq, so then those two queues are drained.

> while (atomic_read(&vmbus_connection.offer_in_progress) != 0) {

With your above changes in place, I think the test here for offer_in_progress is
redundant. There can't be any offers in progress since the workqueues
have been drained, and new entries to the workqueues are prevented.

> /*
> * We wait here until the completion of any channel
> @@ -2527,10 +2543,16 @@ static int vmbus_bus_suspend(struct device *dev)
>
> static int vmbus_bus_resume(struct device *dev)
> {
> + struct hv_per_cpu_context *hv_cpu = per_cpu_ptr(
> + hv_context.cpu_context, VMBUS_CONNECT_CPU);
> struct vmbus_channel_msginfo *msginfo;
> size_t msgsize;
> int ret;
>
> + tasklet_disable(&hv_cpu->msg_dpc);
> + vmbus_connection.ignore_offer_rescind_msg = false;
> + tasklet_enable(&hv_cpu->msg_dpc);
> +

Is there a reason for the tasklet_disable()/enable() calls? I can see that they
are needed when the flag transitions from false to true, but it's not clear to
me why they would be needed for the transition from true to false.

> /*
> * We only use the 'vmbus_proto_version', which was in use before
> * hibernation, to re-negotiate with the host.
> --
> 2.17.1