RE: [PATCH] hv: hv_balloon: Fixed an issue in the earor handling code if probe failed

From: Michael Kelley (LINUX)
Date: Sat May 07 2022 - 11:13:36 EST


From: Shradha Gupta <shradhagupta@xxxxxxxxxxxxx>
> Subject: [PATCH] hv: hv_balloon: Fixed an issue in the earor handling code if probe failed

Typo in the "Subject" line -- 'earor' should be 'error'. Also the prefix
to use has varied historically, but is either "hv_balloon:" or the full
"Drivers: hv: balloon:". Use one of those two prefixes rather than
creating yet another variant.

And let me suggest simpler wording. For example:

hv_balloon: Fix balloon_probe() and balloon_remove() error handling

>
> If the balloon_probe() function fails, we do some cleanup and similar
> functions are called again when after this a balloon_remove()
> call is made. This fix makes sure that the cleanup is not called
> twice. Also made sure if dm_state is DM_INIT_ERROR, the clean up is
> already done properly.

I don't think this commit message is quite correct. There is indeed
some missing cleanup in balloon_probe() if balloon_connect_vsp() fails.
But if balloon_probe() fails, balloon_remove() will never be called, so
there's no duplicate cleanup in that scenario.

But if balloon_resume() fails, then balloon_remove() could be
called at some point and mistakenly do duplicate cleanup. That's
how balloon_remove() could be called with dm_state set to
DM_INIT_ERROR.

Also commit messages should generally not use language like "this
patch" or "this fix". Just state the change in the imperative
voice.

I would suggest wording such as the following:

Add missing cleanup in balloon_probe() if the call to
balloon_connect_vsp() fails. Also correctly handle cleanup in
balloon_remove() when dm_state is DM_INIT_ERROR because
balloon_resume() failed.

>
> Signed-off-by: Shradha Gupta <shradhagupta@xxxxxxxxxxxxx>
> ---
> drivers/hv/hv_balloon.c | 17 ++++++++++++-----
> 1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> index eee7402cfc02..7c62c1c3ffc7 100644
> --- a/drivers/hv/hv_balloon.c
> +++ b/drivers/hv/hv_balloon.c
> @@ -1841,8 +1841,13 @@ static int balloon_probe(struct hv_device *dev,
> hv_set_drvdata(dev, &dm_device);
>
> ret = balloon_connect_vsp(dev);
> - if (ret != 0)
> + if (ret != 0) {
> +#ifdef CONFIG_MEMORY_HOTPLUG
> + unregister_memory_notifier(&hv_memory_nb);
> + restore_online_page_callback(&hv_online_page);
> +#endif
> return ret;
> + }

Rather than coding the cleanup steps directly under the "if"
statement, I would suggest using "goto connect_error", and adding
a "connect_error" label immediately after the vmbus_close() in
the "probe_error" sequence. This approach avoids some
duplicate code.

>
> enable_page_reporting();
> dm_device.state = DM_INITIALIZED;
> @@ -1882,12 +1887,14 @@ static int balloon_remove(struct hv_device *dev)
> cancel_work_sync(&dm->ha_wrk.wrk);
>
> kthread_stop(dm->thread);
> - disable_page_reporting();
> - vmbus_close(dev->channel);
> + if (dm_device.state != DM_INIT_ERROR) {
> + disable_page_reporting();
> + vmbus_close(dev->channel);
> #ifdef CONFIG_MEMORY_HOTPLUG
> - unregister_memory_notifier(&hv_memory_nb);
> - restore_online_page_callback(&hv_online_page);
> + unregister_memory_notifier(&hv_memory_nb);
> + restore_online_page_callback(&hv_online_page);
> #endif
> + }

This case where dm_device.state != DM_INIT_ERROR can only
occur when balloon_resume() fails. Looking at balloon_resume(),
its error handling code has already done the cleanup except for
disable_page_reporting(). A call to disable_page_reporting()
needs to be done somewhere for this case, and for consistency
balloon_resume() is probably the best place.

I would also suggest adding a comment just above the new "if"
statement you've added, indicating that the case being handled
is when balloon_resume() fails. Again, balloon_remove() won't
be called if balloon_probe() fails, so it is a little bit surprising
to end up in balloon_remove() with the state being DM_INIT_ERROR.

Michael

> spin_lock_irqsave(&dm_device.ha_lock, flags);
> list_for_each_entry_safe(has, tmp, &dm->ha_region_list, list) {
> list_for_each_entry_safe(gap, tmp_gap, &has->gap_list, list) {
> --
> 2.17.1
>