RE: [PATCH net] hv_netvsc: Fix hibernation for mlx5 VF driver

From: Dexuan Cui
Date: Sat Sep 05 2020 - 23:06:25 EST


> From: Jakub Kicinski <kuba@xxxxxxxxxx>
> Sent: Saturday, September 5, 2020 4:27 PM
> [...]
> On Fri, 4 Sep 2020 19:52:18 -0700 Dexuan Cui wrote:
> > mlx5_suspend()/resume() keep the network interface, so during hibernation
> > netvsc_unregister_vf() and netvsc_register_vf() are not called, and hence
> > netvsc_resume() should call netvsc_vf_changed() to switch the data path
> > back to the VF after hibernation.
>
> Does suspending the system automatically switch back to the synthetic
> datapath?
Yes.

For mlx4, since the VF network interface is explicitly destroyed and re-created
during hibernation (i.e. suspend + resume), hv_netvsc explicitly switches the
data path away from the VF on suspend and back to the VF on resume.

For mlx5, the VF network interface persists across hibernation, so there is no
explicit switch-over, but after we close and re-open the vmbus channel of
the netvsc NIC in netvsc_suspend() and netvsc_resume(), the data path is
implicitly switched to the netvsc NIC, and with this patch netvsc_resume() ->
netvsc_vf_changed() switches the data path back to the mlx5 NIC.
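
As a rough illustration of the two cases, here is a userspace toy model (not
kernel code). Every model_* name and the DP_* enum are made up for this
sketch; the comments say which real step each line stands in for.

```c
#include <stdbool.h>

/* Toy model only: none of these identifiers exist in hv_netvsc. */
enum datapath { DP_SYNTHETIC, DP_VF };

struct model_nic {
	bool vf_present;	/* is a VF net device currently registered? */
	enum datapath path;	/* where traffic currently flows */
};

/* mlx4-like: the VF interface is destroyed, so the switch is explicit. */
static void model_suspend_mlx4(struct model_nic *n)
{
	n->vf_present = false;	/* VF net device goes away */
	n->path = DP_SYNTHETIC;	/* explicit switch away from the VF */
}

static void model_resume_mlx4(struct model_nic *n)
{
	n->vf_present = true;	/* VF net device is re-created */
	n->path = DP_VF;	/* explicit switch back to the VF */
}

/* mlx5-like: the VF interface persists across hibernation. */
static void model_suspend_mlx5(struct model_nic *n)
{
	/* Closing the vmbus channel implicitly falls back to synthetic. */
	n->path = DP_SYNTHETIC;
}

static void model_resume_mlx5(struct model_nic *n, bool with_patch)
{
	/* Re-opening the channel alone leaves the path on synthetic. */
	if (with_patch && n->vf_present)
		n->path = DP_VF; /* stands in for netvsc_vf_changed() */
}
```

Without the patch (with_patch == false) the mlx5-like case stays on
DP_SYNTHETIC after resume, which is the problem being fixed.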

> Please clarify this in the commit message and/or add a code
> comment.
I will clarify this in the commit message and add a code comment.

> > @@ -2587,7 +2587,7 @@ static int netvsc_remove(struct hv_device *dev)
> > static int netvsc_suspend(struct hv_device *dev)
> > {
> > struct net_device_context *ndev_ctx;
> > - struct net_device *vf_netdev, *net;
> > + struct net_device *net;
> > struct netvsc_device *nvdev;
> > int ret;
>
> Please keep reverse xmas tree variable ordering.

Will do.

> > @@ -2635,6 +2632,10 @@ static int netvsc_resume(struct hv_device *dev)
> > netvsc_devinfo_put(device_info);
> > net_device_ctx->saved_netvsc_dev_info = NULL;
> >
> > + vf_netdev = rtnl_dereference(net_device_ctx->vf_netdev);
> > + if (vf_netdev && netvsc_vf_changed(vf_netdev) != NOTIFY_OK)
> > + ret = -EINVAL;
>
> Should you perhaps remove the VF in case of the failure?
IMO this failure should not happen in practice: we're resuming the netvsc
NIC, so we're sure we have a valid pointer to the netvsc net device, and
netvsc_vf_changed() should be able to find the netvsc pointer and return
NOTIFY_OK. If it does fail, something really bad must be happening, and I'm
not sure it's safe to simply remove the VF, so I just return -EINVAL for
simplicity.

I would rather keep the code as-is, but I'm OK with adding a WARN_ON(1) if
you think that's necessary.
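
For reference, the WARN_ON variant would behave roughly like the userspace
sketch below. This is not the actual patch: model_resume_tail(), model_warn()
and the MODEL_* constants are hypothetical stand-ins for the tail of
netvsc_resume(), WARN_ON(1) and the notifier return codes.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for the kernel's NOTIFY_DONE / NOTIFY_OK values. */
#define MODEL_NOTIFY_DONE	0x0000
#define MODEL_NOTIFY_OK		0x0001

static int model_warn_count;

/* Stand-in for WARN_ON(1): count the event and log it. */
static void model_warn(const char *what)
{
	model_warn_count++;
	fprintf(stderr, "WARNING: %s\n", what);
}

/*
 * Models the end of resume: if a VF is attached and switching the data
 * path back to it did not report OK, warn and fail with -EINVAL.
 * The VF itself is left alone.
 */
static int model_resume_tail(bool have_vf, int vf_changed_result)
{
	int ret = 0;

	if (have_vf && vf_changed_result != MODEL_NOTIFY_OK) {
		model_warn("data path switch to VF failed on resume");
		ret = -EINVAL;
	}

	return ret;
}
```

With no VF attached, or with an OK result, the function returns 0 and does
not warn.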

Thanks,
-- Dexuan