Re: [PATCH] bonding: fix arp_validate toggling in active-backup mode

From: Jay Vosburgh
Date: Fri May 10 2019 - 19:04:44 EST


Jarod Wilson <jarod@xxxxxxxxxx> wrote:

>There's currently a problem with toggling arp_validate on and off with an
>active-backup bond. At the moment, you can start up a bond, like so:
>
>modprobe bonding mode=1 arp_interval=100 arp_validate=0 arp_ip_target=192.168.1.1
>ip link set bond0 down
>echo "ens4f0" > /sys/class/net/bond0/bonding/slaves
>echo "ens4f1" > /sys/class/net/bond0/bonding/slaves
>ip link set bond0 up
>ip addr add 192.168.1.2/24 dev bond0
>
>Pings to 192.168.1.1 work just fine. Now turn on arp_validate:
>
>echo 1 > /sys/class/net/bond0/bonding/arp_validate
>
>Pings to 192.168.1.1 continue to work just fine. Now when you go to turn
>arp_validate off again, the link falls flat on its face:
>
>echo 0 > /sys/class/net/bond0/bonding/arp_validate
>dmesg
>...
>[133191.911987] bond0: Setting arp_validate to none (0)
>[133194.257793] bond0: bond_should_notify_peers: slave ens4f0
>[133194.258031] bond0: link status definitely down for interface ens4f0, disabling it
>[133194.259000] bond0: making interface ens4f1 the new active one
>[133197.330130] bond0: link status definitely down for interface ens4f1, disabling it
>[133197.331191] bond0: now running without any active interface!
>
>The problem lies in bond_options.c, where passing in arp_validate=0
>results in bond->recv_probe getting set to NULL. This flies directly in
>the face of commit 3fe68df97c7f, which says we need to set recv_probe =
>bond_arp_rcv, even if we're not using arp_validate. Said commit fixed
>this in bond_option_arp_interval_set, but missed that we can get to that
>same state in bond_option_arp_validate_set as well.
>
>One solution would be to universally set recv_probe = bond_arp_rcv here
>as well, but I don't think bond_option_arp_validate_set has any business
>touching recv_probe at all, and that should be left to the arp_interval
>code, so we can just make things much tidier here.
>
>Fixes: 3fe68df97c7f ("bonding: always set recv_probe to bond_arp_rcv in arp monitor")

Is the above Fixes: tag correct? 3fe68df97c7f is not the source
of the erroneous logic being removed, which was introduced by

commit 29c4948293bfc426e52a921f4259eb3676961e81
Author: sfeldma@xxxxxxxxxxxxxxxxxxx <sfeldma@xxxxxxxxxxxxxxxxxxx>
Date:   Thu Dec 12 14:10:38 2013 -0800

    bonding: add arp_validate netlink support

Regardless of which Fixes: is correct, the patch itself looks
fine to me:

Signed-off-by: Jay Vosburgh <jay.vosburgh@xxxxxxxxxxxxx>

-J


>CC: Jay Vosburgh <j.vosburgh@xxxxxxxxx>
>CC: Veaceslav Falico <vfalico@xxxxxxxxx>
>CC: Andy Gospodarek <andy@xxxxxxxxxxxxx>
>CC: "David S. Miller" <davem@xxxxxxxxxxxxx>
>CC: netdev@xxxxxxxxxxxxxxx
>Signed-off-by: Jarod Wilson <jarod@xxxxxxxxxx>
>---
> drivers/net/bonding/bond_options.c | 7 -------
> 1 file changed, 7 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
>index da1fc17295d9..b996967af8d9 100644
>--- a/drivers/net/bonding/bond_options.c
>+++ b/drivers/net/bonding/bond_options.c
>@@ -1098,13 +1098,6 @@ static int bond_option_arp_validate_set(struct bonding *bond,
> {
> 	netdev_dbg(bond->dev, "Setting arp_validate to %s (%llu)\n",
> 		   newval->string, newval->value);
>-
>-	if (bond->dev->flags & IFF_UP) {
>-		if (!newval->value)
>-			bond->recv_probe = NULL;
>-		else if (bond->params.arp_interval)
>-			bond->recv_probe = bond_arp_rcv;
>-	}
> 	bond->params.arp_validate = newval->value;
>
> 	return 0;
>--
>2.20.1
>

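For anyone following along, the state change the commit message describes
can be modeled in userspace. This is only a rough sketch, not kernel code:
struct model_bond, model_arp_rcv, validate_set_old, validate_set_new and
probe_survives_toggle are hypothetical names made up for illustration, and
the starting state assumes the arp_interval code has already pointed
recv_probe at the ARP receive handler on an up bond.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct bonding. */
struct model_bond {
	bool up;                 /* models bond->dev->flags & IFF_UP */
	int arp_interval;        /* models bond->params.arp_interval */
	int arp_validate;        /* models bond->params.arp_validate */
	int (*recv_probe)(void); /* models bond->recv_probe */
};

/* Placeholder for bond_arp_rcv; the real handler has a different signature. */
static int model_arp_rcv(void)
{
	return 0;
}

/* Setter logic with the seven removed lines still in place. */
static void validate_set_old(struct model_bond *b, int val)
{
	if (b->up) {
		if (!val)
			b->recv_probe = NULL;
		else if (b->arp_interval)
			b->recv_probe = model_arp_rcv;
	}
	b->arp_validate = val;
}

/* Setter logic after the patch: recv_probe is left alone. */
static void validate_set_new(struct model_bond *b, int val)
{
	b->arp_validate = val;
}

/*
 * Toggle arp_validate 0 -> 1 -> 0 on an up bond with the ARP monitor
 * enabled, and report whether a receive handler is still installed.
 */
static bool probe_survives_toggle(void (*set)(struct model_bond *, int))
{
	struct model_bond b = {
		.up = true,
		.arp_interval = 100,
		.arp_validate = 0,
		.recv_probe = model_arp_rcv, /* assumed set by the arp_interval path */
	};

	assert(set != NULL);
	set(&b, 1);
	set(&b, 0);
	return b.recv_probe != NULL;
}
```

Under those assumptions, probe_survives_toggle(validate_set_old) reports
that the handler is gone after the toggle, which lines up with the "link
status definitely down" messages in the quoted dmesg output, while
probe_survives_toggle(validate_set_new) leaves the handler in place.
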
---
-Jay Vosburgh, jay.vosburgh@xxxxxxxxxxxxx