Re: [PATCH net v3 2/2] bonding: 802.3ad: fix no transmission of LACPDUs

From: Jonathan Toppins
Date: Tue Aug 16 2022 - 13:47:18 EST


On 8/16/22 09:41, Jonathan Toppins wrote:
On 8/16/22 09:11, Jay Vosburgh wrote:
Jonathan Toppins <jtoppins@xxxxxxxxxx> wrote:

This is caused by the global variable ad_ticks_per_sec being zero as
demonstrated by the reproducer script discussed below. This causes
all timer values in __ad_timer_to_ticks to be zero, resulting
in the periodic timer to never fire.

To reproduce:
Run the script in
`tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh` which
puts bonding into a state where it never transmits LACPDUs.

line 44: ip link add fbond type bond mode 4 miimon 200 \
            xmit_hash_policy 1 ad_actor_sys_prio 65535 lacp_rate fast
setting bond param: ad_actor_sys_prio
given:
    params.ad_actor_system = 0
call stack:
    bond_option_ad_actor_sys_prio()
    -> bond_3ad_update_ad_actor_settings()
       -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
       -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
            params.ad_actor_system == 0
results:
     ad.system.sys_mac_addr = bond->dev->dev_addr

line 48: ip link set fbond address 52:54:00:3B:7C:A6
setting bond MAC addr
call stack:
    bond->dev->dev_addr = new_mac

line 52: ip link set fbond type bond ad_actor_sys_prio 65535
setting bond param: ad_actor_sys_prio
given:
    params.ad_actor_system = 0
call stack:
    bond_option_ad_actor_sys_prio()
    -> bond_3ad_update_ad_actor_settings()
       -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio
       -> ad.system.sys_mac_addr = bond->dev->dev_addr; because
            params.ad_actor_system == 0
results:
     ad.system.sys_mac_addr = bond->dev->dev_addr

line 60: ip link set veth1-bond down master fbond
given:
    params.ad_actor_system = 0
    params.mode = BOND_MODE_8023AD
    ad.system.sys_mac_addr == bond->dev->dev_addr
call stack:
    bond_enslave
    -> bond_3ad_initialize(); because first slave
       -> if ad.system.sys_mac_addr != bond->dev->dev_addr
          return
results:
     Nothing is run in bond_3ad_initialize() because dev_add equals
     sys_mac_addr leaving the global ad_ticks_per_sec zero as it is
     never initialized anywhere else.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jonathan Toppins <jtoppins@xxxxxxxxxx>
---

Notes:
    v2:
     * split this fix from the reproducer
    v3:
     * rebased to latest net/master

drivers/net/bonding/bond_3ad.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index d7fb33c078e8..957d30db6f95 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -84,7 +84,8 @@ enum ad_link_speed_type {
static const u8 null_mac_addr[ETH_ALEN + 2] __long_aligned = {
    0, 0, 0, 0, 0, 0
};
-static u16 ad_ticks_per_sec;
+
+static u16 ad_ticks_per_sec = 1000 / AD_TIMER_INTERVAL;
static const int ad_delta_in_ticks = (AD_TIMER_INTERVAL * HZ) / 1000;

    I still feel like this is kind of a hack, as it's not really
fixing bond_3ad_initialize to actually work (which is the real problem
as I understand it).  If it's ok to skip all that for this case, then
why do we ever need to call bond_3ad_initialize?


The way it is currently written you still need to call bond_3ad_initialize() just not for setting the tick resolution. The issue here is ad_ticks_per_sec is used in several places to calculate timer periods, __ad_timer_to_ticks(), for various timers in the 802.3ad protocol. And if this variable, ad_ticks_per_sec, is left uninitialized all of these timer periods go to zero. Since the value passed in bond_3ad_initialize() is an immediate value I simply moved it off of the call stack and set the static global variable instead.

To fix bond_3ad_initialize(), probably something like the below is needed, but I do not understand why the guard if check was placed in bond_3ad_initialize().

I looked at the history of the if guard in bond_3ad_initialize and it has existed since the creation of the git tree. It appears since commit 5ee14e6d336f ("bonding: 3ad: apply ad_actor settings changes immediately") the if guard is no longer needed and removing the if guard would also fix the problem, I have not tested yet.

I think this patch series can be accepted as-is because, it does fix the issue as demonstrated by the kselftest accompanying the series and is the smallest change to fix the issue.

Further, I don't see why we want to set the file-scoped variable, ad_ticks_per_sec, inside bond_3ad_initialize() as ad_ticks_per_sec is utilized across all bonds. It seems like ad_ticks_per_sec should be changed to a const and set at the top of the file. I see no value in passing the value as an unnamed constant on the stack when bond_3ad_initialize is called. These changes however could be done in the net-next tree as follow-on cleanup patches.

Jay, how would you like to proceed?

[...]

Thanks,
-Jon