Re: [PATCH] net: bonding: alb disable balance for IPv6 multicast related mac

From: Jay Vosburgh
Date: Wed Oct 28 2020 - 21:47:25 EST


LIU Yulong <i@xxxxxxxxxxxx> wrote:

>According to RFC 2464 [1], the prefix "33:33:xx:xx:xx:xx" is used to
>construct the destination MAC address for IPv6 multicast traffic. The
>Neighbor Discovery Protocol for IPv6 (NDP) [2] follows this rule. The
>steps [6] are:
> *) Let's assume a destination address of 2001:db8:1:1::1.
> *) This is mapped into the "Solicited Node Multicast Address" (SNMA)
> format of ff02::1:ffXX:XXXX.
> *) The XX:XXXX represent the last 24 bits of the SNMA, and are derived
> directly from the last 24 bits of the destination address.
> *) Resulting in a SNMA ff02::1:ff00:0001, or ff02::1:ff00:1.
> *) This, being a multicast address, can be mapped to a multicast MAC
> address, using the format 33-33-XX-XX-XX-XX
> *) Resulting in 33-33-ff-00-00-01.
> *) This is a MAC address that is listened for only by nodes
> sharing the same last 24 bits.
> *) In other words, while there is a chance of an "address collision",
> it is a vast improvement over ARP's guaranteed "collision".
>The related kernel code can be found at [3][4][5].
>
>The current bonding ALB mode does not cover this whole MAC range: it only
>excludes the all-nodes address 33:33:00:00:00:01 from tx-balancing. In a
>Spine-and-Leaf data center architecture, this can leave the physical
>network unable to determine the return tunnel for reply packets.
>The basic topology looks like this:
>
>          +-------------+
>          |             |
>      +---| Border Leaf |-----+
>      |   |             |     |
>      |   +-------------+     |
>      |                       |
>      | tunnel-1              | tunnel-2
>      |                       |
>      |                       |
>  +---+----+           +------+-+
>  |        |           |        |
>  | Leaf1  +--X-X-X-X--+ Leaf2  |  tunnel-3 will be checked to prevent loop
>  |        |  tunnel-3 |        |
>  +--------+           +-+------+
>      |                  |
>      |                  |
>      |                  |
>      |                  |
>      |                  |
>      |                  |
>    +----+             +----+
> +--+nic1+-------------+nic2+---+
> |  +----+             +----+   |
> |            bond6             |
> |                              |
> |             HOST             |
> +------------------------------+

This description is, overall, very comprehensive, and I believe
I generally understand what issue you're fixing (which seems to be a
complicated means to cause MAC flapping), although I'm unclear on a few
details, below.

However, if you could make the ASCII art smaller I think that
would be better.

>While nic1 is sending normal IPv6 traffic to the gateway on the Border Leaf,
>nic2 (a slave) also sends NS packets out periodically, automatically and
>implicitly. Here is an example packet sent from slave nic2 that breaks
>the traffic:

With this patch applied, what would happen if nic2 sends the
normal IPv6 traffic from the source MAC in question (because it is
tx-balanced there), and the Neighbor Solicitation multicast then goes
out via nic1?

> ac:1f:6b:90:5c:eb > 33:33:ff:00:00:01, ethertype 802.1Q (0x8100),
> length 90: vlan 205, p 0, ethertype IPv6, (hlim 255,
> next-header ICMPv6 (58) payload length: 32)
> fe80::f816:3eff:feba:2d8c > ff02::1:ff00:1:
> [icmp6 sum ok] ICMP6, neighbor solicitation, length 32,
> who has 240e:980:2f00:4000::1
> source link-address option (1), length 8 (1): fa:16:3e:ba:2d:8c
> 0x0000: fa16 3eba 2d8c
> 0x0000: 3333 ff00 0001 ac1f 6b90 5ceb 8100 00cd
> 0x0010: 86dd 6000 0000 0020 3aff fe80 0000 0000
> 0x0020: 0000 f816 3eff feba 2d8c ff02 0000 0000
> 0x0030: 0000 0000 0001 ff00 0001 8700 14d3 0000
> 0x0040: 0000 240e 0980 2f00 4000 0000 0000 0000
> 0x0050: 0001 0101 fa16 3eba 2d8c

And perhaps trim out the hex dump here.

>MAC "fa:16:3e:ba:2d:8c" was first learnt at Leaf1 via the underlay
>mechanism (BGP EVPN). When this example packet reaches the Border Leaf and
>the reply is sent with dst_mac "fa:16:3e:ba:2d:8c", Leaf2 tries to send the
>packet back over tunnel-3, and at this point it is dropped by the loop
>defense. All of the original normal IPv6 traffic is then led to tunnel-2
>and dropped, so the link is broken.

Where does MAC fa:16:3e:ba:2d:8c come from? Is this the MAC
address of the bond itself?

Assuming that "learnt at Leaf1" means that Leaf1 knows to
forward it to bond6:nic1, why does the loop defense drop the packet if
Leaf1 is on the forwarding path?

>This patch addresses the issue by checking the entire MAC range defined by
>RFC 2464. It adds a new helper to check whether the first two octets are
>0x33 0x33; if the destination MAC matches, tx-balancing is disabled.
>
>[1] https://tools.ietf.org/html/rfc2464#section-7
>[2] https://tools.ietf.org/html/rfc4861
>[3] linux.git/tree/include/net/if_inet6.h#n209-n221
>[4] linux.git/tree/net/ipv6/ndisc.c#n291
>[5] linux.git/tree/net/ipv6/ndisc.c#n346-n348
>[6] https://en.citizendium.org/wiki/Neighbor_Discovery
>
>Signed-off-by: LIU Yulong <i@xxxxxxxxxxxx>
>---
> drivers/net/bonding/bond_alb.c | 10 ++++------
> include/linux/etherdevice.h | 12 ++++++++++++
> 2 files changed, 16 insertions(+), 6 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>index 095ea51..a4a30bd 100644
>--- a/drivers/net/bonding/bond_alb.c
>+++ b/drivers/net/bonding/bond_alb.c
>@@ -24,9 +24,6 @@
> #include <net/bonding.h>
> #include <net/bond_alb.h>
>
>-static const u8 mac_v6_allmcast[ETH_ALEN + 2] __long_aligned = {
>- 0x33, 0x33, 0x00, 0x00, 0x00, 0x01
>-};
> static const int alb_delta_in_ticks = HZ / ALB_TIMER_TICKS_PER_SEC;
>
> #pragma pack(1)
>@@ -1422,10 +1419,11 @@ struct slave *bond_xmit_alb_slave_get(struct bonding *bond,
> break;
> }
>
>- /* IPv6 uses all-nodes multicast as an equivalent to
>- * broadcasts in IPv4.
>+ /* IPv6 multicast destinations should disable tx-balancing since
>+ * the physical network may end up in an inconsistent state, which
>+ * will break IPv6 traffic.

I think this comment can simply say that IPv6 multicast
destinations should not be tx-balanced, which I suspect is the real
purpose.

> */
>- if (ether_addr_equal_64bits(eth_data->h_dest, mac_v6_allmcast)) {
>+ if (is_ipv6_multicast_ether_addr(eth_data->h_dest)) {
> do_tx_balance = false;
> break;
> }
>diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
>index 2e5debc..c6101ab 100644
>--- a/include/linux/etherdevice.h
>+++ b/include/linux/etherdevice.h
>@@ -178,6 +178,18 @@ static inline bool is_unicast_ether_addr(const u8 *addr)
> }
>
> /**
>+ * is_ipv6_multicast_ether_addr - Determine if the Ethernet address is for
>+ * IPv6 multicast (rfc2464).
>+ * @addr: Pointer to a six-byte array containing the Ethernet address
>+ *
>+ * Return true if the address is a multicast for IPv6.
>+ */
>+static inline bool is_ipv6_multicast_ether_addr(const u8 *addr)
>+{
>+ return (addr[0] & addr[1]) == 0x33;
>+}

I don't think this does what is intended. It will return true
for a MAC that starts with any two values whose bitwise AND is 0x33,
e.g., 0x73 0x3b. For IPv6 multicast, the first two octets of the MAC
must be exactly 0x33 0x33.

-J

>+
>+/**
> * is_valid_ether_addr - Determine if the given Ethernet address is valid
> * @addr: Pointer to a six-byte array containing the Ethernet address
> *
>--
>1.8.3.1

---
-Jay Vosburgh, jay.vosburgh@xxxxxxxxxxxxx