Re: bridge:fragmented packets dropped by bridge

From: Vadim Fedorenko
Date: Wed Oct 12 2022 - 21:25:53 EST


On 07.10.2022 12:21, Vyacheslav Salnikov wrote:
Hi.

I switched from kernel versions 4.9 to 5.15 and found that the MTU on
the interfaces in the bridge does not change.
For example:
I have the following bridge:
bridge interface
br0 sw1
sw2
sw3

And I change with ifconfig MTU.
I see that br0 sw1..sw3 has changed MTU from 1500 -> 1982.

But if i send a ping through these interfaces, I get 1500(I added
prints for output)
I investigated the code and found the reason:
The following commit came in the new kernel:
https://github.com/torvalds/linux/commit/ac6627a28dbfb5d96736544a00c3938fa7ea6dfb

And the behavior of the MTU setting has changed:

Kernel 4.9:
if (net->ipv4.sysctl_ip_fwd_use_pmtu ||
ip_mtu_locked(dst) ||
!forwarding) <--- True
return dst_mtu(dst) <--- 1982


/ 'forwarding = true' case should always honour route mtu /
mtu = dst_metric_raw(dst, RTAX_MTU);
if (mtu)
return mtu;



Kernel 5.15:

if (READ_ONCE(net->ipv4.sysctl_ip_fwd_use_pmtu) ||
ip_mtu_locked(dst) ||
!forwarding) { <--- True
mtu = rt->rt_pmtu; <--- 0
if (mtu && time_before(jiffies, rt->dst.expires)) <-- False
goto out;
}

/ 'forwarding = true' case should always honour route mtu /
mtu = dst_metric_raw(dst, RTAX_MTU); <---- 1500
if (mtu) <--- True
goto out;

As I see from the code in the end takes mtu from br_dst_default_metrics
static const u32 br_dst_default_metrics[RTAX_MAX] = {
[RTAX_MTU - 1] = 1500,
};

Why is rt_pmtu now used instead of dst_mtu?
Why is forwarding = False called with dst_metric_raw?
Maybe we should add processing when mtu = rt->rt_pmtu == 0?
Could this be an error?

If you compare ipv4_mtu code from 4.9 you will see that the very first mtu value is filled by rt->rt_pmtu value. I believe there were changes to the bridge code where rt_pmtu value got empty or cleared.

I'm still looking for the root cause of the problem, will update you once I find it.



I found a thread discussing a similar problem. It suggested porting
the next patch:
Signed-off-by: Rundong Ge <rdong.ge@xxxxxxxxx>
---
include/net/ip.h | 2 ++
1 file changed, 2 insertions(+)

diff --git a/include/net/ip.h b/include/net/ip.h
index 29d89de..0512de3 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -450,6 +450,8 @@ static inline unsigned int
ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
const struct sk_buff *skb)
{
+ if ((skb_dst(skb)->flags & DST_FAKE_RTABLE) && skb->dev)
+ return min(skb->dev->mtu, IP_MAX_MTU);
if (!sk || !sk_fullsock(sk) || ip_sk_use_pmtu(sk)) {
bool forwarding = IPCB(skb)->flags & IPSKB_FORWARDED;


Why was this patch not accepted in the end?