Re: [BUG] v4.20 - bridge not getting DHCP responses? (works in 4.19.13)

From: Ian Kumlien
Date: Wed Jan 09 2019 - 19:39:09 EST


Confirmed. I'm sending a new mail with a summary etc.

On Thu, Jan 10, 2019 at 1:16 AM Ian Kumlien <ian.kumlien@xxxxxxxxx> wrote:
>
> On Wed, Jan 9, 2019 at 12:17 AM Ian Kumlien <ian.kumlien@xxxxxxxxx> wrote:
> > On Wed, Jan 9, 2019, 00:09 Florian Fainelli <f.fainelli@xxxxxxxxx> wrote:
>
> [--8<---]
>
> >> > when looking at "git log v4.19...v4.20
> >> > drivers/net/ethernet/intel/ixgbe/" nothing else really stands out...
> >> > The machine is also running NAT for my home network and all of that
> >> > works just fine...
> >> >
> >> > I started with tcpdump, proving that packets reached all the way
> >> > outside but replies never made it; rebooting
> >> > with 4.19.13 resulted in replies appearing in the tcpdump.
> >> >
> >> > I don't quite know where to look, or what I can do to test. I tried
> >> > disabling all offloading (due to the UDP
> >> > offloading changes) but nothing helped...
> >> >
> >> > Ideas? Patches? ;)
> >>
> >> Running a bisection would certainly help find the offending commit if
> >> that is something that you can do?
> >
> > I was hoping for a likely suspect, but this was on my "Todo" for Friday night anyway... (And I have already started testing with some patches reverted.)
>
> So I ran lengthy git bisect sessions, both from the latest stable I
> was using (not the best of ideas)
> and from 4.19.
>
> The latest stable yielded 72b0094f918294e6cb8cf5c3b4520d928fbb1a57 -
> which is incorrect...
>
> However, the proper bisect gave me this:
> fb420d5d91c1274d5966917725e71f27ed092a85 is the first bad commit
> commit fb420d5d91c1274d5966917725e71f27ed092a85
> Author: Eric Dumazet <edumazet@xxxxxxxxxx>
> Date: Fri Sep 28 10:28:44 2018 -0700
>
> tcp/fq: move back to CLOCK_MONOTONIC
>
> In the recent TCP/EDT patch series, I switched TCP and sch_fq
> clocks from MONOTONIC to TAI, in order to meet the choice done
> earlier for sch_etf packet scheduler.
>
> But sure enough, this broke some setups were the TAI clock
> jumps forward (by almost 50 year...), as reported
> by Leonard Crestez.
>
> If we want to converge later, we'll probably need to add
> an skb field to differentiate the clock bases, or a socket option.
>
> In the meantime, an UDP application will need to use CLOCK_MONOTONIC
> base for its SCM_TXTIME timestamps if using fq packet scheduler.
>
> Fixes: 72b0094f9182 ("tcp: switch tcp_clock_ns() to CLOCK_TAI base")
> Fixes: 142537e41923 ("net_sched: sch_fq: switch to CLOCK_TAI")
> Fixes: fd2bca2aa789 ("tcp: switch internal pacing timer to CLOCK_TAI")
> Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx>
> Reported-by: Leonard Crestez <leonard.crestez@xxxxxxx>
> Tested-by: Leonard Crestez <leonard.crestez@xxxxxxx>
> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
>
> :040000 040000 06615f5ed4486fd0af77a8fb59775a9f2346aebc 7f883c7753cb3d5d881e0edbef2989f4e6db6a1f M include
> :040000 040000 767c5e93fe5cfd609f90834d93978511c284ea01 cc47bd361516622c0b21602e188181fdfc6b2995 M net
> ----
>
> Which could actually be the culprit - I'm having problems *with* UDP
> traffic (DHCP) and I am using fq
>
> Let's hope it's so, since this was kinda boring:
> ls /lib/modules |grep 4.19.0 |wc -l
> 27
>
> Testing 4.20.1 and then 4.20.1 with the suspected patch reverted, will
> report shortly!
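
A note on the "almost 50 year" jump mentioned in the quoted commit message: that is roughly the distance between a wall-clock base (TAI or REALTIME, counted from 1970) and CLOCK_MONOTONIC (which in practice starts near boot). The following is a minimal Python sketch, not taken from this thread, that measures the gap on a Linux machine. The function names are made up for illustration, and it assumes Python 3.7+ (time.CLOCK_TAI needs 3.9+, so the sketch falls back to CLOCK_REALTIME).

```python
import time

NS_PER_YEAR = 365.25 * 24 * 3600 * 1_000_000_000


def read_ns(clock_id):
    """Return the current value of a POSIX clock in nanoseconds."""
    return time.clock_gettime_ns(clock_id)


def wall_clock_id():
    """Pick CLOCK_TAI when usable, else CLOCK_REALTIME (hypothetical helper)."""
    tai = getattr(time, "CLOCK_TAI", None)
    if tai is not None:
        try:
            time.clock_gettime_ns(tai)
            return tai
        except OSError:
            # Clock id known to Python but rejected by this system.
            pass
    return time.CLOCK_REALTIME


def base_gap_years(other_id=None):
    """Gap between another clock base and CLOCK_MONOTONIC, in years."""
    if other_id is None:
        other_id = wall_clock_id()
    gap_ns = read_ns(other_id) - read_ns(time.CLOCK_MONOTONIC)
    return gap_ns / NS_PER_YEAR


if __name__ == "__main__":
    print(f"wall clock minus CLOCK_MONOTONIC: {base_gap_years():.1f} years")
```

A timestamp written in one base and read in the other would therefore look decades off, which fits the commit's advice to use the CLOCK_MONOTONIC base with fq. This sketch does not establish whether that is what breaks DHCP over the bridge here.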