Re: [PATCH AUTOSEL 4.19 04/42] netfilter: conntrack: always store window size un-scaled

From: Thomas Jarosch
Date: Thu Aug 08 2019 - 05:07:44 EST


Hello together,

You wrote on Fri, Aug 02, 2019 at 09:22:24AM -0400:
> From: Florian Westphal <fw@xxxxxxxxx>
>
> [ Upstream commit 959b69ef57db00cb33e9c4777400ae7183ebddd3 ]
>
> Jakub Jankowski reported following oddity:
>
> After 3 way handshake completes, timeout of new connection is set to
> max_retrans (300s) instead of established (5 days).
>
> shortened excerpt from pcap provided:
> 25.070622 IP (flags [DF], proto TCP (6), length 52)
> 10.8.5.4.1025 > 10.8.1.2.80: Flags [S], seq 11, win 64240, [wscale 8]
> 26.070462 IP (flags [DF], proto TCP (6), length 48)
> 10.8.1.2.80 > 10.8.5.4.1025: Flags [S.], seq 82, ack 12, win 65535, [wscale 3]
> 27.070449 IP (flags [DF], proto TCP (6), length 40)
> 10.8.5.4.1025 > 10.8.1.2.80: Flags [.], ack 83, win 512, length 0
>
> Turns out the last_win is of u16 type, but we store the scaled value:
> 512 << 8 (== 0x20000) becomes 0 window.
>
> The Fixes tag is not correct, as the bug has existed forever, but
> without that change all that this causes might cause is to mistake a
> window update (to-nonzero-from-zero) for a retransmit.
>
> Fixes: fbcd253d2448b8 ("netfilter: conntrack: lower timeout to RETRANS seconds if window is 0")
> Reported-by: Jakub Jankowski <shasta@xxxxxxxxxxx>
> Tested-by: Jakub Jankowski <shasta@xxxxxxxxxxx>
> Signed-off-by: Florian Westphal <fw@xxxxxxxxx>
> Acked-by: Jozsef Kadlecsik <kadlec@xxxxxxxxxxxxxxxxx>
> Signed-off-by: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>
> Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

Also:
Tested-by: Thomas Jarosch <thomas.jarosch@xxxxxxxxxxxxx>

;)

We've hit the issue with the wrong conntrack timeout at two different sites,
long-lived connections to a SAP server over IPSec VPN were constantly dropping.

For us this was a regression after updating from kernel 3.14 to 4.19.
Yesterday I've applied the patch to kernel 4.19.57 and the problem is fixed.

The issue was extra hard to debug as we could just boot the new kernel
for twenty minutes in the evening on these productive systems.

The stable kernel patch from last Friday came right on time. I was just
about the replay the TCP connection with tcpreplay, so this saved
me from another week of debugging. Thanks everyone!

Best regards,
Thomas Jarosch