Re: [PATCH AUTOSEL 4.19 04/42] netfilter: conntrack: always store window size un-scaled

From: Reindl Harald
Date: Wed Aug 14 2019 - 06:19:15 EST


that's still not in 5.2.8

without the exception and "nf_conntrack_tcp_timeout_max_retrans = 60" a
vnc-over-ssh session having the VNC view in the background freezes
within 60 seconds
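
To confirm which timeout value is actually in effect, the sysctl can be read
back from procfs. The following is only a minimal sketch: it assumes the
usual procfs path for this sysctl, and the helper names are hypothetical.

```python
from pathlib import Path

# Assumption: the sysctl is exposed at its usual procfs location.
SYSCTL_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_max_retrans")


def parse_timeout(text):
    """Parse the sysctl file content (a decimal number of seconds)."""
    return int(text.strip())


def read_timeout_seconds(path=SYSCTL_PATH):
    """Return the timeout in seconds, or None if it cannot be read
    (for example when the nf_conntrack module is not loaded)."""
    try:
        return parse_timeout(Path(path).read_text())
    except (OSError, ValueError):
        return None


# Prints the configured value, or None where the sysctl is unavailable.
print(read_timeout_seconds())
```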

-----------------------------------------------------------------------------------------------
IPV4 TABLE MANGLE (STATEFUL PRE-NAT/FILTER)
-----------------------------------------------------------------------------------------------
Chain PREROUTING (policy ACCEPT 100 packets, 9437 bytes)
num   pkts bytes target     prot opt in     out     source               destination
1     6526 3892K ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
2      125  6264 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0
3       64  4952 ACCEPT     all  --  vmnet8 *       0.0.0.0/0            0.0.0.0/0
4        1    40 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate INVALID

-------- Forwarded Message --------
Subject: [PATCH AUTOSEL 5.2 07/76] netfilter: conntrack: always store window size un-scaled

On 08.08.19 at 11:02, Thomas Jarosch wrote:
> Hello together,
>
> You wrote on Fri, Aug 02, 2019 at 09:22:24AM -0400:
>> From: Florian Westphal <fw@xxxxxxxxx>
>>
>> [ Upstream commit 959b69ef57db00cb33e9c4777400ae7183ebddd3 ]
>>
>> Jakub Jankowski reported following oddity:
>>
>> After 3 way handshake completes, timeout of new connection is set to
>> max_retrans (300s) instead of established (5 days).
>>
>> shortened excerpt from pcap provided:
>> 25.070622 IP (flags [DF], proto TCP (6), length 52)
>> 10.8.5.4.1025 > 10.8.1.2.80: Flags [S], seq 11, win 64240, [wscale 8]
>> 26.070462 IP (flags [DF], proto TCP (6), length 48)
>> 10.8.1.2.80 > 10.8.5.4.1025: Flags [S.], seq 82, ack 12, win 65535, [wscale 3]
>> 27.070449 IP (flags [DF], proto TCP (6), length 40)
>> 10.8.5.4.1025 > 10.8.1.2.80: Flags [.], ack 83, win 512, length 0
>>
>> Turns out the last_win is of u16 type, but we store the scaled value:
>> 512 << 8 (== 0x20000) becomes 0 window.
>>
>> The Fixes tag is not correct, as the bug has existed forever, but
>> without that change all that this causes might cause is to mistake a
>> window update (to-nonzero-from-zero) for a retransmit.
>>
>> Fixes: fbcd253d2448b8 ("netfilter: conntrack: lower timeout to RETRANS seconds if window is 0")
>> Reported-by: Jakub Jankowski <shasta@xxxxxxxxxxx>
>> Tested-by: Jakub Jankowski <shasta@xxxxxxxxxxx>
>> Signed-off-by: Florian Westphal <fw@xxxxxxxxx>
>> Acked-by: Jozsef Kadlecsik <kadlec@xxxxxxxxxxxxxxxxx>
>> Signed-off-by: Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx>
>> Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
>
> Also:
> Tested-by: Thomas Jarosch <thomas.jarosch@xxxxxxxxxxxxx>
>
> ;)
>
> We've hit the issue with the wrong conntrack timeout at two different sites,
> long-lived connections to a SAP server over IPSec VPN were constantly dropping.
>
> For us this was a regression after updating from kernel 3.14 to 4.19.
> Yesterday I applied the patch to kernel 4.19.57 and the problem is fixed.
>
> The issue was extra hard to debug as we could only boot the new kernel
> for twenty minutes in the evening on these production systems.
>
> The stable kernel patch from last Friday came right on time. I was just
> about to replay the TCP connection with tcpreplay, so this saved
> me from another week of debugging. Thanks everyone!
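
The truncation described in the quoted commit message (512 << 8 == 0x20000
stored in a u16 becomes 0) can be reproduced with a few lines of arithmetic.
This is only a model of the calculation, not the kernel code: Python integers
are unbounded, so the 16-bit field is imitated with a mask, and the function
names are hypothetical.

```python
U16_MASK = 0xFFFF  # a u16 field keeps only the low 16 bits


def store_scaled_u16(raw_win, wscale):
    """Model of the problem: scale the window first, then store it in 16 bits."""
    return (raw_win << wscale) & U16_MASK


def store_unscaled_u16(raw_win):
    """Model of storing the raw window field as seen on the wire.
    A TCP header window is itself 16 bits wide, so nothing is lost."""
    return raw_win & U16_MASK


# Values from the pcap excerpt: the final ACK has win 512 and the
# sender announced wscale 8 in its SYN.
raw_win, wscale = 512, 8

print(hex(raw_win << wscale))              # 0x20000
print(store_scaled_u16(raw_win, wscale))   # 0
print(store_unscaled_u16(raw_win))         # 512
```

With the scaled value stored, the ACK that completes the handshake looks like
a zero window, which matches the reported drop to the max_retrans timeout.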