Re: [PATCH net] tcp: fix keepalive when data remain undelivered

From: Yuchung Cheng
Date: Fri Feb 19 2021 - 21:26:01 EST


On Fri, Feb 19, 2021 at 4:54 PM Enke Chen <enkechen2020@xxxxxxxxx> wrote:
>
> From: Enke Chen <enchen@xxxxxxxxxxxxxxxxxxxx>
>
> TCP keepalive does not timeout under the condition that network connection
> is lost and data remain undelivered (incl. retransmit). A very simple
> scenarios of the failure is to write data to a tcp socket after the network
> connection is lost.

AFAIK current Linux TCP KA implementation does not violate the
RFC793bis (Section 3.7.4 Keep-Alives)
https://tools.ietf.org/html/draft-ietf-tcpm-rfc793bis-20#section-3.7.4

We have even raised that in IETF tcpm list to get more clarity
https://mailarchive.ietf.org/arch/msg/tcpm/KxcEsLtDuDNhcP8UjbyPkJqpzsE/

I believe you interpret the keep-alive differently -- so this is
arguably a subjective "bug-fix". As Neal and I have expressed in our
private communications, current Linux KA has been implemented for more
than a decade. Changing this behavior may break many existing
applications even if it may benefit certain.

>
> Under the specified condition the keepalive timeout is not evaluated in
> the keepalive timer. That is the primary cause of the failure. In addition,
> the keepalive probe is not sent out in the keepalive timer. Although packet
> retransmit or 0-window probe can serve a similar purpose, they have their
> own timers and backoffs that are generally not aligned with the keepalive
> parameters for probes and timeout.
>
> As the timing and conditions of the events involved are random, the tcp
> keepalive can fail randomly. Given the randomness of the failures, fixing
> the issue would not cause any backward compatibility issues. As was well
> said, "Determinism is a special case of randomness".
>
> The fix in this patch consists of the following:
>
> a. Always evaluate the keepalive timeout in the keepalive timer.
>
> b. Always send out the keepalive probe in the keepalive timer (post the
> keepalive idle time). Given that the keepalive intervals are usually
> in the range of 30 - 60 seconds, there is no need for an optimization
> to further reduce the number of keepalive probes in the presence of
> packet retransmit.
>
> c. Use the elapsed time (instead of the 0-window probe counter) in
> evaluating tcp keepalive timeout.
>
> Thanks to Eric Dumazet, Neal Cardwell, and Yuchung Cheng for helpful
> discussions about the issue and options for fixing it.
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2 Initial git repository build")
> Signed-off-by: Enke Chen <enchen@xxxxxxxxxxxxxxxxxxxx>
> ---
> net/ipv4/tcp_timer.c | 20 ++++++--------------
> 1 file changed, 6 insertions(+), 14 deletions(-)
>
> diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
> index 4ef08079ccfa..16a044da20db 100644
> --- a/net/ipv4/tcp_timer.c
> +++ b/net/ipv4/tcp_timer.c
> @@ -708,29 +708,23 @@ static void tcp_keepalive_timer (struct timer_list *t)
> ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_SYN_SENT)))
> goto out;
>
> - elapsed = keepalive_time_when(tp);
> -
> - /* It is alive without keepalive 8) */
> - if (tp->packets_out || !tcp_write_queue_empty(sk))
> - goto resched;
> -
> elapsed = keepalive_time_elapsed(tp);
>
> if (elapsed >= keepalive_time_when(tp)) {
> /* If the TCP_USER_TIMEOUT option is enabled, use that
> * to determine when to timeout instead.
> */
> - if ((icsk->icsk_user_timeout != 0 &&
> - elapsed >= msecs_to_jiffies(icsk->icsk_user_timeout) &&
> - icsk->icsk_probes_out > 0) ||
> - (icsk->icsk_user_timeout == 0 &&
> - icsk->icsk_probes_out >= keepalive_probes(tp))) {
> + u32 timeout = icsk->icsk_user_timeout ?
> + msecs_to_jiffies(icsk->icsk_user_timeout) :
> + keepalive_intvl_when(tp) * keepalive_probes(tp) +
> + keepalive_time_when(tp);
> +
> + if (elapsed >= timeout) {
> tcp_send_active_reset(sk, GFP_ATOMIC);
> tcp_write_err(sk);
> goto out;
> }
> if (tcp_write_wakeup(sk, LINUX_MIB_TCPKEEPALIVE) <= 0) {
> - icsk->icsk_probes_out++;
> elapsed = keepalive_intvl_when(tp);
> } else {
> /* If keepalive was lost due to local congestion,
> @@ -744,8 +738,6 @@ static void tcp_keepalive_timer (struct timer_list *t)
> }
>
> sk_mem_reclaim(sk);
> -
> -resched:
> inet_csk_reset_keepalive_timer (sk, elapsed);
> goto out;
>
> --
> 2.29.2
>