Re: [PATCH net-next v2] tcp: extend tcp_retransmit_skb tracepoint with failure reasons

From: Eric Dumazet
Date: Tue Jul 01 2025 - 12:38:41 EST


On Tue, Jul 1, 2025 at 2:42 AM <xu.xin16@xxxxxxxxxx> wrote:
>
> From: Fan Yu <fan.yu9@xxxxxxxxxx>
>
> Background
> ==========
> When TCP retransmits a packet due to missing ACKs, the
> retransmission may fail for various reasons (e.g., packets
> stuck in driver queues, sequence errors, or routing issues).
>
> The original tcp_retransmit_skb tracepoint:
> 'commit e086101b150a ("tcp: add a tracepoint for tcp retransmission")'
> lacks visibility into these failure causes, making production
> diagnostics difficult.
>
> Solution
> =======
> Adds a 'reason' field to the tcp_retransmit_skb tracepoint,
> enumerating with explicit failure cases:
> TCP_RETRANS_ERR_DEFAULT (retransmit terminate unexpectedly)
> TCP_RETRANS_IN_HOST_QUEUE (packet still queued in driver)
> TCP_RETRANS_END_SEQ_ERROR (invalid end sequence)
> TCP_RETRANS_TRIM_HEAD_NOMEM (trim head no memory)
> TCP_RETRANS_UNCLONE_NOMEM (skb unclone keeptruesize no memory)
> TCP_RETRANS_FRAG_NOMEM (fragment no memory)
> TCP_RETRANS_ROUTE_FAIL (routing failure)
> TCP_RETRANS_RCV_ZERO_WINDOW (closed recevier window)
> TCP_RETRANS_PSKB_COPY_NOBUFS (no buffer for skb copy)
>
> Functionality
> =============
> Enables users to know why some tcp retransmission quitted and filter
> retransmission failures by reason.
>
>

...

> +enum tcp_retransmit_quit_reason {
> + TCP_RETRANS_ERR_DEFAULT,
> + TCP_RETRANS_SUCCESS,
> + TCP_RETRANS_IN_HOST_QUEUE,
> + TCP_RETRANS_END_SEQ_ERROR,
> + TCP_RETRANS_TRIM_HEAD_NOMEM,
> + TCP_RETRANS_UNCLONE_NOMEM,
> + TCP_RETRANS_FRAG_NOMEM,
> + TCP_RETRANS_ROUTE_FAIL,
> + TCP_RETRANS_RCV_ZERO_WINDOW,
> + TCP_RETRANS_PSKB_COPY_NOBUFS,
> +};
> +
> #define tcp_sk(ptr) container_of_const(ptr, struct tcp_sock, inet_conn.icsk_inet.sk)
>
> /* Variant of tcp_sk() upgrading a const sock to a read/write tcp socket.
> diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
> index 54e60c6009e3..3e24740d759e 100644
> --- a/include/trace/events/tcp.h
> +++ b/include/trace/events/tcp.h
> @@ -13,17 +13,38 @@
> #include <linux/sock_diag.h>
> #include <net/rstreason.h>
>
> -/*
> - * tcp event with arguments sk and skb
> - *
> - * Note: this class requires a valid sk pointer; while skb pointer could
> - * be NULL.
> - */
> -DECLARE_EVENT_CLASS(tcp_event_sk_skb,
> +#define TCP_RETRANSMIT_QUIT_REASON \
> + ENUM(TCP_RETRANS_ERR_DEFAULT, "retransmit terminate unexpectedly") \
> + ENUM(TCP_RETRANS_SUCCESS, "retransmit successfully") \
> + ENUM(TCP_RETRANS_IN_HOST_QUEUE, "packet still queued in driver") \
> + ENUM(TCP_RETRANS_END_SEQ_ERROR, "invalid end sequence") \
> + ENUM(TCP_RETRANS_TRIM_HEAD_NOMEM, "trim head no memory") \
> + ENUM(TCP_RETRANS_UNCLONE_NOMEM, "skb unclone keeptruesize no memory") \
> + ENUM(TCP_RETRANS_FRAG_NOMEM, "fragment no memory") \

Do we really need 3 + 1 different 'NOMEMORY' status ?

> + ENUM(TCP_RETRANS_ROUTE_FAIL, "routing failure") \
> + ENUM(TCP_RETRANS_RCV_ZERO_WINDOW, "closed recevier window") \

receiver

> + ENUMe(TCP_RETRANS_PSKB_COPY_NOBUFS, "no buffer for skb copy") \

-> another NOMEM...

> +
> +


> + __entry->quit_reason = quit_reason;
> ),
>
> - TP_printk("skbaddr=%p skaddr=%p family=%s sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c state=%s",
> + TP_printk("skbaddr=%p skaddr=%p family=%s sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c state=%s quit_reason=%s",

quit_reason is really weird, since most retransmits are a success.

What about using : status or result ?

Also, for scripts parsing the output, I would try to keep key=val
format (no space in @val), and concise 'vals'