Re: [PATCH] net: tcp_drop adds `reason` parameter for tracing v2

From: Brendan Gregg
Date: Thu Aug 26 2021 - 01:13:49 EST


On Thu, Aug 26, 2021 at 1:20 PM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> On Wed, 25 Aug 2021 08:47:46 -0700
> Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
>
> > > @@ -5703,15 +5700,15 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
> > > TCP_INC_STATS(sock_net(sk), TCP_MIB_INERRS);
> > > NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPSYNCHALLENGE);
> > > tcp_send_challenge_ack(sk, skb);
> > > - goto discard;
> > > + tcp_drop(sk, skb, TCP_DROP_MASK(__LINE__, TCP_VALIDATE_INCOMING));
> >
> > I'd rather use a string. So that we can more easily identify _why_ the
> > packet was dropped, without looking at the source code
> > of the exact kernel version to locate line number 1057
> >
> > You can be sure that we will get reports in the future from users of
> > heavily modified kernels.
> > Having to download a git tree, or apply semi-private patches is a no go.
> >
> > If you really want to include __FILE__ and __LINE__, these both can be
> > stringified and included in the report, with the help of macros.
>
> I agree the __LINE__ is pointless, but if this has a tracepoint
> involved, then you can simply enable the stacktrace trigger to it and
> it will save a stack trace in the ring buffer for you.
>
> echo stacktrace > /sys/kernel/tracing/events/tcp/tcp_drop/trigger
>
> And when the event triggers it will record a stack trace. You can also
> even add a filter to do it only for specific reasons.
>
> echo 'stacktrace if reason == 1' > /sys/kernel/tracing/events/tcp/tcp_drop/trigger
>
> And it even works for flags:
>
> echo 'stacktrace if reason & 0xa' > /sys/kernel/tracing/events/tcp/tcp_drop/trigger
>
> Which gives another reason to use an enum over a string.

You can't do string comparisons? The more string support Ftrace has,
the more convenient it will be. Taking bpftrace as an example of that
convenience, this shows drop frequency counted by human-readable
reason and stack trace:

# bpftrace -e 'k:tcp_drop { @[str(arg2), kstack] = count(); }'

Don't need further translation beyond the str(arg2). And filtering on
backlog drops:

# bpftrace -e 'k:tcp_drop /str(arg2) == "SYN backlog drop"/ { @[kstack] = count(); }'

etc. (Although ultimately we'll want a tracepoint added in tcp_drop
with those arguments.) If it's an enum I'll need to translate it back,
and deal with enum additions that my tool might not be coded for. I
can do it, it just needs maintenance, e.g. [0]. Plus the kernel code
needs maintenance. For a narrow observability use case, it starts to
feel like overkill to maintain an enum.
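
To make that maintenance cost concrete, here is a rough userspace sketch of
the lookup a tool would have to carry if the reason were an enum. The enum
names and values are hypothetical (they are not from the patch); only the
strings come from the examples in this mail. Anything the table doesn't know
about falls back to a numeric placeholder.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical drop reasons; names and values are illustrative only. */
enum tcp_drop_reason {
	TCP_DROP_UNSPEC = 0,
	TCP_DROP_SYN_BACKLOG,
	TCP_DROP_ACCEPT_BACKLOG,
	TCP_DROP_NO_ROUTE,
};

/* Table the tool has to keep in sync with the kernel's enum by hand. */
static const char *const drop_reason_names[] = {
	[TCP_DROP_UNSPEC]         = "Unspecified",
	[TCP_DROP_SYN_BACKLOG]    = "SYN backlog drop",
	[TCP_DROP_ACCEPT_BACKLOG] = "Accept backlog full",
	[TCP_DROP_NO_ROUTE]       = "No route",
};

#define NR_KNOWN_REASONS \
	(sizeof(drop_reason_names) / sizeof(drop_reason_names[0]))

/*
 * Translate a raw reason value into text. Values the tool was not built
 * for (e.g. added by a newer kernel) are formatted into buf instead.
 */
static const char *drop_reason_str(unsigned int reason, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	if (reason < NR_KNOWN_REASONS && drop_reason_names[reason])
		return drop_reason_names[reason];
	snprintf(buf, len, "UNKNOWN(%u)", reason);
	return buf;
}
```

Every new enum value in the kernel means another table entry in every tool,
or users see the UNKNOWN fallback. With a string argument none of this
exists.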

I wouldn't mind an optional _additional_ reason argument that's the
enum SNMP counter if appropriate. E.g.:

tcp_drop(sk, skb, "Accept backlog full", LINUX_MIB_LISTENDROPS);
tcp_drop(sk, skb, "No route", LINUX_MIB_LISTENDROPS);

So you could trace LINUX_MIB_LISTENDROPS and see different string
reasons for each different code path.
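
A small userspace mock of that idea, just to show the shape of the data: one
counter, several string reasons. Everything here is an assumption for
illustration. mock_tcp_drop() and count_drops() are made-up helpers, the sk
and skb arguments are left out, and the MOCK_MIB_* values are arbitrary
stand-ins for the real SNMP indexes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for SNMP MIB indexes; the values are arbitrary here. */
enum { MOCK_MIB_LISTENDROPS = 1, MOCK_MIB_OTHER = 2 };

struct drop_event {
	const char *reason;	/* human-readable reason string */
	int mib;		/* counter the drop is accounted under */
};

#define MAX_EVENTS 16
static struct drop_event events[MAX_EVENTS];
static size_t nr_events;

/* Mock of the proposed call: record the reason string and the counter. */
static void mock_tcp_drop(const char *reason, int mib)
{
	assert(reason != NULL);
	if (nr_events < MAX_EVENTS)
		events[nr_events++] = (struct drop_event){ reason, mib };
}

/* Count recorded drops for one counter, narrowed to one string reason. */
static unsigned int count_drops(int mib, const char *reason)
{
	unsigned int n = 0;

	for (size_t i = 0; i < nr_events; i++)
		if (events[i].mib == mib && strcmp(events[i].reason, reason) == 0)
			n++;
	return n;
}
```

A tracer would do the equivalent of count_drops() by filtering on the enum
and keying on the string, so the counter tells you which statistic moved and
the string tells you which code path moved it.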

I don't feel strongly about having __LINE__. I'd look it up from the
stack trace anyway.

Brendan

[0] https://github.com/brendangregg/bpf-perf-tools-book/blob/master/originals/Ch16_Hypervisors/kvmexits.bt