Re: linux 5.17.1 disregarding ACK values resulting in stalled TCP connections

From: Eric Dumazet
Date: Fri Apr 01 2022 - 09:56:26 EST


On Fri, Apr 1, 2022 at 4:36 AM Jaco Kroon <jaco@xxxxxxxxx> wrote:
>
> Hi Eric,
>
> On 2022/04/01 02:54, Eric Dumazet wrote:
> > On Thu, Mar 31, 2022 at 5:41 PM Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
> >> On Thu, Mar 31, 2022 at 5:33 PM Jaco Kroon <jaco@xxxxxxxxx> wrote:
> >>
> >>> I'll deploy same on a dev host we've got in the coming week and start a
> >>> bisect process.
> >> Thanks, this will definitely help.
> > One thing I noticed in your pcap is a good amount of drops, as if
> > Hystart was not able to stop slow-start before the drops are
> > happening.
> >
> > TFO with one less RTT at connection establishment could be the trigger.
> >
> > If you are still using cubic, please try to revert.
> Sorry, I understand TCP itself a bit, but I've given up trying to
> understand the various schedulers a long time ago and am just using the
> defaults that the kernel provides. How do I check what I'm using, and
> how can I change that? What is recommended at this stage?

How to check: cat /proc/sys/net/ipv4/tcp_congestion_control

This is of course orthogonal to the bug we are tracking here,
but given your long RTT, I would recommend using the fq packet scheduler and bbr.

tc qd replace dev eth0 root fq
# (or use mq+fq if your NIC is multi-queue and you need a good amount of throughput)

modprobe tcp_bbr # (after enabling CONFIG_TCP_CONG_BBR=m)
echo bbr >/proc/sys/net/ipv4/tcp_congestion_control
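
Before switching, it may help to confirm that bbr is actually registered with the kernel. A rough sketch, assuming the standard tcp_available_congestion_control sysctl file; the helper name cc_listed is made up for illustration and is not an existing tool:

```shell
# Hypothetical helper: succeeds if congestion control $1 appears in the
# space-separated list stored in file $2 (defaults to the kernel's list).
# Returns 0 if listed, 1 if not listed, 2 if the file cannot be read.
cc_listed() {
    list_file="${2:-/proc/sys/net/ipv4/tcp_available_congestion_control}"
    [ -r "$list_file" ] || return 2
    # One name per line, then require an exact whole-line match.
    tr ' ' '\n' < "$list_file" | grep -qxF "$1"
}

if cc_listed bbr; then
    echo "bbr is listed as available"
else
    echo "bbr is not listed (or the list is unreadable)"
fi
```

If bbr is not listed, the module is probably not loaded yet (or not built).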


> >
> >
> > commit 4e1fddc98d2585ddd4792b5e44433dcee7ece001
> > Author: Eric Dumazet <edumazet@xxxxxxxxxx>
> > Date: Tue Nov 23 12:25:35 2021 -0800
> >
> > tcp_cubic: fix spurious Hystart ACK train detections for
> > not-cwnd-limited flows
> Ok, instead of starting with bisect, if I can reproduce in dev I'll use
> this one first.

Thanks! (Again, this won't fix the bug; this is really a shot in the dark.)