Re: Packet gets stuck in NOLOCK pfifo_fast qdisc

From: Jonas Bonn
Date: Wed Jul 01 2020 - 03:53:15 EST




On 30/06/2020 21:14, Josh Hunt wrote:
On 6/23/20 6:42 AM, Michael Zhivich wrote:
From: Jonas Bonn <jonas.bonn@xxxxxxxxxxxxx>
To: Paolo Abeni <pabeni@xxxxxxxxxx>,
    "netdev@xxxxxxxxxxxxxxx" <netdev@xxxxxxxxxxxxxxx>,
    LKML <linux-kernel@xxxxxxxxxxxxxxx>,
    "David S . Miller" <davem@xxxxxxxxxxxxx>,
    John Fastabend <john.fastabend@xxxxxxxxx>
Subject: Re: Packet gets stuck in NOLOCK pfifo_fast qdisc
Date: Fri, 11 Oct 2019 02:39:48 +0200
Message-ID: <465a540e-5296-32e7-f6a6-79942dfe2618@xxxxxxxxxxxxx> (raw)
In-Reply-To: <95c5a697932e19ebd6577b5dac4d7052fe8c4255.camel@xxxxxxxxxx>

Hi Paolo,

On 09/10/2019 21:14, Paolo Abeni wrote:
Something like the following code - completely untested - can possibly
address the issue, but it's a bit rough and I would prefer not adding
additional complexity to the lockless qdiscs. Can you please have a spin
at it?

We've tested a couple of variants of this patch today, but unfortunately
it doesn't fix the problem of packets getting stuck in the queue.

A couple of comments:

i) On 5.4, there is the BYPASS path that also needs the same treatment,
as it's essentially replicating the behaviour of qdisc_run, just without
the enqueue/dequeue steps.

ii) We are working a lot with the 4.19 kernel, so I backported the
patch to this version and tested there. Here the solution would seem to
be more robust as the BYPASS path does not exist.

Unfortunately, in both cases we continue to see the issue of the "last
packet" getting stuck in the queue.

/Jonas

Hello Jonas, Paolo,

We have observed the same problem with pfifo_fast qdisc when sending periodic small
packets on a TCP flow with multiple simultaneous connections on a 4.19.75
kernel. We've been able to catch it in action using perf probes (see trace
below). For qdisc = 0xffff900d7c247c00, skb = 0xffff900b72c334f0,
it takes 200270us to traverse the networking stack on a system that's not otherwise busy.
qdisc only resumes processing when another enqueued packet comes in,
so the packet could have been stuck indefinitely.
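
One interleaving that would produce this behaviour can be replayed in a toy model. This is only a sketch under assumptions: the class NolockQdiscModel and its methods are hypothetical names, not kernel code, and the thread does not confirm that this ordering is the root cause. It assumes that a CPU which fails to take the qdisc's "running" state simply returns after enqueueing, and that the CPU already running has seen an empty queue before the new packet arrived.

```python
from collections import deque


class NolockQdiscModel:
    """Toy model (hypothetical): a FIFO plus a 'running' flag taken by trylock."""

    def __init__(self):
        self.fifo = deque()
        self.running = False
        self.transmitted = []

    def enqueue(self, pkt):
        self.fifo.append(pkt)

    def try_run_begin(self):
        # Trylock semantics: fail immediately if another CPU is running.
        if self.running:
            return False
        self.running = True
        return True

    def dequeue(self):
        return self.fifo.popleft() if self.fifo else None

    def run_end(self):
        self.running = False


def replay():
    """Replay one fixed interleaving of two CPUs, step by step."""
    q = NolockQdiscModel()

    # CPU A: enqueue pkt1, start running, transmit it, then see an empty queue.
    q.enqueue("pkt1")
    assert q.try_run_begin()
    q.transmitted.append(q.dequeue())
    assert q.dequeue() is None

    # CPU B: enqueue pkt2 while A still holds the running state.
    q.enqueue("pkt2")
    b_could_run = q.try_run_begin()  # fails, so B returns without transmitting

    # CPU A: finish the run without looking at the queue again.
    q.run_end()
    stuck = list(q.fifo)  # pkt2 is queued, but no CPU is draining the queue

    # Later: a third packet arrives and its sender drains the whole queue.
    q.enqueue("pkt3")
    if q.try_run_begin():
        while (pkt := q.dequeue()) is not None:
            q.transmitted.append(pkt)
        q.run_end()

    return b_could_run, stuck, q.transmitted


if __name__ == "__main__":
    print(replay())
```

In this model pkt2 stays queued until pkt3 arrives, which is the same shape as the trace below: the delayed skb is only dequeued right after the next enqueue on the same qdisc.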

    proc-19902 19902 [032] 580644.045480: probe:pfifo_fast_dequeue_end: (ffffffff9b69d99d) qdisc=0xffff900d7c247c00 skb=0xffff900bfc294af0 band=2 atomic_qlen=0
    proc-19902 19902 [032] 580644.045480: probe:pfifo_fast_dequeue: (ffffffff9b69d8c0) qdisc=0xffff900d7c247c00 skb=0xffffffff9b69d8c0 band=2
    proc-19927 19927 [014] 580644.045480: probe:tcp_transmit_skb2: (ffffffff9b6dc4e5) skb=0xffff900b72c334f0 sk=0xffff900d62958040 source=0x4b4e dest=0x9abe
    proc-19902 19902 [032] 580644.045480: probe:pfifo_fast_dequeue_end: (ffffffff9b69d99d) qdisc=0xffff900d7c247c00 skb=0x0 band=3 atomic_qlen=0
    proc-19927 19927 [014] 580644.045481: probe:ip_finish_output2: (ffffffff9b6bc650) net=0xffffffff9c107c80 sk=0xffff900d62958040 skb=0xffff900b72c334f0 __func__=0x0
    proc-19902 19902 [032] 580644.045481: probe:sch_direct_xmit: (ffffffff9b69e570) skb=0xffff900bfc294af0 q=0xffff900d7c247c00 dev=0xffff900d6a140000 txq=0xffff900d6a181180 root_lock=0x0 validate=1 ret=-1 again=155
    proc-19927 19927 [014] 580644.045481: net:net_dev_queue: dev=eth0 skbaddr=0xffff900b72c334f0 len=115
    proc-19902 19902 [032] 580644.045482: probe:pfifo_fast_dequeue: (ffffffff9b69d8c0) qdisc=0xffff900d7c247c00 skb=0xffffffff9b69d8c0 band=1
    proc-19927 19927 [014] 580644.045483: probe:pfifo_fast_enqueue: (ffffffff9b69d9f0) skb=0xffff900b72c334f0 qdisc=0xffff900d7c247c00 to_free=18446622925407304000
    proc-19902 19902 [032] 580644.045483: probe:pfifo_fast_dequeue_end: (ffffffff9b69d99d) qdisc=0xffff900d7c247c00 skb=0x0 band=3 atomic_qlen=0
    proc-19927 19927 [014] 580644.045483: probe:pfifo_fast_enqueue_end: (ffffffff9b69da9f) skb=0xffff900b72c334f0 qdisc=0xffff900d7c247c00 to_free=0xffff91d0f67ab940 atomic_qlen=1
    proc-19902 19902 [032] 580644.045484: probe:__qdisc_run_2: (ffffffff9b69ea5a) q=0xffff900d7c247c00 packets=1
    proc-19927 19927 [014] 580644.245745: probe:pfifo_fast_enqueue: (ffffffff9b69d9f0) skb=0xffff900d98fdf6f0 qdisc=0xffff900d7c247c00 to_free=18446622925407304000
    proc-19927 19927 [014] 580644.245745: probe:pfifo_fast_enqueue_end: (ffffffff9b69da9f) skb=0xffff900d98fdf6f0 qdisc=0xffff900d7c247c00 to_free=0xffff91d0f67ab940 atomic_qlen=2
    proc-19927 19927 [014] 580644.245746: probe:pfifo_fast_dequeue: (ffffffff9b69d8c0) qdisc=0xffff900d7c247c00 skb=0xffffffff9b69d8c0 band=0
    proc-19927 19927 [014] 580644.245746: probe:pfifo_fast_dequeue_end: (ffffffff9b69d99d) qdisc=0xffff900d7c247c00 skb=0xffff900b72c334f0 band=2 atomic_qlen=1
    proc-19927 19927 [014] 580644.245747: probe:pfifo_fast_dequeue: (ffffffff9b69d8c0) qdisc=0xffff900d7c247c00 skb=0xffffffff9b69d8c0 band=2
    proc-19927 19927 [014] 580644.245747: probe:pfifo_fast_dequeue_end: (ffffffff9b69d99d) qdisc=0xffff900d7c247c00 skb=0xffff900d98fdf6f0 band=2 atomic_qlen=0
    proc-19927 19927 [014] 580644.245748: probe:pfifo_fast_dequeue: (ffffffff9b69d8c0) qdisc=0xffff900d7c247c00 skb=0xffffffff9b69d8c0 band=2
    proc-19927 19927 [014] 580644.245748: probe:pfifo_fast_dequeue_end: (ffffffff9b69d99d) qdisc=0xffff900d7c247c00 skb=0x0 band=3 atomic_qlen=0
    proc-19927 19927 [014] 580644.245749: qdisc:qdisc_dequeue: dequeue ifindex=5 qdisc handle=0x0 parent=0xF txq_state=0x0 packets=2 skbaddr=0xffff900b72c334f0
    proc-19927 19927 [014] 580644.245749: probe:sch_direct_xmit: (ffffffff9b69e570) skb=0xffff900b72c334f0 q=0xffff900d7c247c00 dev=0xffff900d6a140000 txq=0xffff900d6a181180 root_lock=0x0 validate=1 ret=-1 again=155
    proc-19927 19927 [014] 580644.245750: net:net_dev_start_xmit: dev=eth0 queue_mapping=14 skbaddr=0xffff900b72c334f0 vlan_tagged=0 vlan_proto=0x0000 vlan_tci=0x0000 protocol=0x0800 ip_summed=3 len=115 data_len=0 network_offset=14 transport_offset_valid=1 transport_offset=34 tx_flags=0 gso_size=0 gso_segs=1 gso_type=0x1
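
For reference, the 200270us figure can be recomputed from the trace timestamps. The sketch below is a rough helper, not part of any tool: the function name first_seen_us and the regular expression are assumptions, and the embedded TRACE holds four lines from the trace above, trimmed to the fields needed here. It takes the first timestamp of each event that mentions the delayed skb and subtracts them as integer microseconds.

```python
import re

# Four lines from the trace above, trimmed to the fields used here.
TRACE = """\
proc-19927 19927 [014] 580644.045480: probe:tcp_transmit_skb2: skb=0xffff900b72c334f0
proc-19927 19927 [014] 580644.045483: probe:pfifo_fast_enqueue_end: skb=0xffff900b72c334f0 atomic_qlen=1
proc-19927 19927 [014] 580644.245746: probe:pfifo_fast_dequeue_end: skb=0xffff900b72c334f0 atomic_qlen=1
proc-19927 19927 [014] 580644.245750: net:net_dev_start_xmit: skbaddr=0xffff900b72c334f0
"""

# "[cpu] seconds.microseconds: event: fields"
LINE_RE = re.compile(r"\[\d+\]\s+(\d+)\.(\d{6}):\s+(\S+):\s+(.*)")


def first_seen_us(trace, skb):
    """Map event name -> first timestamp (integer microseconds) mentioning skb."""
    skb_re = re.compile(r"\bskb(?:addr)?=" + re.escape(skb) + r"\b")
    seen = {}
    for line in trace.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        sec, usec, event, fields = m.groups()
        if skb_re.search(fields):
            seen.setdefault(event, int(sec) * 1_000_000 + int(usec))
    return seen


ts = first_seen_us(TRACE, "0xffff900b72c334f0")
stack_latency_us = ts["net:net_dev_start_xmit"] - ts["probe:tcp_transmit_skb2"]
queue_time_us = ts["probe:pfifo_fast_dequeue_end"] - ts["probe:pfifo_fast_enqueue_end"]

if __name__ == "__main__":
    print(stack_latency_us, queue_time_us)
```

Almost all of the delay is time spent sitting in the qdisc between enqueue and dequeue; the rest of the transmit path accounts for only a few microseconds.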

I was wondering if you had any more luck in finding a solution or workaround for this problem
(that is, aside from switching to a different qdisc)?

Thanks,
~ Michael


Jonas/Paolo

Do either of you know if there's been any development on a fix for this issue? If not we can propose something.

Hi Josh,

No, I haven't been able to do any more work on this and the affected user switched qdisc (to avoid this problem) so I lost the reliable reproducer that I had...

/Jonas


Thanks
Josh