Re: Packet gets stuck in NOLOCK pfifo_fast qdisc

From: Kehuan Feng
Date: Thu Sep 03 2020 - 23:21:21 EST


Hi Hillf, Cong, Paolo,

Sorry for the late reply due to other urgent task.

I tried Hillf's patch (shown below on my tree) and it doesn't help and
the jitter shows up very quickly.

--- ./include/net/sch_generic.h.orig 2020-08-21 15:13:51.787952710 +0800
+++ ./include/net/sch_generic.h 2020-09-04 10:48:32.081217156 +0800
@@ -108,6 +108,7 @@

spinlock_t busylock ____cacheline_aligned_in_smp;
spinlock_t seqlock;
+ int run, seq;
};

static inline void qdisc_refcount_inc(struct Qdisc *qdisc)
@@ -127,8 +128,11 @@
static inline bool qdisc_run_begin(struct Qdisc *qdisc)
{
if (qdisc->flags & TCQ_F_NOLOCK) {
+ qdisc->run++;
+ smp_wmb();
if (!spin_trylock(&qdisc->seqlock))
return false;
+ qdisc->seq = qdisc->run;
} else if (qdisc_is_running(qdisc)) {
return false;
}
@@ -143,8 +147,15 @@
static inline void qdisc_run_end(struct Qdisc *qdisc)
{
write_seqcount_end(&qdisc->running);
- if (qdisc->flags & TCQ_F_NOLOCK)
+ if (qdisc->flags & TCQ_F_NOLOCK) {
+ int seq = qdisc->seq;
+
spin_unlock(&qdisc->seqlock);
+ smp_rmb();
+ if (seq != qdisc->run)
+ __netif_schedule(qdisc);
+
+ }
}


I also tried Cong's patch (shown below on my tree) and it could avoid
the issue (stressing for 30 minutus for three times and not jitter
observed).

--- ./include/net/sch_generic.h.orig 2020-08-21 15:13:51.787952710 +0800
+++ ./include/net/sch_generic.h 2020-09-03 21:36:11.468383738 +0800
@@ -127,8 +127,7 @@
static inline bool qdisc_run_begin(struct Qdisc *qdisc)
{
if (qdisc->flags & TCQ_F_NOLOCK) {
- if (!spin_trylock(&qdisc->seqlock))
- return false;
+ spin_lock(&qdisc->seqlock);
} else if (qdisc_is_running(qdisc)) {
return false;
}

I am not actually know what you are discussing above. It seems to me
that Cong's patch is similar as disabling lockless feature.

Anyway, we are going to use fq_codel instead, since CentOS 8/kernel
4.18 also uses fq_codel as the default qdisc, not sure whehter they
found some thing related to this.

Thanks,
Kehuan

Hillf Danton <hdanton@xxxxxxxx> 于2020年9月3日周四 下午6:20写道:
>
>
> On Thu, 03 Sep 2020 10:39:54 +0200 Paolo Abeni wrote:
> > On Wed, 2020-09-02 at 22:01 -0700, Cong Wang wrote:
> > > Can you test the attached one-line fix? I think we are overthinking,
> > > probably all
> > > we need here is a busy wait.
> >
> > I think that will solve, but I also think that will kill NOLOCK
> > performances due to really increased contention.
> >
> > At this point I fear we could consider reverting the NOLOCK stuff.
> > I personally would hate doing so, but it looks like NOLOCK benefits are
> > outweighed by its issues.
> >
> > Any other opinion more than welcome!
>
> Hi Paolo,
>
> I suspect it's too late to fix the -27% below.
> Surgery to cut NOLOCK seems too early before the fix.
>
> Hillf
>
> >pktgen threads vanilla patched[II] delta
> >nr kpps kpps %
> >1 3240 3240 0
> >2 3910 2830 -27%
> >4 5140 5140 0
>