RE: [PATCH RFC] net/sched: adjust device watchdog timer to detect stopped queue at right time

From: Praveen Kannoju
Date: Fri May 03 2024 - 10:28:40 EST


Hi Jakub,
Thank you very much for the review and positive feedback.
We've incorporated your review comments except, using "continue" instead of adding another indentation level.
Request you to elaborate your comment on it.

-
Praveen.

> -----Original Message-----
> From: Jakub Kicinski <kuba@xxxxxxxxxx>
> Sent: 02 May 2024 03:43 AM
> To: Praveen Kannoju <praveen.kannoju@xxxxxxxxxx>
> Cc: jhs@xxxxxxxxxxxx; xiyou.wangcong@xxxxxxxxx; jiri@xxxxxxxxxxx; davem@xxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx; Rajesh Sivaramasubramaniom <rajesh.sivaramasubramaniom@xxxxxxxxxx>; Rama Nichanamatlu
> <rama.nichanamatlu@xxxxxxxxxx>; Manjunath Patil <manjunath.b.patil@oraclecom>
> Subject: Re: [PATCH RFC] net/sched: adjust device watchdog timer to detect stopped queue at right time
>
> On Tue, 30 Apr 2024 19:30:10 +0530 Praveen Kumar Kannoju wrote:
> > Applications are sensitive to long network latency, particularly
> > heartbeat monitoring ones. Longer the tx timeout recovery higher the
> > risk with such applications on a production machines. This patch
> > remedies, yet honoring device set tx timeout.
> >
> > Modify watchdog next timeout to be shorter than the device specified.
> > Compute the next timeout be equal to device watchdog timeout less the
> > how long ago queue stop had been done. At next watchdog timeout tx
> > timeout handler is called into if still in stopped state. Either
> > called or not called, restore the watchdog timeout back to device specified.
>
> Idea makes sense, some comments on the code below.
>
> > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index
> > 4a2c763e2d11..64e31f8b4ac1 100644
> > --- a/net/sched/sch_generic.c
> > +++ b/net/sched/sch_generic.c
> > @@ -506,18 +506,25 @@ static void dev_watchdog(struct timer_list *t)
> > unsigned int timedout_ms = 0;
> > unsigned int i;
> > unsigned long trans_start;
> > + unsigned long next_check = 0;
> > + unsigned long current_jiffies;
> >
> > for (i = 0; i < dev->num_tx_queues; i++) {
> > struct netdev_queue *txq;
> > + current_jiffies = jiffies;
>
> Not sure why you save current jiffies.
>
> > txq = netdev_get_tx_queue(dev, i);
> > trans_start = READ_ONCE(txq->trans_start);
> > - if (netif_xmit_stopped(txq) &&
> > - time_after(jiffies, (trans_start +
> > - dev->watchdog_timeo))) {
> > - timedout_ms = jiffies_to_msecs(jiffies - trans_start);
> > - atomic_long_inc(&txq->trans_timeout);
> > - break;
> > + if (netif_xmit_stopped(txq)) {
>
> please use continue instead of adding another indentation level

We need to take decision on whether to break out of loop or modify "oldest_start" only when
Queue is stopped. Hence one more level of indentation is needed. Can you please elaborate
on using "continue" in existing condition instead of adding a new indentation level.

>
> > + if (time_after(current_jiffies, (trans_start +
>
> wrap at 80 characters
>
> > + dev->watchdog_timeo))) {
> > + timedout_ms = jiffies_to_msecs(current_jiffies -
> > + trans_start);
> > + atomic_long_inc(&txq->trans_timeout);
> > + break;
> > + }
> > + next_check = trans_start + dev->watchdog_timeo -
> > + current_jiffies;
>
> this will give us "next_check" for last queue. Let's instead find the oldest trans_start in the loop. Do:
>
> unsigned long oldest_start = jiffies;
>
> then in the loop:
>
> oldest_start = min(...)
>
> > }
> > }
> >
> > @@ -530,9 +537,11 @@ static void dev_watchdog(struct timer_list *t)
> > dev->netdev_ops->ndo_tx_timeout(dev, i);
> > netif_unfreeze_queues(dev);
> > }
> > + if (!next_check)
> > + next_check = dev->watchdog_timeo;
> > if (!mod_timer(&dev->watchdog_timer,
> > round_jiffies(jiffies +
> > - dev->watchdog_timeo)))
> > + next_check)))
>
> then here you just need to swap jiffies for oldest_start
>
> > release = false;
> > }
> > }