Re: [PATCH net-next 0/1] mlx5: Add netdev-genl queue stats

From: Tariq Toukan
Date: Wed May 08 2024 - 17:40:16 EST




On 06/05/2024 21:04, Joe Damato wrote:
On Fri, May 03, 2024 at 05:34:29PM -0700, Jakub Kicinski wrote:
On Fri, 3 May 2024 16:53:40 -0700 Joe Damato wrote:
diff --git a/include/net/netdev_queues.h b/include/net/netdev_queues.h
index c7ac4539eafc..f5d9f3ad5b66 100644
--- a/include/net/netdev_queues.h
+++ b/include/net/netdev_queues.h
@@ -59,6 +59,8 @@ struct netdev_queue_stats_tx {
* statistics will not generally add up to the total number of events for
* the device. The @get_base_stats callback allows filling in the delta
* between events for currently live queues and overall device history.
+ * @get_base_stats can also be used to report any miscellaneous packets
+ * transferred outside of the main set of queues used by the networking stack.
* When the statistics for the entire device are queried, first @get_base_stats
* is issued to collect the delta, and then a series of per-queue callbacks.
* Only statistics which are set in @get_base_stats will be reported


SG?

I think that sounds good and makes sense, yea. By that definition, then I
should leave the PTP stats as shown above. If you agree, I'll add that
to the v2.

Yup, agreed.

I feel like I should probably wait before sending a v2 with PTP included in
get_base_stats to see if the Mellanox folks have any hints about why rtnl
!= queue stats on mlx5?

What do you think?

Very odd, the code doesn't appear to be doing any magic :S Did you try
to print what the delta in values is? Does bringing the interface up and
down affect the size of it?

I booted the kernel which includes PTP stats in the base stats as you've
suggested (as shown in the diff in this thread) and I've brought the
interface down and back up:

$ sudo ip link set dev eth0 down
$ sudo ip link set dev eth0 up

Re ran the test script, which includes some mild debugging print out I
added to show the delta for rx-packets (but I think all stats are off):

# Exception| Exception: Qstats are lower, fetched later

key: rx-packets rstat: 1192281902 qstat: 1186755777
key: rx-packets rstat: 1192281902 qstat: 1186755781

So qstat is lower by (1192281902 - 1186755781) = 5,526,121

Not really sure why, but I'll take another look at the code this morning to
see if I can figure out what's going on.

I'm clearly doing something wrong or misunderstanding something about the
accounting that will seem extremely obvious in retrospect.

Hi Joe,

Thanks for your patch.
Apologies for the late response. I was on PTO for some time.

From first look the patch looks okay. The overall approach seems correct.

The off-channels queues (like PTP) do not exist in default. So they are out of the game unless you explicitly enables them.

A possible reason for this difference is the queues included in the sum.
Our stats are persistent across configuration changes, so they doesn't reset when number of channels changes for example.

We keep stats entries for al ring indices that ever existed. Our driver loops and sums up the stats for all of them, while the stack loops only up to the current netdev->real_num_rx_queues.

Can this explain the diff here?