Re: [PATCH net] veth: Fix RX stats for bpf_redirect_peer() traffic

From: Daniel Borkmann
Date: Tue Oct 31 2023 - 18:20:50 EST


On 10/31/23 8:53 PM, Jakub Kicinski wrote:
On Mon, 30 Oct 2023 15:19:26 +0100 Daniel Borkmann wrote:
Since I didn't want to update host-veth's TX counters. If we
bpf_redirect_peer()ed a packet from NIC TC ingress to Pod-veth TC ingress,
I think it means we've bypassed host-veth TX?

Yes. So the idea is to transition to tstats replace the location where
we used to bump lstats with tstat's tx counter, and only the peer redirect
would bump the rx counter.. then upon stats traversal we fold the latter into
the rx stats which was populated by the opposite's tx counters. Makes sense.

OT: does cadvisor run inside the Pod to collect the device stats? Just
curious how it gathers them.

Somewhat related - where does netkit count stats?

Yeap, it needs it as well, I have a local branch here where I pushed all
of it - coming out soon; I was planning to add some selftests in addition
till end of this week:

https://github.com/cilium/linux/commits/pr/ndo_peer

Definitely no new stats ndo resp indirect call in fast path.

Yeah, I think I'll put a comment saying that all devices that support
BPF_F_PEER must use tstats (or must use lstats), then.

sgtm.

Is comment good enough? Can we try to do something more robust?
Move the allocation of stats into the core at registration based
on some u8 assigned in the driver? (I haven't looked at the code TBH)
Hm, not sure. One thing that comes to mind is lazy one-time allocation
like in case of netdev_core_stats_inc(), so whenever one of the helpers
like dev_sw_netstats_{rx,tx}_add() are called and dev->tstats are still
NULL, the core knows about the driver's intent, but tbh that doesn't
feel overly clean and in case of netdev_core_stats_inc() it's more in
the exception case rather than fast-path.

Other option could be to have two small helpers in the core which then
set a flag as well:

static inline int netdev_tstats_alloc(struct net_device *dev)
{
dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
if (!dev->tstats)
return -ENOMEM;
dev->priv_flags |= IFF_USES_TSTATS;
return 0;
}

static inline void netdev_tstats_free(struct net_device *dev)
{
free_percpu(dev->tstats);
}

They can then be used from .ndo_init/uninit - not sure if this would
be overall nicer.. or just leaving it at the .ndo callback comment for
the time being until really more users show up (which I doubt tbh).

Thanks,
Daniel