Re: [PATCH] net: openvswitch: Use for_each_cpu_from() where appropriate

From: Yury Norov
Date: Thu Aug 14 2025 - 17:05:42 EST


On Thu, Aug 14, 2025 at 10:49:30PM +0200, Ilya Maximets wrote:
> On 8/14/25 9:58 PM, Yury Norov wrote:
> > From: Yury Norov (NVIDIA) <yury.norov@xxxxxxxxx>
> >
> > Openvswitch opencodes for_each_cpu_from(). Fix it and drop some
> > housekeeping code.
> >
> > Signed-off-by: Yury Norov (NVIDIA) <yury.norov@xxxxxxxxx>
> > ---
> > net/openvswitch/flow.c | 14 ++++++--------
> > net/openvswitch/flow_table.c | 8 ++++----
> > 2 files changed, 10 insertions(+), 12 deletions(-)
> >
> > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> > index b80bd3a90773..b464ab120731 100644
> > --- a/net/openvswitch/flow.c
> > +++ b/net/openvswitch/flow.c
> > @@ -129,15 +129,14 @@ void ovs_flow_stats_get(const struct sw_flow *flow,
> > struct ovs_flow_stats *ovs_stats,
> > unsigned long *used, __be16 *tcp_flags)
> > {
> > - int cpu;
> > + /* CPU 0 is always considered */
> > + unsigned int cpu = 1;
>
> Hmm. I'm a bit confused here. Where is CPU 0 considered if we start
> iteration from 1?

I didn't touch this part of the original comment, as you see, and I'm
not a domain expert, so don't know what does this wording mean.

Most likely 'always considered' means that CPU0 is not accounted in this
statistics.

> > *used = 0;
> > *tcp_flags = 0;
> > memset(ovs_stats, 0, sizeof(*ovs_stats));
> >
> > - /* We open code this to make sure cpu 0 is always considered */
> > - for (cpu = 0; cpu < nr_cpu_ids;
> > - cpu = cpumask_next(cpu, flow->cpu_used_mask)) {
> > + for_each_cpu_from(cpu, flow->cpu_used_mask) {
>
> And why it needs to be a for_each_cpu_from() and not just for_each_cpu() ?

The original code explicitly ignores CPU0. If we use for_each_cpu(),
it would ignore initial value in 'cpu'. Contrary, for_each_cpu_from()
does respect it.

> Note: the original logic here came from using for_each_node() back when
> stats were collected per numa, and it was important to check node 0 when
> the system didn't have it, so the loop was open-coded, see commit:
> 40773966ccf1 ("openvswitch: fix flow stats accounting when node 0 is not possible")
>
> Later the stats collection was changed to be per-CPU instead of per-NUMA,
> th eloop was adjusted to CPUs, but remained open-coded, even though it
> was probbaly safe to use for_each_cpu() macro here, as it accepts the
> mask and doesn't limit it to available CPUs, unlike the for_each_node()
> macro that only iterates over possible NUMA node numbers and will skip
> the zero. The zero is importnat, because it is used as long as only one
> core updates the stats, regardless of the number of that core, AFAIU.
>
> So, the comments in the code do not really make a lot of sense, especially
> in this patch.

I can include CPU0 and iterate over it, but it would break the existing
logic. The intention of my work is to minimize direct cpumask_next()
usage over the kernel, and as I said I'm not a domain expert here.

Let's wait for more comments. If it's indeed a bug in current logic,
I'll happily send v2.

Thanks,
Yury