Re: [PATCH v13 2/6] mm/vmstat: Use vmstat_dirty to track CPU-specific vmstat discrepancies

From: Marcelo Tosatti
Date: Wed Jan 11 2023 - 12:10:12 EST


On Wed, Jan 11, 2023 at 09:42:18AM +0100, Christoph Lameter wrote:
> On Tue, 10 Jan 2023, Marcelo Tosatti wrote:
>
> > > The basic primitives add a lot of weight.
> >
> > Can't see any alternative given the necessity to avoid interruption
> > by the work to sync per-CPU vmstats to global vmstats.
>
> this_cpu operations are designed to operate on a *single* value (a counter) and can
> be run on an arbitrary cpu, There is no preemption or interrupt
> disable required since the counters of all cpus will be added up at the
> end.
>
> You want *two* values (the counter and the dirty flag) to be modified
> together and want to use the counters/flag to identify the cpu where
> these events occurred. this_cpu_xxx operations are not suitable for that
> purpose. You would need a way to ensure that both operations occur on the
> same cpu.

Which is either preempt_disable (CONFIG_HAVE_CMPXCHG_LOCAL case), or
local_irq_disable (!CONFIG_HAVE_CMPXCHG_LOCAL case).

> > > > And the pre cpu atomic updates operations require the modification
> > > of multiple values. The operation
> > > cannot be "atomic" in that sense anymore and we need some other form of
> > > synchronization that can
> > > span multiple instructions.
> >
> > So use this_cpu_cmpxchg() to avoid the overhead. Since we can no longer
> > count on preremption being disabled we still have some minor issues.
> > The fetching of the counter thresholds is racy.
> > A threshold from another cpu may be applied if we happen to be
> > rescheduled on another cpu. However, the following vmstat operation
> > will then bring the counter again under the threshold limit.
> >
> > Those small issues are gone, OTOH.
>
> Well you could use this_cpu_cmpxchg128 to update a 64 bit counter and a
> flag at the same time.

But then you transform the "per-CPU vmstat is dirty" bit (bool) into a
number of flags that must be scanned (when returning to userspace).

Which increases the overhead of a fast path (return to userspace).

> Otherwise you will have to switch off preemption or
> interrupts when incrementing the counters and updating the dirty flag.
>
> Thus you do not really need the this_cpu operations anymore. It would
> best to use a preempt_disable section and uuse C operators -- ++ for the
> counter and do regular assignment for the flag.

OK, can replace this_cpu operations with this_cpu_ptr + standard C operators
(and in fact can do that for interrupt disabled functions as well, that
is CONFIG_HAVE_CMPXCHG_LOCAL not defined).

Is that it?