Re: [PATCH v13 2/6] mm/vmstat: Use vmstat_dirty to track CPU-specific vmstat discrepancies

From: Christoph Lameter
Date: Tue Jan 17 2023 - 07:54:01 EST


On Mon, 16 Jan 2023, Marcelo Tosatti wrote:

> Honestly, to me, there is no dilemma:
>
> * There is a requirement from applications to be uninterrupted
> by operating system activities. Examples include radio access
> network software, software defined PLCs for industrial automation (1).
>
> * There exists vm-statistics counters (which count
> the number of pages on different states, for example, number of
> free pages, locked pages, pages under writeback, pagetable pages,
> file pages, etc).
> To reduce number of accesses to the global counters, each CPU maintains
> its own delta relative to the global VM counters
> (which can be cached in the local processor cache, therefore fast).

The counters only count accurately as a global sum. A counter may be
specific to a zone and at which time it counts uses of that zone of from
all processors.

> Now you are objecting to this patchset because:
>
> It increases the number of cycles to execute the function to modify
> the counters by 6. Can you mention any benchmark where this
> increase is significant?

I am objecting because of a fundamental misunderstanding of how these
counters work and because the patchset is incorrect in the way it handles
these counters. Come up with a correct approach and then we can consider
regressions and/or improvements in performance.

> Alternatives:
> 1) Disable periodic synchronization for nohz_full CPUs.
> 2) Processor instructions which can modify more than
> one address in memory.
> 3) Synchronize the per-CPU stats remotely (which
> increases per-CPU and per-node accesses).

Or remove the assumptions that may exist in current code that a delta on a
specific cpu counter means that something occurred on that cpu?

If there is a delta then that *does not* mean that there is something to
do on that processor. The delta could be folded by another processor into
the global counter if that processor is idle or not entering the Kernel
and stays that way throughout the operation.

So I guess that would be #3. The function cpu_vm_stats_fold() already does
this for offline cpus. Can something similar be made to work for idle cpus
or those continually running in user space?