Re: [PATCH 01/31] mm, vmstat: add infrastructure for per-node vmstats

From: Minchan Kim
Date: Mon Jul 04 2016 - 19:49:36 EST


On Fri, Jul 01, 2016 at 09:01:09PM +0100, Mel Gorman wrote:
> VM statistic counters for reclaim decisions are zone-based. If the kernel
> is to reclaim on a per-node basis then we need to track per-node
> statistics but there is no infrastructure for that. The most notable
> change is that the old node_page_state is renamed to
> sum_zone_node_page_state. The new node_page_state takes a pglist_data and
> uses per-node stats but none exist yet. There is some renaming such as
> vm_stat to vm_zone_stat and the addition of vm_node_stat and the renaming
> of mod_state to mod_zone_state. Otherwise, this is mostly a mechanical
> patch with no functional change. There is a lot of similarity between the
> node and zone helpers which is unfortunate but there was no obvious way of
> reusing the code and maintaining type safety.
>
> Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
> Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> Acked-by: Vlastimil Babka <vbabka@xxxxxxx>
> ---

<snip>

> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 7997f52935c9..90b0737ee4be 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -86,8 +86,10 @@ void vm_events_fold_cpu(int cpu)
> *
> * vm_stat contains the global counters
> */
> -atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS] __cacheline_aligned_in_smp;
> -EXPORT_SYMBOL(vm_stat);
> +atomic_long_t vm_zone_stat[NR_VM_ZONE_STAT_ITEMS] __cacheline_aligned_in_smp;
> +atomic_long_t vm_node_stat[NR_VM_NODE_STAT_ITEMS] __cacheline_aligned_in_smp;
> +EXPORT_SYMBOL(vm_zone_stat);
> +EXPORT_SYMBOL(vm_node_stat);
>
> #ifdef CONFIG_SMP
>
> @@ -172,13 +174,17 @@ void refresh_zone_stat_thresholds(void)
> int threshold;
>
> for_each_populated_zone(zone) {
> + struct pglist_data *pgdat = zone->zone_pgdat;
> unsigned long max_drift, tolerate_drift;
>
> threshold = calculate_normal_threshold(zone);
>
> - for_each_online_cpu(cpu)
> + for_each_online_cpu(cpu) {
> per_cpu_ptr(zone->pageset, cpu)->stat_threshold
> = threshold;
> + per_cpu_ptr(pgdat->per_cpu_nodestats, cpu)->stat_threshold
> + = threshold;
> + }

I haven't looked at the other patches yet, so this may be fixed later in
the series.

per_cpu_nodestats is per-node, not per-zone, but here it is set from the
per-zone threshold, and the value is overwritten by each subsequent zone
of the same node. I don't think that is intended.
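
To make the concern concrete, here is a minimal userspace sketch. It is
not kernel code: toy_node, toy_zone, refresh_as_posted() and
refresh_keep_max() are hypothetical names, and the per-cpu dimension is
dropped. The first helper mirrors the loop in the hunk above; the second
shows just one possible alternative (taking the largest zone threshold
per node), which is an assumption on my side and not necessarily what
the series should do.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for pglist_data and zone. */
struct toy_node {
	int stat_threshold;		/* per-node threshold */
};

struct toy_zone {
	struct toy_node *node;		/* node this zone belongs to */
	int threshold;			/* value calculated for this zone */
	int stat_threshold;		/* per-zone threshold */
};

/*
 * Mirrors the loop as posted: every zone writes its own threshold into
 * its node, so the node ends up with whatever the last zone computed.
 */
static void refresh_as_posted(struct toy_zone *zones, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		zones[i].stat_threshold = zones[i].threshold;
		zones[i].node->stat_threshold = zones[i].threshold;
	}
}

/*
 * One possible alternative (an assumption, for illustration only):
 * reset the node thresholds first, then keep the largest zone value.
 */
static void refresh_keep_max(struct toy_zone *zones, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		zones[i].node->stat_threshold = 0;

	for (size_t i = 0; i < nr; i++) {
		struct toy_node *node = zones[i].node;

		zones[i].stat_threshold = zones[i].threshold;
		if (zones[i].threshold > node->stat_threshold)
			node->stat_threshold = zones[i].threshold;
	}
}
```

With two zones on one node, the first helper leaves the node with the
threshold of whichever zone is visited last, regardless of the other
zone's value; the second makes the result independent of zone order.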