Re: [PATCH 0/2] Separate NUMA statistics from zone statistics

From: kemi
Date: Tue Aug 22 2017 - 21:15:54 EST




On 2017-08-23 05:22, Christopher Lameter wrote:
> Can we simply get rid of the stats or make them configurable (off by
> default)? I agree they are rarely used and have been rarely used in the past.
>

I agree that we can make the NUMA stats, as well as other rarely used stat items,
configurable. Perhaps we can introduce a general mechanism to hide such unimportant
stats (initially suggested by Dave Hansen). It would work like this:

When performance is not important and you want all tooling to work, you set:

sysctl vm.strict_stats=1

but if you can tolerate some possible tool breakage and some decreased
counter precision, you can do:

sysctl vm.strict_stats=0

What do you think of that? I can help implement it later.

But it may not be a good idea to simply get rid of these kinds of stats.
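
To make the vm.strict_stats idea above concrete, here is a minimal user-space
sketch, not kernel code. Everything in it is hypothetical: the strict_stats
variable stands in for the proposed sysctl, the two stat items and the
stat_is_rarely_used() helper are made up for illustration, and skipping the
update entirely is only one possible reading of "decreased counter precision".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stat items: one commonly used, one rarely used. */
enum stat_item { STAT_NR_FREE_PAGES, STAT_NUMA_HIT, STAT_NR_ITEMS };

/* Hypothetical stand-in for the proposed vm.strict_stats sysctl. */
static bool strict_stats = true;

static unsigned long stats[STAT_NR_ITEMS];

/* Assumption: only the NUMA counter is classed as rarely used. */
static bool stat_is_rarely_used(enum stat_item item)
{
	return item == STAT_NUMA_HIT;
}

/* Count the event unless strict mode is off and the item is rarely used. */
static void stat_inc(enum stat_item item)
{
	assert(item < STAT_NR_ITEMS);

	if (!strict_stats && stat_is_rarely_used(item))
		return;	/* cheap path: the rarely used counter is not updated */

	stats[item]++;
}
```

With strict_stats set, every call to stat_inc() is counted. With it cleared,
only the commonly used item keeps counting, which is where the tool breakage
mentioned above would come from.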

> Maybe some instrumentation for perf etc. will allow
> similar statistics these days? Thus it's possible to drop them?
>
> The space in the pcp pageset is precious and we should strive to use no
> more than a cacheline for the diffs.
>
>

Andi has helped to explain it very clearly. Thanks very much.

For 64-bit OS:
                                  base    with this patch (even including numa_threshold)
sizeof(struct per_cpu_pageset)    88      96
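
As a quick check of what these sizes mean for cache line usage, the helper
below rounds a structure size up to whole cache lines. It is only an
illustration: cache_lines_needed() is a made-up name, and the 64-byte line
size used in the note after it is an assumption, not something measured on
the test machine.

```c
#include <assert.h>
#include <stddef.h>

/* Number of cache lines needed to hold `size` bytes, rounding up. */
static size_t cache_lines_needed(size_t size, size_t line_size)
{
	assert(line_size > 0);
	return (size + line_size - 1) / line_size;
}
```

Assuming 64-byte lines, cache_lines_needed(88, 64) and
cache_lines_needed(96, 64) both evaluate to 2, so the patched structure
needs no more cache lines than the base one.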

Copying the earlier discussion from another email thread in case you missed it:

> Hi Mel
> I am refreshing this patch. Would you please be more explicit about what "that
> structure" refers to?
> If you mean "struct per_cpu_pageset", on a 64-bit machine this structure
> still occupies two cache lines after extending s8 to s16/u16, so that should
> not be a problem.

You're right, I was in error. I miscalculated badly initially. It still
fits in as expected.

> For a 32-bit machine, we probably do not need to extend
> the size of vm_numa_stat_diff[], since a 32-bit OS is almost never used on a large
> NUMA system and s8/u8 is large enough for it. In that case, we can keep the
> same size of "struct per_cpu_pageset".
>

I don't believe it's worth the complexity of making this
bitness-specific. 32-bit takes penalties in other places and besides,
32-bit does not necessarily mean a change in cache line size.

Fortunately, I think you should still be able to gain a bit more by
special-casing the fact that it's always incrementing and always doing a full
spill of the counters instead of half. If so, then using u16 instead of
s16 should also reduce the update frequency. However, if you find it's
too complex and the gain is too marginal then I'll ack without it.
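
The full-spill suggestion above can be sketched in user space as follows.
This is an illustration under stated assumptions, not the patch itself:
struct inc_counter and its helpers are hypothetical names, the "global"
field stands in for the shared counter, and a single structure plays the
role of one CPU's local diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical increment-only counter with a per-CPU style local diff. */
struct inc_counter {
	unsigned long global;	/* stands in for the shared global counter */
	unsigned long spills;	/* how many times the global was updated */
	uint16_t diff;		/* pending local increments, never negative */
	uint16_t threshold;	/* spill once diff would exceed this */
};

static void inc_counter_init(struct inc_counter *c, uint16_t threshold)
{
	c->global = 0;
	c->spills = 0;
	c->diff = 0;
	c->threshold = threshold;
}

/* Add one event; on overflow of the threshold, spill the whole diff. */
static void inc_counter_add_one(struct inc_counter *c)
{
	/* Compute in a wider type so a threshold of UINT16_MAX cannot wrap. */
	unsigned int v = (unsigned int)c->diff + 1u;

	if (v > c->threshold) {
		c->global += v;	/* full spill, not half */
		c->spills++;
		v = 0;
	}

	assert(v <= UINT16_MAX);
	c->diff = (uint16_t)v;
}

/* Exact value: what was spilled plus what is still pending locally. */
static unsigned long inc_counter_read(const struct inc_counter *c)
{
	return c->global + c->diff;
}
```

Because the diff only ever grows, an unsigned 16-bit field allows a
threshold of up to 65535 where a signed one tops out at 32767, so the
shared counter is touched roughly half as often for the same number of
events.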