Re: [RFC PATCH 1/2] percpu_counter: Allow falling back to global counter on large system

From: Waiman Long
Date: Mon Mar 07 2016 - 14:50:40 EST


On 03/07/2016 01:24 PM, Christoph Lameter wrote:
> On Fri, 4 Mar 2016, Waiman Long wrote:
>
>> This patch provides a mechanism to selectively degenerate per-cpu
>> counters to global counters at per-cpu counter initialization time. The
>> following new API is added:
>>
>>   percpu_counter_set_limit(struct percpu_counter *fbc,
>>                            u32 percpu_limit)
>>
>> The function should be called after percpu_counter_set(). It will
>> compare the total limit (nr_cpu * percpu_limit) against the current
>> counter value. If the limit is not smaller, it will disable the per-cpu
>> counters and use only the global counter instead. At run time, when
>> the counter value grows past the total limit, the per-cpu counters will
>> be enabled again.
> Hmmm... That requires setting a limit manually. Would it not be
> possible to automate the switch-over completely? For example, one could
> keep a cpumask of processors that use the per cpu counters.

The limit is usually the batch size used or a multiple of it.
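To make the proposed semantics concrete, here is a minimal single-threaded userspace model of what percpu_counter_set_limit() is described as doing, with the per-CPU limit set to the batch size. Everything in it (struct sketch_counter, sketch_set_limit(), SKETCH_NR_CPUS, the caller passing the CPU number explicitly) is a hypothetical stand-in, not the patch's code or the kernel's struct percpu_counter. Locking is only indicated in comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4 /* hypothetical CPU count for this model */

/* Hypothetical model of a counter that can degenerate to a global count. */
struct sketch_counter {
	int64_t count;                /* global count */
	int32_t pcpu[SKETCH_NR_CPUS]; /* per-CPU deltas not yet folded */
	int32_t batch;                /* per-CPU fold threshold */
	uint32_t percpu_limit;        /* per-CPU share of the total limit */
	bool global_only;             /* true: per-CPU deltas are bypassed */
};

static int64_t sketch_total_limit(const struct sketch_counter *c)
{
	return (int64_t)SKETCH_NR_CPUS * c->percpu_limit;
}

/* Models percpu_counter_set_limit(): call after the initial value is set. */
static void sketch_set_limit(struct sketch_counter *c, uint32_t percpu_limit)
{
	c->percpu_limit = percpu_limit;
	/* Limit not smaller than the current value: use the global count only. */
	c->global_only = sketch_total_limit(c) >= c->count;
}

static void sketch_add(struct sketch_counter *c, int cpu, int32_t amount)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	if (c->global_only) {
		c->count += amount; /* the kernel would hold the counter's spinlock */
		if (c->count > sketch_total_limit(c))
			c->global_only = false; /* grew past the limit: per-CPU again */
		return;
	}
	int32_t v = c->pcpu[cpu] + amount;
	if (v >= c->batch || v <= -c->batch) {
		c->count += v; /* fold the delta into the global count */
		c->pcpu[cpu] = 0;
	} else {
		c->pcpu[cpu] = v;
	}
}

/* Exact value: global count plus all per-CPU deltas. */
static int64_t sketch_sum(const struct sketch_counter *c)
{
	int64_t sum = c->count;
	for (int i = 0; i < SKETCH_NR_CPUS; i++)
		sum += c->pcpu[i];
	return sum;
}
```

With SKETCH_NR_CPUS = 4 and a per-CPU limit of 8, the total limit is 32: a counter initialized to 10 stays global until it exceeds 32, after which updates go to the per-CPU deltas again. Like the description, the model compares only the signed value against the limit.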

> Then in the fastpath, if the current cpu is a member, increment the per cpu
> counter. If not, take the spinlock and update the global count. If there is
> contention, add the cpu to the cpumask and use the per cpu counters. This
> automatically scales for the processors on which frequent increments occur.

That is an interesting idea. I will do some prototyping and see how it goes. One downside that I see is the increase in the size of the percpu_counter structure.
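A rough userspace model of the cpumask-based switch-over, assuming at most 64 CPUs so that a uint64_t can stand in for the cpumask, a C11 atomic_flag for the spinlock, and a failed trylock as the signal for contention. The names are hypothetical; in the kernel the CPU would come from the current context with preemption disabled rather than from a parameter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 8 /* hypothetical; must be <= 64 for the mask word */

struct auto_counter {
	atomic_flag lock;             /* stands in for the counter's spinlock */
	int64_t count;                /* global count, protected by lock */
	_Atomic uint64_t pcpu_mask;   /* CPUs that switched to per-CPU counting */
	int64_t pcpu[SKETCH_NR_CPUS]; /* slot i is written only by CPU i */
};

static void auto_counter_init(struct auto_counter *c)
{
	atomic_flag_clear(&c->lock);
	c->count = 0;
	atomic_store(&c->pcpu_mask, 0);
	for (int i = 0; i < SKETCH_NR_CPUS; i++)
		c->pcpu[i] = 0;
}

static void auto_counter_add(struct auto_counter *c, int cpu, int64_t amount)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	uint64_t bit = UINT64_C(1) << cpu;

	/* Fast path: this CPU is already a member, use its own slot. */
	if (atomic_load(&c->pcpu_mask) & bit) {
		c->pcpu[cpu] += amount;
		return;
	}
	/* Not a member: try to update the global count under the lock. */
	if (!atomic_flag_test_and_set(&c->lock)) {
		c->count += amount;
		atomic_flag_clear(&c->lock);
		return;
	}
	/* Lock was busy (contention): join the mask and count per CPU. */
	atomic_fetch_or(&c->pcpu_mask, bit);
	c->pcpu[cpu] += amount;
}
```

Even this toy version grows the structure by a mask word; a real cpumask_t is NR_CPUS bits wide, which is where the size concern comes from.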

> Then regularly (once per minute or so) degenerate the counter by folding
> the per cpu diffs into the global count and zapping the cpumask.
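Taken literally, the periodic degeneration step might look like the sketch below. It is a hypothetical single-threaded model: a uint64_t as the cpumask, the caller supplying the time in seconds, and a 60-second interval for "once per minute or so". The fold writes every member CPU's slot, so it is only correct if no CPU can update its own slot concurrently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 8     /* hypothetical; <= 64 for the mask word */
#define FOLD_INTERVAL_SEC 60 /* "once per minute or so" */

struct fold_counter {
	int64_t count;                /* global count */
	uint64_t pcpu_mask;           /* CPUs currently using their slot */
	int64_t pcpu[SKETCH_NR_CPUS]; /* per-CPU deltas */
	uint64_t last_fold_sec;       /* time of the previous fold */
};

/*
 * If the interval has elapsed, fold the member CPUs' deltas into the
 * global count and zap the mask. Assumes no concurrent per-CPU updates.
 * Returns true if a fold happened.
 */
static bool fold_counter_tick(struct fold_counter *c, uint64_t now_sec)
{
	if (now_sec - c->last_fold_sec < FOLD_INTERVAL_SEC)
		return false;
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		if (c->pcpu_mask & (UINT64_C(1) << cpu)) {
			c->count += c->pcpu[cpu];
			c->pcpu[cpu] = 0; /* writes another CPU's slot */
		}
	}
	c->pcpu_mask = 0;
	c->last_fold_sec = now_sec;
	return true;
}
```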

Actually, I think we need 2 cpumasks: one for deciding whether to use the global or the percpu count, and another one to track which percpu counts are in use, as it is not safe to change any per-cpu count other than your own.
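One possible reading of the two-cpumask scheme, sketched as a single-threaded model: use_mask decides whether a CPU updates its per-CPU slot, while dirty_mask records which slots may still hold a delta. The periodic step then only clears use_mask, and each CPU folds its own leftover delta the next time it updates the counter, so no CPU ever writes another CPU's slot. The names, the uint64_t masks and the explicit contended flag (standing in for a failed trylock) are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 8 /* hypothetical; <= 64 for the mask words */

struct two_mask_counter {
	int64_t count;                /* global count (lock-protected in the kernel) */
	uint64_t use_mask;            /* mask 1: CPUs that should count per CPU */
	uint64_t dirty_mask;          /* mask 2: CPUs whose slot may be nonzero */
	int64_t pcpu[SKETCH_NR_CPUS]; /* slot i is touched only by CPU i */
};

static void tm_add(struct two_mask_counter *c, int cpu, int64_t amount,
		   bool contended)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	uint64_t bit = UINT64_C(1) << cpu;

	if (c->use_mask & bit) {
		c->pcpu[cpu] += amount;
		return;
	}
	if (c->dirty_mask & bit) {
		/* Degenerated while our slot held a delta: fold our own slot. */
		c->count += c->pcpu[cpu];
		c->pcpu[cpu] = 0;
		c->dirty_mask &= ~bit;
	}
	if (contended) {
		/* Mark the slot dirty before using it, so readers include it. */
		c->dirty_mask |= bit;
		c->use_mask |= bit;
		c->pcpu[cpu] += amount;
		return;
	}
	c->count += amount;
}

/* Periodic degeneration: stop per-CPU counting, leave the slots alone. */
static void tm_degenerate(struct two_mask_counter *c)
{
	c->use_mask = 0;
}

/* Readers add the global count and every slot that may hold a delta. */
static int64_t tm_read(const struct two_mask_counter *c)
{
	int64_t sum = c->count;
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		if (c->dirty_mask & (UINT64_C(1) << cpu))
			sum += c->pcpu[cpu];
	return sum;
}
```

The trade-off in this reading is that a CPU which never touches the counter again keeps its bit in dirty_mask, so readers keep visiting that slot until it does.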

> If the cpumask is empty you can use the global count. Otherwise you just
> need to add up the counters of the cpus set in the cpumask.
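The read side could look like this: with an empty mask the global count is returned directly, otherwise only the slots of CPUs in the mask are added to it, rather than the slots of every online CPU. The structure and function names are hypothetical, and a uint64_t again stands in for the cpumask.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 8 /* hypothetical; <= 64 for the mask word */

struct read_counter {
	int64_t count;                /* global count */
	uint64_t pcpu_mask;           /* CPUs whose slots are in use */
	int64_t pcpu[SKETCH_NR_CPUS]; /* per-CPU deltas */
};

static int64_t read_counter_sum(const struct read_counter *c)
{
	uint64_t mask = c->pcpu_mask;

	if (mask == 0)
		return c->count; /* empty mask: only the global count is in use */

	int64_t sum = c->count;
	/* Shift through the mask, stopping after the highest member CPU. */
	for (int cpu = 0; mask != 0; cpu++, mask >>= 1) {
		assert(cpu < SKETCH_NR_CPUS);
		if (mask & 1)
			sum += c->pcpu[cpu];
	}
	return sum;
}
```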

Cheers,
Longman