Re: [patch 2/4] net: percpufy frequently used vars -- structproto.memory_allocated

From: Andrew Morton
Date: Tue Mar 07 2006 - 22:22:09 EST


Ravikiran G Thirumalai <kiran@xxxxxxxxxxxx> wrote:
>
> On Tue, Mar 07, 2006 at 06:14:22PM -0800, Andrew Morton wrote:
> > Ravikiran G Thirumalai <kiran@xxxxxxxxxxxx> wrote:
> > >
> > > - if (atomic_read(sk->sk_prot->memory_allocated) < sk->sk_prot->sysctl_mem[0]) {
> > > + if (percpu_counter_read(sk->sk_prot->memory_allocated) <
> > > + sk->sk_prot->sysctl_mem[0]) {
> >
> > Bear in mind that percpu_counter_read[_positive] can be inaccurate on large
> > CPU counts.
> >
> > It might be worth running percpu_counter_sum() to get the exact count if we
> > think we're about to cause something to fail.
>
> The problem is percpu_counter_sum has to read all the cpus cachelines. If
> we have to use percpu_counter_sum everywhere, then might as well use plain
> per-cpu counters instead of batching ones no?

I didn't say "use it everywhere" ;)

Just in places like this:

if (percpu_counter_read(something) > something_else)
make_an_application_fail();

in that case it's worth running percpu_counter_sum(). And bear in mind
that once we've done that, the following percpu_counter_read()s become
accurate, so we won't run the expensive percpu_counter_sum() again
for a while. Unless we're really close to or over the limit, in which case
blowing a few cycles is relatively unimportant.

All that should be captured in library code (per_cpu_counter_exceeds(ptr,
threshold), for example) rather than open-coded everywhere.

> sysctl_mem[0] is about 196K and on a 16 cpu box variance is 512 bytes, which
> is OK with just percpu_counter_read I hope.

You mean a 16 CPU box with NR_CPUS=16 as well...

> Maybe, on very large cpu counts,
> we should just change the FBC_BATCH so that variance does not go quadratic.
> Something like 32. So that variance is 32 * NR_CPUS in that case, instead
> of (NR_CPUS * NR_CPUS * 2) currently. Comments?

Sure, we need to make that happen. But it got all mixed up with the
spinlock removal and it does need quite some thought and testing and
documentation to help developers to choose the right settings and
appropriate selection of defaults, etc.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/