[RFC] cpualloc: improvements to per-cpu allocation

From: Rusty Russell
Date: Tue Dec 30 2008 - 18:08:35 EST


This finally uses the "static" per-cpu region for alloc_percpu. It includes
a number of cleanups required to make that happen, and some cleanups done
afterwards, with many more to come (eg. local_t is now more useful, if we sort
out the semantics of what it should look like).

http://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpualloc

commit 6b24141396c424dd3b9979fc1b915a51636f6ad6
Author: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Date: Wed Dec 31 09:21:46 2008 +1030

alloc_percpu: big_percpu_alloc

Impact: New API

We never allowed the dynamic per-cpu allocator to use the per-cpu region,
waiting until that region became resizable. That was over 5 years ago.

Christoph Lameter has demonstrated that small dynamic per-cpu allocations
make sense, and most allocations are small. So we want to use the same
efficient per-cpu access paths for alloc_percpu() memory.

But some places really do allocate large amounts of percpu memory. We
give them a separate API (equivalent to the current inefficient one).

Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>

commit b9b67f3c818b617bb47344a8574de8bcd5f6b294
Author: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Date: Wed Dec 31 09:21:46 2008 +1030

alloc_percpu: Move SNMP and ip_rt_acct to big_percpu

If we look at percpu allocations to boot an x86/32 "allyesconfig"
kernel, we see the following:

File and line               Number   Size    Total
net/ipv4/af_inet.c:1287         21   2048    43008
net/ipv4/af_inet.c:1290         21   2048    43008
net/ipv4/af_inet.c:1287         48    128     6144
net/ipv4/af_inet.c:1290         48    128     6144
net/ipv4/route.c:3258            1   4096     4096
net/ipv4/af_inet.c:1287          1    288      288
net/ipv4/af_inet.c:1290          1    288      288
net/ipv4/af_inet.c:1287          1    256      256
net/ipv4/af_inet.c:1290          1    256      256
net/ipv4/af_inet.c:1287          1    104      104
net/ipv4/af_inet.c:1290          1    104      104
Subtotal:                                    103696

Total (including other callers untouched by this patch): 117228

So networking is chubby: worst case about 100k per cpu for SNMP stats.
This can be reduced, but it's still 88% of the percpu memory, and IA64
has a hardcoded limit of 64k at the moment.

That's why these two callers get moved to big_percpu_alloc.
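
For anyone who wants to check the arithmetic, here is a small userspace
sketch that recomputes the subtotal and the percentage from the rows above.
The struct and function names are made up for illustration; nothing here is
kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the table: number of allocations and size of each in bytes. */
struct percpu_row {
	unsigned int count;
	unsigned int size;
};

/* Rows copied from the table above, in the same order. */
static const struct percpu_row net_rows[] = {
	{ 21, 2048 }, { 21, 2048 },
	{ 48,  128 }, { 48,  128 },
	{  1, 4096 },
	{  1,  288 }, {  1,  288 },
	{  1,  256 }, {  1,  256 },
	{  1,  104 }, {  1,  104 },
};

/* Sum count * size over all rows. */
static unsigned long net_subtotal(void)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < sizeof(net_rows) / sizeof(net_rows[0]); i++)
		sum += (unsigned long)net_rows[i].count * net_rows[i].size;
	return sum;
}

/* Share of the overall total as an integer percentage, rounded down. */
static unsigned long net_share_percent(unsigned long total)
{
	assert(total != 0);
	return net_subtotal() * 100 / total;
}
```

With the total of 117228 quoted above, net_share_percent() gives the 88%
figure.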

Note: ip_rt_acct could be simplified and made more efficient by making
it a DEFINE_PER_CPU. But DaveM made some argument about image size
and I have far too much respect to argue with BloatBoy :)

Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Cc: netdev@xxxxxxxxxxxxxxx

commit d782824cdeacc530836661d98014a3872772d3a0
Author: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Date: Wed Dec 31 09:21:47 2008 +1030

alloc_percpu: remove alignment on cpu_workqueue_struct.

Since we're improving alloc_percpu, we don't need to waste space.
On 32-bit this drops struct cpu_workqueue_struct from 128 to 64 bytes.

File and line               Number   Size    Total
Before:
kernel/workqueue.c:819          72    128     9216
After:
kernel/workqueue.c:819          72     64     4608

It's still the biggest dynamic percpu user though.
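
To illustrate how a forced alignment inflates the per-object size, here is a
userspace sketch, assuming the alignment being dropped is a cacheline-style
attribute on the struct. The two structs are hypothetical stand-ins, not the
real cpu_workqueue_struct, and __attribute__((aligned)) is the GCC/Clang
spelling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in with no forced alignment. */
struct demo_wq_plain {
	void *worklist_next;
	void *worklist_prev;
	void *current_work;
	int run_depth;
};

/* Same fields, but sizeof() is rounded up to a multiple of 128 bytes. */
struct demo_wq_padded {
	void *worklist_next;
	void *worklist_prev;
	void *current_work;
	int run_depth;
} __attribute__((aligned(128)));

/* Total bytes for 'count' objects of 'obj_size' bytes each. */
static size_t demo_percpu_cost(size_t obj_size, size_t count)
{
	assert(count == 0 || obj_size <= (size_t)-1 / count);
	return obj_size * count;
}
```

The totals in the table follow the same multiplication: 72 objects at 128
bytes cost 9216 bytes, and 72 at 64 bytes cost 4608.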

Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>

commit 9eac9d07744882b52653d47fef6facd93d469b35
Author: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Date: Wed Dec 31 09:21:47 2008 +1030

alloc_percpu: change percpu_ptr to per_cpu_ptr

Impact: cleanup

There are two allocated per-cpu accessor macros with almost identical
spelling. The original and far more popular is per_cpu_ptr (44
files), so change over the other 4 files.

Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Cc: mingo@xxxxxxxxxx
Cc: lenb@xxxxxxxxxx
Cc: cpufreq@xxxxxxxxxxxxxxx

commit 1aebde676d5a1e4965e98de401cf60e9a8e35351
Author: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Date: Wed Dec 31 09:21:48 2008 +1030

alloc_percpu: add align argument to __alloc_percpu.

This prepares for a real __alloc_percpu, by adding an alignment argument.
Only one place uses __alloc_percpu directly, and that's for a string.
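
As a rough idea of what an alignment argument buys a region allocator, here
is a userspace sketch of a bump allocator that hands out aligned offsets.
This is not the allocator in this series; demo_region and both functions are
invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Round 'off' up to the next multiple of 'align' (a power of two). */
static size_t demo_align_up(size_t off, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (off + align - 1) & ~(align - 1);
}

/* Hypothetical region state: bytes handed out so far, and capacity. */
struct demo_region {
	size_t used;
	size_t size;
};

/*
 * Reserve 'size' bytes at the requested alignment.
 * Returns the offset within the region, or (size_t)-1 if it does not fit.
 */
static size_t demo_region_alloc(struct demo_region *r, size_t size,
				size_t align)
{
	size_t start = demo_align_up(r->used, align);

	if (start > r->size || size > r->size - start)
		return (size_t)-1;
	r->used = start + size;
	return start;
}
```

A string only needs an alignment of 1, so such a caller wastes no padding,
while a caller asking for 8-byte alignment gets its offset rounded up.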

Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Cc: Christoph Lameter <cl@xxxxxxxxxxxxxxxxxxxx>
Cc: Jens Axboe <axboe@xxxxxxxxx>

commit ba26c860d5d47896aee45ec5f13c0d7bd21a784a
Author: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Date: Wed Dec 31 09:21:48 2008 +1030

alloc_percpu: make the per cpu reserve configurable and larger.

This is based on Christoph Lameter's "cpualloc: make the per cpu
reserve configurable" patch, and his "Increase default reserve percpu
area" patch.

Christoph did the hard work of figuring out what the number should be
(based on converting the slub allocator). allyesconfig on x86-32 uses
7k before mounting root, so 10k seems reasonable.

Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Cc: Christoph Lameter <cl@xxxxxxxxxxxxxxxxxxxx>

commit f39b80d1009c1611f791b665640e8b41191733f2
Author: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Date: Wed Dec 31 09:21:49 2008 +1030

alloc_percpu: make percpu_modalloc/modfree more generic

Remove the "name" arg to percpu_modalloc, and make it zero memory.
Make percpu_modfree take NULL without barfing.
Make non-SMP versions do kzalloc/kfree.

These trivial changes make it suitable for use as a general per-cpu
allocator.

Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Cc: Christoph Lameter <cl@xxxxxxxxxxxxxxxxxxxx>

commit 98bd1519536ef62d283cee221232c3c2d6157218
Author: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Date: Wed Dec 31 09:21:49 2008 +1030

alloc_percpu: expose percpu_modalloc and percpu_modfree

This simply moves the percpu allocator functions from the module code
to mm/allocpercpu.c. percpu_modinit is renamed percpu_alloc_init and
called from init/main.c.

(Note: this allocator will need to be weaned off krealloc for use in
the slab allocator itself as Christoph does in one of his patches).

Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Cc: Christoph Lameter <cl@xxxxxxxxxxxxxxxxxxxx>

commit 5fe633a3432021d0cf719ab238f6c936ef99d5ee
Author: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Date: Wed Dec 31 09:21:50 2008 +1030

alloc_percpu: switch over to (renamed) percpu_modfree.

Now the switch: rename percpu_modalloc to __alloc_percpu, and
percpu_modfree to free_percpu and export them.

We delete the old ones, including the unused percpu_alloc,
percpu_alloc_mask and percpu_ptr.

per_cpu_ptr now uses RELOC_HIDE on per_cpu_offset, just like static
per-cpu variables (not SHIFT_PERCPU_PTR: this has side effects on
S/390 and Alpha).
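
The idea of "handle plus per-cpu offset" can be sketched in userspace as
below. This is a simulation only: it uses plain pointer addition where the
kernel uses RELOC_HIDE, and the demo_ names, the CPU count and the area size
are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_CPUS	4
#define DEMO_AREA_SIZE	256

/* One flat buffer holding every CPU's copy of the per-cpu region. */
static unsigned char demo_areas[DEMO_NR_CPUS * DEMO_AREA_SIZE];

/* Byte distance from the start of the buffer to a given CPU's copy. */
static size_t demo_per_cpu_offset(unsigned int cpu)
{
	assert(cpu < DEMO_NR_CPUS);
	return (size_t)cpu * DEMO_AREA_SIZE;
}

/*
 * Turn a per-cpu handle (an offset within one copy) into the address of
 * that object in the given CPU's copy.
 */
static void *demo_per_cpu_ptr(size_t handle, unsigned int cpu)
{
	assert(handle < DEMO_AREA_SIZE);
	return demo_areas + demo_per_cpu_offset(cpu) + handle;
}
```

The same handle resolves to a different address per CPU, always exactly one
area size apart, which is what lets dynamic and static per-cpu objects share
one access path.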

The Alpha changes are untested.

Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Cc: Christoph Lameter <cl@xxxxxxxxxxxxxxxxxxxx>

commit 5f930541571ae751b5b0639c0294c3109f76933f
Author: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Date: Wed Dec 31 09:21:50 2008 +1030

alloc_percpu: documentation

Since we can now endorse this interface without wincing, we should
document it. Nothing has changed API-wise, but it's a nice cleanup.

Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Cc: Christoph Lameter <cl@xxxxxxxxxxxxxxxxxxxx>

commit 987afcb643485258568e02cd94e2b906969e080e
Author: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Date: Wed Dec 31 09:21:51 2008 +1030

alloc_percpu: __get_cpu_ptr/get_cpu_ptr/put_cpu_ptr

Now we have a decent implementation it makes sense to have an
interface for "this cpu", analogous to __get_cpu_var.
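
A userspace simulation of the intended calling pattern, assuming the new
accessors pair up the way get_cpu_var/put_cpu_var do (get disables
preemption, put re-enables it, and the __ variant leaves that to the caller).
Every name below is a stand-in, and the "current cpu" and preempt depth are
ordinary variables.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_CPUS 2

static unsigned int demo_current_cpu;	/* stand-in for smp_processor_id() */
static int demo_preempt_depth;		/* stand-in for the preempt count */

struct demo_counter {
	unsigned long events;
};

static struct demo_counter demo_counters[DEMO_NR_CPUS];

/* Raw accessor: the caller must already have preemption disabled. */
static struct demo_counter *demo_raw_this_cpu(void)
{
	assert(demo_current_cpu < DEMO_NR_CPUS);
	return &demo_counters[demo_current_cpu];
}

/* Pin to the current CPU first, then return its copy. */
static struct demo_counter *demo_get_this_cpu(void)
{
	demo_preempt_depth++;
	return demo_raw_this_cpu();
}

/* Release the pin taken by demo_get_this_cpu(). */
static void demo_put_this_cpu(void)
{
	assert(demo_preempt_depth > 0);
	demo_preempt_depth--;
}

/* Typical use: get, touch this CPU's data, put. */
static void demo_count_event(void)
{
	struct demo_counter *c = demo_get_this_cpu();

	c->events++;
	demo_put_this_cpu();
}
```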

Alpha is untested.

Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Cc: Christoph Lameter <cl@xxxxxxxxxxxxxxxxxxxx>

commit ae6f86a1492dd62523ef8c75455b3b784799cefa
Author: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Date: Wed Dec 31 09:21:52 2008 +1030

alloc_percpu: read_percpu_var / read_percpu_ptr

Impact: New API

get_cpu_var/get_cpu_ptr return lvalues, but for ia64 and x86/32 (and one
day x86/64) there are more efficient ways if you just want an rvalue.

The x86-specific versions have proven popular in that arch, so I expect
they'll be popular generally.

So introduce read_percpu_var and read_percpu_ptr. They are separate,
because when we enlarge percpu areas IA64 is not going to be able to
do its trick on arbitrary per-cpu pointers, only vars (which are in the
first 64k).

Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>

commit defbdecc2b971063c6311db91f0be6007306a9bd
Author: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Date: Wed Dec 31 09:21:52 2008 +1030

alloc_percpu: rename percpu vars which cause name clashes.

Currently DECLARE_PER_CPU vars have per_cpu__ prefixed to them, and
this effectively puts them in a separate namespace. No surprise that
they clash with other names when that prefix is removed.
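
The namespace effect of the prefix can be shown with a couple of simplified
macros. These are hypothetical and only do the token pasting; the real
macros also place the variable in the per-cpu section.

```c
#include <assert.h>

/* Hypothetical simplified macros: only the name mangling is modelled. */
#define DEMO_DEFINE_PER_CPU(type, name)	type per_cpu__##name
#define demo_per_cpu_var(name)		per_cpu__##name

/* An ordinary function that happens to be called demo_load. */
static int demo_load(void)
{
	return 42;
}

/* A per-cpu style variable with the same name; the prefix avoids a clash. */
static DEMO_DEFINE_PER_CPU(int, demo_load);

/* Both names are usable side by side while the prefix is in place. */
static void demo_check_no_clash(void)
{
	demo_per_cpu_var(demo_load) = 5;
	assert(demo_load() == 42);
	assert(per_cpu__demo_load == 5);
}
```

If DEMO_DEFINE_PER_CPU expanded to the bare name instead, the variable and
the function would collide at compile time, which is the kind of clash this
patch renames away.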

There may be others I've missed, but if so the transform is simple.

Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>

commit a56f30e8a075e8ce15b4dff8e3fe0edb9baff7ea
Author: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Date: Wed Dec 31 09:21:53 2008 +1030

alloc_percpu: remove per_cpu__ prefix.

Now that the return from alloc_percpu is compatible with the address
of per-cpu vars, it makes sense to hand around the address of per-cpu
variables. To make this sane, we remove the per_cpu__ prefix we
created to stop people accidentally using these vars directly.

Now we have sparse, we can use that (next patch).

Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>

commit a3c45e59bb71da1a013e0063469c523fca50295b
Author: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Date: Wed Dec 31 09:21:53 2008 +1030

alloc_percpu: use __percpu annotation for sparse.

Add __percpu for sparse.

We have to make __kernel "__attribute__((address_space(0)))" so we can
cast to it.

Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>

commit 1d8e4bd04436d08d66807bfc772f6adadf0eadf8
Author: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Date: Wed Dec 31 09:21:54 2008 +1030

alloc_percpu: Use __get_cpu_ptr in block/

Impact: slight efficiency improvement on some archs.

Let's use __get_cpu_ptr and get_cpu_ptr now, rather than
per_cpu_ptr(..., smp_processor_id()).

Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Cc: Jens Axboe <axboe@xxxxxxxxx>

commit 7de8836a4c54bb950932508a2396b822bc41f939
Author: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Date: Wed Dec 31 09:21:54 2008 +1030

alloc_percpu: Use __get_cpu_ptr in crypto/

Impact: slight efficiency improvement on some archs.

Let's use __get_cpu_ptr and get_cpu_ptr now, rather than
per_cpu_ptr(..., smp_processor_id()).

Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Cc: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>

commit 21307114ab782c98ae172f4bd1db6685de8f0269
Author: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Date: Wed Dec 31 09:21:55 2008 +1030

alloc_percpu: Use __get_cpu_ptr in drivers/dma

Impact: slight efficiency improvement on some archs.

Let's use __get_cpu_ptr and get_cpu_ptr now, rather than
per_cpu_ptr(..., smp_processor_id()).

Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Cc: Maciej Sosnowski <maciej.sosnowski@xxxxxxxxx>
Cc: Dan Williams <dan.j.williams@xxxxxxxxx>

commit c88c3d8ed7eea6bc06d1f128c86bc5f309d71235
Author: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Date: Wed Dec 31 09:21:55 2008 +1030

alloc_percpu: Use __raw_get_cpu_ptr in fs/ext4

Impact: slight efficiency improvement on some archs.

Let's use __raw_get_cpu_ptr now, rather than per_cpu_ptr(...,
raw_smp_processor_id()).

Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Cc: Theodore Ts'o <tytso@xxxxxxx>
Cc: adilger@xxxxxxx
Cc: linux-ext4@xxxxxxxxxxxxxxx

commit 010674876450701d458cfb15d8d4c70388603cf6
Author: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Date: Wed Dec 31 09:21:56 2008 +1030

alloc_percpu: Use get_cpu_ptr in fs/nfs

Impact: slight efficiency improvement on some archs.

Let's use get_cpu_ptr now, rather than per_cpu_ptr(..., smp_processor_id()).

Note that now we have real dynamic per-cpu allocations, cacheline aligning
should not win anything, and is a little antisocial.

Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Cc: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
Cc: linux-nfs@xxxxxxxxxxxxxxx

commit b456282af0715bd986d57cba6a629a5b71ca7692
Author: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Date: Wed Dec 31 09:21:56 2008 +1030

alloc_percpu: Use __get_cpu_ptr in fs/xfs

Impact: slight efficiency improvement on some archs.

Let's use get_cpu_ptr now, rather than per_cpu_ptr(..., smp_processor_id()).

Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Cc: Tim Shimmin <xfs-masters@xxxxxxxxxxx>
Cc: xfs@xxxxxxxxxxx

commit ea59384cda8cbeb40c5749f3b070df346667f7a7
Author: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Date: Wed Dec 31 09:21:57 2008 +1030

alloc_percpu: Use __get_cpu_ptr / get_cpu_ptr / get_cpu_var in kernel

Impact: slight efficiency improvement on some archs.

Let's use __get_cpu_ptr and get_cpu_ptr now, rather than
per_cpu_ptr(..., smp_processor_id()).

Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxx>
Cc: Dipankar Sarma <dipankar@xxxxxxxxxx>

commit 9ba7023fdebca58d35f970bb66fbcd91403a9ab7
Author: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Date: Wed Dec 31 09:21:57 2008 +1030

alloc_percpu: Use __get_cpu_ptr in networking

Impact: slight efficiency improvement on some archs.

Let's use __get_cpu_ptr and get_cpu_ptr now, rather than
per_cpu_ptr(..., smp_processor_id()).

(The nfconntrack code seems a bit confused over when raw_smp_processor_id()
should be used: not here).

Signed-off-by: Rusty Russell <rusty@xxxxxxxxxxxxxxx>
Cc: Hoang-Nam Nguyen <hnguyen@xxxxxxxxxx>
Cc: Christoph Raisch <raisch@xxxxxxxxxx>
Cc: netdev@xxxxxxxxxxxxxxx


Documentation/kernel-parameters.txt | 8 +
arch/alpha/include/asm/percpu.h | 21 ++-
arch/cris/arch-v10/kernel/entry.S | 2 +-
arch/cris/arch-v32/mm/mmu.S | 2 +-
arch/ia64/include/asm/percpu.h | 5 +-
arch/ia64/kernel/ia64_ksyms.c | 4 +-
arch/ia64/mm/discontig.c | 2 +-
arch/parisc/lib/fixup.S | 8 +-
arch/powerpc/platforms/pseries/hvCall.S | 2 +-
arch/sparc64/kernel/rtrap.S | 8 +-
arch/x86/include/asm/percpu.h | 24 ++-
arch/x86/include/asm/timer.h | 5 +-
arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c | 2 +-
arch/x86/kernel/entry_64.S | 4 +-
arch/x86/kernel/head_32.S | 2 +-
arch/x86/kernel/head_64.S | 2 +-
arch/x86/kernel/tsc.c | 4 +-
arch/x86/xen/xen-asm_32.S | 4 +-
block/blktrace.c | 4 +-
crypto/async_tx/async_tx.c | 5 +-
drivers/acpi/processor_perflib.c | 4 +-
drivers/dma/dmaengine.c | 32 ++--
drivers/infiniband/hw/ehca/ehca_irq.c | 3 +-
drivers/net/chelsio/sge.c | 5 +-
drivers/net/loopback.c | 4 +-
drivers/net/veth.c | 7 +-
fs/ext4/mballoc.c | 2 +-
fs/nfs/iostat.h | 10 +-
fs/xfs/xfs_mount.c | 9 +-
include/asm-generic/percpu.h | 60 ++++++-
include/linux/compiler.h | 4 +-
include/linux/percpu.h | 117 +++++++-----
include/net/netfilter/nf_conntrack.h | 4 +-
include/net/netfilter/nf_conntrack_ecache.h | 2 +-
include/net/snmp.h | 12 +-
init/main.c | 3 +
kernel/lockdep.c | 11 +-
kernel/module.c | 154 +---------------
kernel/posix-cpu-timers.c | 2 +-
kernel/sched.c | 21 +-
kernel/sched_stats.h | 4 +-
kernel/softirq.c | 4 +-
kernel/softlockup.c | 12 +-
kernel/srcu.c | 4 +-
kernel/stop_machine.c | 2 +-
kernel/workqueue.c | 6 +-
mm/allocpercpu.c | 265 ++++++++++++++++-----------
mm/vmstat.c | 6 +-
net/core/sock.c | 3 +-
net/ipv4/af_inet.c | 14 +-
net/ipv4/ip_input.c | 3 +-
net/ipv4/route.c | 4 +-
net/netfilter/nf_conntrack_ecache.c | 4 +-
net/xfrm/xfrm_ipcomp.c | 20 +--
54 files changed, 469 insertions(+), 466 deletions(-)
