I am working on a simplistic allocator for alloc_percpu which:
1. Minimises cache footprint (simple pointer arithmetic to get to each CPU's copy)
2. Does NUMA-aware allocation
3. Does not fragment
4. Is simple, and extends simple pointer arithmetic to get to each CPU's offset

Interesting. Slab internally uses lots of large per-cpu arrays, altogether around 40 kB per CPU. Right now that is implemented with NR_CPUS pointers. In the long run I'll try to switch to your allocator.
I wouldn't be using slab at all, because using slabs would mean NR_CPUS
pointers and one extra dereference, which, as we found out earlier, is bad.
But I guess slab will have to do node-local allocations for other
applications.