Hmmm.. I knew from some experiments earlier that access to per cpu versions
of memory was slow with the slab based implementation -- which this patch
addresses, but I didn't know allocs themselves were slow...
Creation of a disk should not be a fast path, no?

No, not fast path. But it can happen a few thousand times. The slab implementation failed due to heavy internal fragmentation. If your code runs fine with a few thousand users, then there shouldn't be a problem.

That means no large pte entries for the per-cpu allocations, right?
I think that's a bad idea for non-NUMA systems. What about a fallback to simple get_free_pages() for non-NUMA systems?

Can we have large pte entries with PAGE_SIZEd pages?

For non-NUMA systems, I would use get_free_pages() to allocate a multi-page area instead of map_vm_area(). Typically, memory from get_free_pages() is backed by large pte entries, while memory mapped with map_vm_area() is normal virtual memory.
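
To make the get_free_pages() fallback discussed above a bit more concrete: the page allocator hands out blocks of 2^order contiguous pages, so a multi-page per-cpu area would have to be rounded up to a power-of-two page count. Below is a rough userspace sketch of that rounding, not kernel code. It assumes 4 KiB pages, and order_for_size() and wasted_bytes() are made-up names, the first standing in for the kernel's get_order().

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages; the real value is architecture-dependent. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Hypothetical userspace stand-in for the kernel's get_order():
 * returns the smallest order such that
 * (SKETCH_PAGE_SIZE << order) >= size.
 */
static unsigned int order_for_size(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/*
 * Bytes left unused when 'size' bytes are served from a single
 * block of 2^order pages.
 */
static size_t wasted_bytes(size_t size)
{
	return (SKETCH_PAGE_SIZE << order_for_size(size)) - size;
}
```

Under these assumptions a 3-page area needs an order-2 block (4 pages), so one page is left unused. That rounding cost would have to be weighed against the benefit of staying inside the large-pte mapping.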