Re: [PATCHSET x86/core/percpu] improve the first percpu chunk allocation

From: Nick Piggin
Date: Tue Feb 24 2009 - 10:31:09 EST


On Wednesday 25 February 2009 02:19:20 Ingo Molnar wrote:
> * Tejun Heo <tj@xxxxxxxxxx> wrote:
> > Hi,
> >
> > Ingo Molnar wrote:
> > --snip--
> >
> > > So what i'm saying is that these are strong reasons for us to
> > > want to make the unit size to be something like 2MB - on 64-bit
> > > x86 at least.
> > >
> > > ( Using a 2MB unit size will also have another advantage: _iff_
> > > we can still allocate a hugepage at that point we can map it
> > > straight there when extending the dynamic area. )
> >
> > Thanks for the explanation. Yeap, it would be nice to have
> > units aligned on a 2MB boundary. We'll need to add an @align
> > parameter to the vm area alloc function to do it correctly.
> > As for using large pages, it would be nice if we could do
> > that automatically. Upfront 2MB unit allocation is probably
> > too expensive, but merging 4k pages into a large page (if we
> > can get them) will add a lot of irregular latency too. Hmmm...
>
> Yeah, largepage support - if we ever get there (the chances of
> finding a properly 2MB aligned, 2MB sized chunk of physical
> memory are not very good except in the first few minutes of
> uptime) - should indeed be automatic for all get_vm_area()
> users: vmalloc(), ioremap() and now percpu.c.

The problem is that the allocator doesn't always know what its
callers want. It would be trivial to let them specify alignment
(there is already support for it in the virtual address
allocator). Kernel page table setup could then presumably use
larger mappings whenever it is handed suitably aligned,
contiguous memory.
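
For illustration, a minimal userspace sketch of what an @align
argument plus a large-mapping check would amount to. Everything
here (vm_hole, alloc_aligned(), can_use_large_mapping()) is made
up for the example; only the round-up-and-recheck logic and the
2MB alignment test are the point.

/* Minimal userspace sketch, not kernel code.  vm_hole and the
 * one-hole "allocator" are invented for the example. */
#include <stdio.h>
#include <stdint.h>

#define PMD_SIZE        (2UL << 20)             /* 2MB, x86-64 large page */
#define ALIGN_UP(x, a)  (((x) + (a) - 1) & ~((a) - 1))

struct vm_hole { uint64_t start, end; };        /* one free VA range */

/* What an @align argument to the vm area allocator would roughly
 * do: round the candidate start up, then recheck that the request
 * still fits in the hole. */
static uint64_t alloc_aligned(struct vm_hole *h, uint64_t size,
                              uint64_t align)
{
        uint64_t addr = ALIGN_UP(h->start, align);

        if (addr + size > h->end)
                return 0;                       /* too small once aligned */
        h->start = addr + size;                 /* consume the range */
        return addr;
}

/* Page table setup could then use a 2MB mapping whenever both the
 * virtual and the physical address are 2MB aligned and the region
 * is physically contiguous for at least 2MB. */
static int can_use_large_mapping(uint64_t virt, uint64_t phys,
                                 uint64_t size)
{
        return !(virt & (PMD_SIZE - 1)) &&
               !(phys & (PMD_SIZE - 1)) &&
               size >= PMD_SIZE;
}

int main(void)
{
        struct vm_hole hole = { 0xffffc90000123000ULL,
                                0xffffc90001000000ULL };
        uint64_t va = alloc_aligned(&hole, PMD_SIZE, PMD_SIZE);

        printf("va = %#llx, large mapping ok: %d\n",
               (unsigned long long)va,
               can_use_large_mapping(va, 0x40000000ULL, PMD_SIZE));
        return 0;
}

The interesting failure mode is the recheck: a hole that is big
enough for the raw size can become too small once its start is
rounded up to 2MB, which is why alignment has to be known inside
the allocator rather than bolted on afterwards.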


> I think a far more realistic angle to utilize more of the 2MB
> TLB will be to gradually increase PERCPU_ENOUGH_ROOM, as we
> observe more and more percpu_alloc() sites in the kernel. Right
> now it's pretty rare, so going beyond the 8K we do for modules
> would probably be a waste of RAM.

It might be useful for smaller NUMA machines using hashdist for
the big early hashes, though. Hash sizes scale logarithmically,
but the number of nodes tends to scale linearly with memory size,
so 2MB per node for hashes would probably be far too much on big
machines.
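
To put rough numbers on that (the constants below are invented;
only the shape matters): if total hash size grows like
log2(memory) while node count grows linearly with memory, the
per-node share of the hashes collapses as the machine grows.

/* Back-of-envelope illustration of the scaling argument above.
 * The constants (one node per 16GB, 256KB of hash per doubling
 * of memory) are made up; only the trend is the point. */
#include <stdio.h>
#include <math.h>

int main(void)
{
        double mem_gb;

        for (mem_gb = 16; mem_gb <= 4096; mem_gb *= 4) {
                double nodes   = mem_gb / 16;
                double hash_kb = 256 * log2(mem_gb);

                printf("%6.0fGB: %4.0f nodes, %6.0fKB hash, %6.1fKB/node\n",
                       mem_gb, nodes, hash_kb, hash_kb / nodes);
        }
        return 0;
}

With these made-up constants a 16GB single-node box wants about
1MB of hash locally, while a 4TB, 256-node machine wants only
about 12KB per node - a flat 2MB-per-node reservation would be
almost entirely wasted on the big machine.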
