Re: [PATCHSET x86/core/percpu] improve the first percpu chunk allocation

From: Ingo Molnar
Date: Tue Feb 24 2009 - 07:41:18 EST

Next message: Mark Brown: "Re: [patch 2.6.29-rc3-git 1/2] regulator: twl4030 regulators"
Previous message: Carlos R. Mafra: "Re: iwlagn warning in 2.6.29-rc6"
In reply to: Tejun Heo: "Re: [PATCHSET x86/core/percpu] improve the first percpu chunk allocation"
Next in thread: Tejun Heo: "Re: [PATCHSET x86/core/percpu] improve the first percpu chunk allocation"

* Tejun Heo <tj@xxxxxxxxxx> wrote:

> Hello, Ingo.
>
> Ingo Molnar wrote:
> > Hm, i think there still must be some basic misunderstanding
> > somewhere here. Let me describe the design i described in the
> > previous mail in more detail.
> >
> > In one of your changelogs you state:
> >
> > | On NUMA, embedding allocator can't be used as different
> > | units can't be made to fall in the correct NUMA nodes.
> >
> > This is a direct consequence of the unit/chunk abstraction,
>
> Not at all. That's an optimization for !NUMA. The remap
> allocator is what can be done on NUMA. Chunking or not
> doesn't make any difference in this regard. The only
> difference between chunking and not chunking is whether
> separately allocated percpu offsets have more or less holes
> inbetween them, which is irrelevant for all purposes.

It's not an optimization, it's a pessimisation :)

Please read what i wrote to you. We want the percpu static and
dynamic areas to be _one and the same thing_. (With just the
difference that static allocations have a handy compile-time
offset shortcut - but the access is still the same.)

Right now, with your latest code we still have this:

/*
 * Use this to get to a cpu's version of the per-cpu object
 * dynamically allocated. Non-atomic access to the current CPU's
 * version should probably be combined with get_cpu()/put_cpu().
 */
#define per_cpu_ptr(ptr, cpu) SHIFT_PERCPU_PTR((ptr), per_cpu_offset((cpu)))

This slows down per_cpu_ptr() and makes the dynamic percpu case
a second-class citizen: most actual usages are for the current
CPU, yet they still have to go via the per_cpu_offset()
indirection.
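
To make the cost concrete, here is a rough userspace model of that
indirection. It is only a sketch, not the kernel implementation: every
model_* name, the CPU count and the unit size are hypothetical, and a
percpu "pointer" is reduced to a slot index inside a CPU's unit.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS    4   /* hypothetical CPU count */
#define MODEL_UNIT_SLOTS 16  /* hypothetical number of int slots per CPU unit */

/* Backing store: one unit of MODEL_UNIT_SLOTS ints per CPU. */
static int model_area[MODEL_NR_CPUS * MODEL_UNIT_SLOTS];

/* Stand-in for the table behind per_cpu_offset(): one entry per CPU. */
static size_t model_cpu_offset[MODEL_NR_CPUS];

/* Fill the offset table; call once before any access. */
static void model_init(void)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		model_cpu_offset[cpu] = (size_t)cpu * MODEL_UNIT_SLOTS;
}

/*
 * Model of per_cpu_ptr(ptr, cpu): every access, including one for the
 * current CPU, first loads that CPU's entry from the offset table and
 * only then adds the object's own offset ("slot").
 */
static int *model_per_cpu_ptr(size_t slot, int cpu)
{
	return &model_area[model_cpu_offset[cpu] + slot];
}
```

In this model, the table load in model_per_cpu_ptr() is the extra step
that a current-CPU access pays even though the CPU is already known.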

I.e. we have things like:

const int cpu = get_cpu();

u8 *scratch = *per_cpu_ptr(ipcomp_scratches, cpu);

Instead of a straight:

u8 *scratch = *this_cpu_ptr(ipcomp_scratches);

We cannot do that optimization due to the NUMA and SMP
asymmetry. If NUMA and SMP had the same linear structure, as i
suggested we do, we could do it.
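
A hedged userspace sketch of what a linear layout would permit: if
every CPU's unit has the same internal layout, a current-CPU accessor
only needs one base that is loaded once (standing in for a per-CPU
base register) plus the object's offset. All lin_* names and sizes are
hypothetical, and this models the idea only, not any kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define LIN_NR_CPUS    4   /* hypothetical CPU count */
#define LIN_UNIT_SLOTS 16  /* hypothetical number of int slots per CPU unit */

/* Backing store with the same layout inside every CPU's unit. */
static int lin_area[LIN_NR_CPUS * LIN_UNIT_SLOTS];

/* Stand-in for a per-CPU base kept in a register. */
static int *lin_this_cpu_base;

/* Model "running on" a CPU: load that CPU's unit base once. */
static void lin_run_on(int cpu)
{
	lin_this_cpu_base = &lin_area[(size_t)cpu * LIN_UNIT_SLOTS];
}

/* Explicit-CPU accessor: derives the unit base from the CPU number. */
static int *lin_per_cpu_ptr(size_t slot, int cpu)
{
	return &lin_area[(size_t)cpu * LIN_UNIT_SLOTS + slot];
}

/* Current-CPU accessor: base + slot, no CPU number, no table lookup. */
static int *lin_this_cpu_ptr(size_t slot)
{
	return lin_this_cpu_base + slot;
}
```

Both accessors resolve to the same address for the running CPU; the
difference is that lin_this_cpu_ptr() never has to look up the CPU.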

Currently you rely on per_cpu_offset() indirection basically as
a soft-TLB entry covering all dynamic allocations. That sucks.

Ok?

Ingo