Re: [RFC 00/15] x86_64: Optimize percpu accesses

From: Eric W. Biederman
Date: Thu Jul 10 2008 - 14:52:23 EST


Mike Travis <travis@xxxxxxx> writes:

> Eric W. Biederman wrote:
> ...
>> Another alternative that fares almost better than a segment with
>> a base of zero is a base of -32K or so. The only trouble is that it
>> would get us manually managing the per cpu area size again.
>
> One thing to remember is the eventual goal is implementing the cpu_alloc
> functions which I think we've agreed has to be "growable". This means that
> the addresses will need to be virtual to allow the same offsets for all cpus.
> The patchset I have uses 2MB pages. This "little" twist might figure into the
> implementation issues that are being discussed.

I had not heard that.

However, if you are going to use 2MB pages you might as well just use a
physical address at the start of a node. 2MB is so much larger than
the size of the per cpu memory we need today it isn't even funny.

To get 32K I had to round up on my current system, and honestly it is
important that per cpu data stay relatively small as otherwise the system
won't have memory to use for anything interesting.

I just took a quick look at our alloc_percpu calls. At first glance
they all appear to be for relatively small data structures. So we can
just about get away with doing what we do today for modules for everything.
The question is what to do when we fill up our preallocated size for percpu
data.

I think we can get away with just simply realloc'ing the percpu area
on each cpu. No fancy table manipulations required. Just update
the base pointer in %gs and in someplace global.
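
As a rough illustration of the realloc idea, here is a minimal user-space
sketch. It is not kernel code: NR_CPUS_SKETCH, percpu_base, percpu_grow and
percpu_ptr are hypothetical names made up for this example, calloc/free
stand in for whatever allocator the kernel would use, and the global table
stands in for both %gs and the "someplace global" copy of the base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count for this sketch */

/* Stand-in for the per-CPU base pointers (assumption: the kernel would
 * keep one copy in %gs and one in a global table; only the table is
 * modeled here). */
static char *percpu_base[NR_CPUS_SKETCH];
static size_t percpu_size;

/* Grow (or initially create) every CPU's area to new_size bytes.
 * Returns 0 on success, -1 if any allocation fails. */
static int percpu_grow(size_t new_size)
{
	char *fresh[NR_CPUS_SKETCH];
	int cpu;

	assert(new_size > 0 && new_size >= percpu_size);

	/* Allocate all new areas first, so a failure leaves the old
	 * areas untouched. */
	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		fresh[cpu] = calloc(1, new_size);
		if (!fresh[cpu]) {
			while (cpu-- > 0)
				free(fresh[cpu]);
			return -1;
		}
	}

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		if (percpu_base[cpu])
			memcpy(fresh[cpu], percpu_base[cpu], percpu_size);
		free(percpu_base[cpu]);
		percpu_base[cpu] = fresh[cpu];	/* update the base pointer */
	}
	percpu_size = new_size;
	return 0;
}

/* Resolve an offset against one CPU's current base. */
static void *percpu_ptr(int cpu, size_t offset)
{
	return percpu_base[cpu] + offset;
}
```

In this sketch an offset such as 128 keeps naming the same variable on every
cpu after percpu_grow(), because only the base moves. Any raw pointer taken
from percpu_ptr() before the grow is stale afterwards.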

If you do use virtual addresses, that really requires using 4K pages, so
we can benefit from non-contiguous allocations. I just can't imagine
the per cpu area getting up to 2MB in size, let alone needing
multiple 2MB pages. That is a huge jump from the 32KB I see today.
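
To put numbers on that gap, here is a small sketch of the page arithmetic.
The helper names and the SZ_*_SKETCH constants are made up for this example;
the 32KB figure is the one quoted above.

```c
#include <stddef.h>

#define SZ_4K_SKETCH	(4UL * 1024)
#define SZ_2M_SKETCH	(2UL * 1024 * 1024)

/* Pages of page_size needed to hold area_size bytes, rounding up. */
static unsigned long pages_needed(unsigned long area_size,
				  unsigned long page_size)
{
	return (area_size + page_size - 1) / page_size;
}

/* Bytes left unused at the end of the last page. */
static unsigned long slack_bytes(unsigned long area_size,
				 unsigned long page_size)
{
	return pages_needed(area_size, page_size) * page_size - area_size;
}
```

With a 32KB area, pages_needed() gives eight 4K pages with no slack, or a
single 2MB page with 2MB - 32KB left unused per cpu. The area would have to
grow 64 times over before it filled even one 2MB page.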


For the rest, mostly I have been making a list of things that we can do
that could work. A zero based percpu area is great if you can
eliminate it from suspicion of your weird random failures.

Eric