Re: [RFC 00/15] x86_64: Optimize percpu accesses

From: Jeremy Fitzhardinge
Date: Thu Jul 10 2008 - 13:48:32 EST


Christoph Lameter wrote:
Jeremy Fitzhardinge wrote:
The base address of the percpu area and the offsets from that base are
completely independent values.

Definitely.


The addressing modes:

* ABS
* off(%rip)

Are exactly equivalent in what offsets they can generate, so long as *at
link time* the percpu *symbols* are within 2G of the code addressing
them. *After* the addressing mode has generated an effective address
(by whatever means it likes), the %gs: override applies the segment
base, which can therefore offset the effective address to anywhere at all.
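
To make the equivalence of the two addressing modes concrete, here is a rough C model of how each one forms its effective address. The function names are made up for illustration and this is not kernel code; it assumes the narrowing conversion to int32_t wraps the way GCC and Clang do it.

```c
#include <stdint.h>

/* Hypothetical model of effective-address generation, not kernel code. */

/* ABS: the encoded displacement is the symbol value itself. */
static uint64_t ea_abs(uint64_t sym)
{
    return sym;
}

/*
 * off(%rip): the linker encodes (sym - next_rip) as a signed 32-bit
 * displacement, and the CPU adds next_rip back at run time.
 * Assumes the symbol is within 2G of the code at link time, and that
 * the conversion to int32_t wraps (true for GCC and Clang).
 */
static uint64_t ea_rip(uint64_t sym, uint64_t next_rip)
{
    int32_t disp = (int32_t)(uint32_t)(sym - next_rip);

    /* Sign-extend the displacement; the sum wraps modulo 2^64. */
    return next_rip + (uint64_t)(int64_t)disp;
}
```

Both functions return the same value for any symbol within range of the code, and the %gs base is only added after that value has been produced.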

Right. The problem is with the percpu area handled by the linker. That percpu area is used by the boot cpu, and later we set up additional per-cpu areas. Those can be placed in an arbitrary way if one goes through a table of pointers to these areas.

Yes, but the offset is the same either way. When you want a cpu to refer to its own percpu memory, regardless of where it is in memory, you just reload the gs base. The offsets are the same everywhere, and are computed by the linker without knowledge of or reference to where the final address will end up.

In other words, at source level:

a = x86_read_percpu(foo)

will generate

mov %gs:percpu__foo, %rax

where the linker decides the value of percpu__foo, which can be up to 4G. Or if we use rip-relative:

mov %gs:percpu__foo(%rip), %rax

we end up with the same result, except that the generated instruction is a bit more compact.

In the final generated assembly, it ends up being a hardcoded constant address. Say, 0x7838.

Now if we allocate cpu 43 percpu data at 0xfffffffff7198000, we load %gs base with that value, and then the instruction is still

mov %gs:0x7838, %rax

and the computed address will be 0xfffffffff7198000 + 0x7838 = 0xfffffffff719f838.

And cpu 62 has its percpu data at 0xffffffffe3819000, and the instruction is still

mov %gs:0x7838, %rax

and the computed address for its version of percpu__foo is 0xffffffffe3819000 + 0x7838 = 0xffffffffe3820838.

Note that it doesn't matter how you decide to place the percpu data, so long as you can load the address into the %gs base.
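
The arithmetic for the two cpus above can be checked with a small C sketch. PERCPU_FOO_OFFSET and percpu_addr() are names invented here for illustration; the base addresses and the offset are the ones used in the example.

```c
#include <stdint.h>

/* Link-time offset of percpu__foo from the example (illustrative name). */
#define PERCPU_FOO_OFFSET 0x7838ULL

/*
 * Address that "mov %gs:0x7838, %rax" touches for a given %gs base.
 * The instruction, and so the offset, is identical on every cpu;
 * only the base differs.
 */
static uint64_t percpu_addr(uint64_t gs_base, uint64_t offset)
{
    return gs_base + offset;
}
```

With the bases above, percpu_addr(0xfffffffff7198000, PERCPU_FOO_OFFSET) gives 0xfffffffff719f838 for cpu 43, and percpu_addr(0xffffffffe3819000, PERCPU_FOO_OFFSET) gives 0xffffffffe3820838 for cpu 62.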

However, that does not work if one calculates the virtual address instead of looking up a physical address.

Calculate a virtual address for what? Physical address for what? If you have a large virtual region allocating 256M of percpu space, er, per cpu, then you just load %gs base with percpu_region_base + cpuid * 256M. It has no effect on the instructions accessing that percpu space.
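
As a sketch of that calculation, assuming a contiguous virtual region with a fixed 256M stride per cpu (gs_base_for_cpu() and PERCPU_STRIDE are hypothetical names, and the region base is whatever the allocator picked):

```c
#include <stdint.h>

/* 256M of virtual space reserved per cpu, as in the message above. */
#define PERCPU_STRIDE (256ULL << 20)

/*
 * Value to load into the %gs base for a given cpu.
 * percpu_region_base is an assumed start address of the virtual region.
 */
static uint64_t gs_base_for_cpu(uint64_t percpu_region_base, unsigned int cpuid)
{
    return percpu_region_base + (uint64_t)cpuid * PERCPU_STRIDE;
}
```

Only this base changes per cpu; the %gs:offset instructions that access the percpu data are untouched.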

J