Re: [PATCH 5/7] Use %gs for per-cpu sections in kernel

From: Jeremy Fitzhardinge
Date: Sun Sep 24 2006 - 21:36:24 EST

Rusty Russell wrote:
> You're thinking of it in a convoluted way, by converting to offsets
> from the per-cpu section, then converting it back. How about this
> explanation: the local cpu's versions are offset from where the compiler
> thinks they are by __per_cpu_offset[cpu]. We set the segment base to
> __per_cpu_offset[cpu], so "%gs:per_cpu__foo" gets us straight to the
> local cpu version. __per_cpu_offset[cpu] is always positive (kernel
> image sits at bottom of kernel address space).

We're talking kernel virtual addresses, so the physical load address doesn't matter, of course.

So, take this kernel I have here as an explicit example:

$ nm -n vmlinux
c0431100 A __per_cpu_start
c0433800 D per_cpu__cpu_gdt_descr
c0433880 D per_cpu__cpu_tlbstate

And say that this CPU has its percpu data allocated at 0xc100000.

So, in this case the %gs base will be loaded with 0xc100000-0xc0431100 = 0x4bccef00 (modulo 2^32).
The offset of per_cpu__cpu_gdt_descr is 0xc0433800, so %gs:per_cpu__cpu_gdt_descr will compute 0x4bccef00+0xc0433800 to get the final linear address, which wraps around 4G to 0xc102700. Since 0xc0433800 is negative when read as a signed 32-bit value, this is actually a subtraction, and it therefore requires the segment to have a 4G limit, which makes Xen sad.
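
To make the arithmetic above easy to check, here is a minimal sketch in Python. It is only a model of 32-bit address arithmetic, not kernel code; the helper names gs_base and linear_address are made up for illustration, and the addresses are the ones from the nm output and the example allocation.

```python
MASK32 = 0xFFFFFFFF  # linear addresses are 32 bits wide on i386

def gs_base(percpu_area, per_cpu_start):
    """Hypothetical helper: segment base = this CPU's area minus __per_cpu_start."""
    return (percpu_area - per_cpu_start) & MASK32

def linear_address(base, offset):
    """Hypothetical helper: base plus offset, truncated to 32 bits."""
    return (base + offset) & MASK32

per_cpu_start = 0xC0431100   # __per_cpu_start, from nm
cpu_gdt_descr = 0xC0433800   # per_cpu__cpu_gdt_descr, from nm
percpu_area = 0x0C100000     # example per-cpu allocation for this CPU

base = gs_base(percpu_area, per_cpu_start)
addr = linear_address(base, cpu_gdt_descr)

print(hex(base))                            # segment base loaded into %gs
print(hex(addr))                            # final linear address
print(base + cpu_gdt_descr > MASK32)        # True when the sum wraps past 4G
```

The offset used with %gs here is the full link-time address 0xc0433800, which is why the segment limit has to cover essentially the whole 4G range.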

>> Especially since "__per_cpu_start" is actually very large, and so this scheme pretty much relies on being able to wrap around the segment limit, and will be very bad for Xen.
>
> __per_cpu_start is large, yes. But there's no reason to use it in
> address calculation. The second half of your statement is not correct.

__per_cpu_start is added to all per_cpu__* addresses.

>> An alternative is to put the "-__per_cpu_start" into the addressing mode when constructing the address of the per-cpu variable.
>
> I think you're thinking of TLS relocations? I don't use them...

No, but this is just as bad.
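
For comparison, the alternative quoted above (folding "-__per_cpu_start" into the addressing mode) can be sketched the same way. This is an illustrative model under the assumption that the %gs base is then simply the start of this CPU's per-cpu area; section_offset and alt_linear_address are hypothetical names, not anything in the patch.

```python
MASK32 = 0xFFFFFFFF  # linear addresses are 32 bits wide on i386

per_cpu_start = 0xC0431100   # __per_cpu_start, from nm
cpu_gdt_descr = 0xC0433800   # per_cpu__cpu_gdt_descr, from nm
cpu_tlbstate = 0xC0433880    # per_cpu__cpu_tlbstate, from nm
percpu_area = 0x0C100000     # example per-cpu allocation, assumed to be the %gs base

def section_offset(symbol):
    """Hypothetical helper: symbol address relative to the per-cpu section start."""
    return symbol - per_cpu_start

def alt_linear_address(symbol):
    """Hypothetical helper: %gs base plus the small section-relative offset."""
    return (percpu_area + section_offset(symbol)) & MASK32

for sym in (cpu_gdt_descr, cpu_tlbstate):
    print(hex(section_offset(sym)), hex(alt_linear_address(sym)))
```

In this model the offsets stay small and positive, so nothing wraps around 4G and the segment limit would only need to cover the per-cpu area itself.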
