This is interesting, since even by doing a non-elegant
current->... --> struct task_struct *tsk = current;
replacement for excessive uses of current, I was able to gain almost 10kB
within a single file already!
I guess it's due to having tried that on an older installation with gcc 3.2,
which probably does less efficient opcode merging of current_thread_info()
requests compared to a current gcc version.
IOW, .text segment reduction could be quite a bit higher for older gcc:s.
This uses the x86 segmentation stuff in a way similar to NPTL's way of
implementing Thread-Local Storage. It relies on the fact that each CPU
has its own Global Descriptor Table (GDT), which is basically an array
of base-length pairs (with some extra stuff). When a segment register
is loaded with a descriptor (approximately, an index in the GDT), and
you use that segment register for memory access, the address has the
base added to it, and the resulting address is used.
Not a problem for more daring user-space apps (i.e. Wine), I hope?