unistack: intel kernel stack hack against 2.1.22

Ingo Molnar (mingo@pc5829.hil.siemens.at)
Fri, 24 Jan 1997 22:06:22 +0100 (MET)


This patch unifies the kernel stack page and 'struct task_struct'.
Effectively this moves all information about a thread onto one page. The
memory layout looks like this (addresses ascending):

kernel_stack_page
...
[kernel stack area]
...
kernel_stack_page + 4096 - sizeof(struct task_struct)
...
[struct task_struct]
...
kernel_stack_page + 4096
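The address arithmetic behind this layout can be sketched in C. This is a minimal user-space illustration, assuming 4 KB pages and a page-aligned stack page; the cut-down struct task_struct and the helpers task_of_stack_page(), stack_top(), usable_stack_bytes() and check_layout() are hypothetical, not part of the patch. The task_struct fills the last sizeof(struct task_struct) bytes of the page, and the kernel stack grows down from just below it, which is the value the process.c hunk stores in tss.esp0.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL	/* assumption: 4 KB pages, as on i386 */

/* Hypothetical stand-in; the real struct task_struct is much larger. */
struct task_struct {
	long state;
	unsigned long min_flt, maj_flt, nswap;
	char comm[16];
};

/* The task_struct sits at the very top of the stack page. */
static struct task_struct *task_of_stack_page(uintptr_t stack_page)
{
	return (struct task_struct *)(stack_page + PAGE_SIZE -
				      sizeof(struct task_struct));
}

/* Initial kernel stack pointer (what tss.esp0 gets): the stack grows
 * down from just below the task_struct, i.e. from its start address. */
static uintptr_t stack_top(uintptr_t stack_page)
{
	return stack_page + PAGE_SIZE - sizeof(struct task_struct);
}

/* Bytes left for the kernel stack once the task_struct is carved out. */
static unsigned long usable_stack_bytes(void)
{
	return PAGE_SIZE - sizeof(struct task_struct);
}

/* Check the layout for an arbitrary page-aligned address (no memory
 * is touched; only addresses are computed). */
static void check_layout(uintptr_t stack_page)
{
	assert(stack_top(stack_page) == (uintptr_t)task_of_stack_page(stack_page));
	assert(stack_top(stack_page) - stack_page == usable_stack_bytes());
}
```

The stack loses exactly sizeof(struct task_struct) bytes compared with a page used only for the stack.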

This saves us a kmalloc() in fork() and a kfree() in exit(), and moves
critical data closer together (which cuts down a bit on TLB misses on
platforms where this is an issue).

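The dark side: it's ugly, and the kernel stack gets smaller by
sizeof(struct task_struct). There is also a 'chicken and egg' problem in
exit(): the task_struct now lives on the stack page, so the page must not
be freed before the child's fault statistics have been added to the
parent. This is now worked around by moving free_kernel_stack() after the
statistics update. Not really critical, but I guess we'd like the fault
statistics to be really accurate.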
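The ordering constraint from the exit.c hunk can be sketched in user space. This is an illustration only, assuming C11 aligned_alloc()/free() as stand-ins for alloc_kernel_stack()/free_kernel_stack(); alloc_task(), release_task() and the cut-down struct are hypothetical. Because the child's task_struct lives inside the page being freed, every field still needed has to be read before the free.

```c
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE 4096UL	/* assumption: 4 KB pages */

/* Hypothetical cut-down task_struct with the fields exit.c touches. */
struct task_struct {
	unsigned long min_flt, cmin_flt;
	unsigned long maj_flt, cmaj_flt;
	unsigned long nswap, cnswap;
	uintptr_t kernel_stack_page;
};

/* One allocation holds both the stack and the task_struct at its top. */
static struct task_struct *alloc_task(void)
{
	void *page = aligned_alloc(PAGE_SIZE, PAGE_SIZE);
	struct task_struct *p;

	if (!page)
		return NULL;
	p = (struct task_struct *)((char *)page + PAGE_SIZE - sizeof(*p));
	*p = (struct task_struct){0};
	p->kernel_stack_page = (uintptr_t)page;
	return p;
}

static void release_task(struct task_struct *parent, struct task_struct *p)
{
	/* Read the child's statistics while its page is still allocated... */
	parent->cmin_flt += p->min_flt + p->cmin_flt;
	parent->cmaj_flt += p->maj_flt + p->cmaj_flt;
	parent->cnswap += p->nswap + p->cnswap;
	/* ...and only then free the page, which also frees *p itself. */
	free((void *)p->kernel_stack_page);
	/* p is dangling from here on and must not be used. */
}
```

Swapping the free() above the three statistics lines would read freed memory, which is the problem the reordering in exit.c avoids.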

But the most important plus, IMHO: if this method is cool enough, it
opens up a new set of possibilities to further optimize entry.S.
'current' can now be calculated from the kernel ESP value, i.e. without
any memory load. And Intel SMP can now avoid a few APIC reads to get the
CPU id.
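Deriving 'current' from ESP reduces to a mask and an offset. A minimal sketch, assuming the stack page is PAGE_SIZE-aligned (as a page from the page allocator is) and that ESP always points inside the task's own stack page; current_from_esp() and the cut-down struct are hypothetical names. A real i386 version would take the value from %esp (e.g. via inline assembly), which this portable sketch leaves out.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL	/* assumption: 4 KB pages, as on i386 */

/* Hypothetical stand-in for the real struct task_struct. */
struct task_struct {
	long state;
	unsigned long min_flt, maj_flt;
};

/* Clear the low bits of ESP to get the page base, then step up to the
 * task_struct at the top of the page. Pure arithmetic, no memory load. */
static inline struct task_struct *current_from_esp(uintptr_t esp)
{
	uintptr_t page = esp & ~(uintptr_t)(PAGE_SIZE - 1);

	return (struct task_struct *)(page + PAGE_SIZE -
				      sizeof(struct task_struct));
}
```

Since the page base is recovered with a single AND, any ESP value anywhere in the stack area yields the same task_struct pointer.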

[ Is there anything I've missed? I'd like to know the bad news
before putting work into entry.S :))) ]

I'm posting this from a kernel running the patch, but be careful anyway.
The patch is against vanilla 2.1.22.

-- mingo

--- linux-2.1.22_orig/kernel/fork.c Wed Jan 1 15:20:45 1997
+++ linux/kernel/fork.c Fri Jan 24 21:28:49 1997
@@ -221,12 +221,12 @@
unsigned long new_stack;
struct task_struct *p;

- p = (struct task_struct *) kmalloc(sizeof(*p), GFP_KERNEL);
- if (!p)
- goto bad_fork;
new_stack = alloc_kernel_stack();
if (!new_stack)
- goto bad_fork_free_p;
+ goto bad_fork;
+ p = (struct task_struct *) (((char *)new_stack) + PAGE_SIZE - sizeof(*p));
+ if (!p)
+ goto bad_fork;
error = -EAGAIN;
nr = find_empty_process();
if (nr < 0)
@@ -309,8 +309,6 @@
nr_tasks--;
bad_fork_free_stack:
free_kernel_stack(new_stack);
-bad_fork_free_p:
- kfree(p);
bad_fork:
return error;
}
--- linux-2.1.22_orig/kernel/exit.c Mon Dec 30 12:03:13 1996
+++ linux/kernel/exit.c Fri Jan 24 21:25:40 1997
@@ -127,11 +127,10 @@
release_thread(p);
if (STACK_MAGIC != *(unsigned long *)p->kernel_stack_page)
printk(KERN_ALERT "release: %s kernel stack corruption. Aiee\n", p->comm);
- free_kernel_stack(p->kernel_stack_page);
current->cmin_flt += p->min_flt + p->cmin_flt;
current->cmaj_flt += p->maj_flt + p->cmaj_flt;
current->cnswap += p->nswap + p->cnswap;
- kfree(p);
+ free_kernel_stack(p->kernel_stack_page);
return;
}
panic("trying to release non-existent task");
--- linux-2.1.22_orig/arch/i386/kernel/process.c Fri Jan 3 10:54:16 1997
+++ linux/arch/i386/kernel/process.c Fri Jan 24 21:24:17 1997
@@ -469,9 +469,9 @@
p->tss.fs = USER_DS;
p->tss.gs = KERNEL_DS;
p->tss.ss0 = KERNEL_DS;
- p->tss.esp0 = p->kernel_stack_page + PAGE_SIZE;
+ p->tss.esp0 = p->kernel_stack_page + PAGE_SIZE - sizeof(struct task_struct);
p->tss.tr = _TSS(nr);
- childregs = ((struct pt_regs *) (p->kernel_stack_page + PAGE_SIZE)) - 1;
+ childregs = ((struct pt_regs *) (p->kernel_stack_page + PAGE_SIZE - sizeof(struct task_struct))) - 1;
p->tss.esp = (unsigned long) childregs;
p->tss.eip = (unsigned long) ret_from_sys_call;
p->tss.ebx = (unsigned long) p;