Re: Intel P6 vs P7 system call performance

From: Jamie Lokier (lk@tantalophile.demon.co.uk)
Date: Sat Dec 21 2002 - 12:18:08 EST


Linus Torvalds wrote:
> Yes, you can make the "clobbers %eax/%edx/%ecx" argument, but the fact is,
> we quite fundamentally need to save %edx/%ecx _anyway_.

On the kernel side, yes. In the userspace trampoline, it's not required.

> but once we start using sysexit for the signal handler return path too, we
> will need to restore %edx and %ecx too, otherwise our restarted system
> call will have crap in the registers. I already wrote the code, but
> decided that as long as we don't do that kind of restarting, we shouldn't
> have the overhead in the trampoline. But basically the trampoline then
> will become
>
> system_call_trampoline:
> pushfl
> pushl %ecx
> pushl %edx
> pushl %ebp
> movl %esp,%ebp
> syscall
> 0:
> movl %esp,%ebp
> movl 4(%ebp),%edx
> movl 8(%ebp),%ecx
> syscall
>
> restart:
> jmp 0b
> sysenter_return_point:
> popl %ebp
> popl %edx
> popl %ecx
> popfl
> ret

> see? So you _have_ to really save the arguments anyway, because you cannot
> do a sysexit-based system call restart otherwise. And once you save them,
> you might as well restore them too.
>
> And since you have to restore them for system call restart anyway, you
> might as well just make it part of the calling convention.
>
> Yes, I'm thinking ahead. Sue me.

You're optimising the _rare_ case.

The correct [;)] trampoline looks like this:

         system_call_trampoline:
                 pushl %ebp
                movl %esp,%ebp
                 sysenter
         sysenter_return_point:
                popl %ebp
                 ret
         sysenter_restart:
                popl %edx
                popl %ecx
                movl %esp,%ebp
                 sysenter

This is accompanied by changing this line in arch/i386/kernel/signal.c:

        regs->eip -= 2;

To this (best moved to an inline function):

        if (likely (regs->eip == sysenter_return_point)) {
                unsigned long * esp = (unsigned long *) regs->esp - 8;
                if (!access_ok(VERIFY_WRITE, esp, 8)
                    || __put_user(regs->edx, esp)
                    || __put_user(regs->ecx, esp+4)) {
                        if (sig == SIGSEGV)
                                ka->sa.sa_handler = SIG_DFL;
                        force_sig(SIGSEGV, current);
                }
                regs->esp = (long) esp;
                regs->eip = sysenter_restart;
        } else {
                regs->eip -= 2;
        }

Thus the common case, system calls, are optimised. The uncommon case,
signal interrupts system call, is penalised, though it's not a large
penalty. (Much less than the gain from using sysexit!) Which is more
common?

By the way, this works with the AMD syscall instruction too.
Then the trampoline is:

        system_call_trampoline:
                pushl %ebp
                movl %ecx,%ebp
                syscall
        syscall_return_point:
                popl %ebp
                ret
        syscall_restart:
                popl %edx
                popl %ebp
                syscall

Finally, there is no need to restore %ebp in the kernel sysexit/sysret
paths, because it will always be restored in the trampoline. So you
save 1 memory access there too.

(ps. I have a question: why does your trampoline save & restore
the flags?)

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Mon Dec 23 2002 - 22:00:29 EST