Re: [GIT pull] x86 vdso updates

From: Andrew Lutomirski
Date: Fri May 27 2011 - 10:55:23 EST


On Fri, May 27, 2011 at 7:36 AM, Andrew Lutomirski <luto@xxxxxxx> wrote:
> 3. Add int 0xcc and use it from vgettimeofday.  It will SIGSEGV if
> called from a user address (so it has no risk of ever becoming ABI)
> and it will do gettimeofday if called from the right address.  (I like
> 0xcc better than 0x81 because then I don't have to wonder whether any
> syscall-like instructions start with 0x81.)  I'm not convinced that
> the existing syscall entries are usable, because syscall itself has a
> different calling convention and int 0x80 is a compat syscall.
>

I started looking at what needs to be done, and I wanted to get your
opinion before I wrote a bunch of code that you'd reject. Here are
two ideas for how the int 0xcc / int 0x81 entry could work:

*** Idea 1 ***

Make it a real syscall, but with extra constraints. It would have the
same calling convention as the syscall instruction, but it would turn
into SIGKILL if the calling address isn't in the vsyscall page or if
the syscall number isn't __NR_gettimeofday. It would BUG() if
called from kernel mode. There are two ways to implement this:

1. Have the interrupt entry check constraints, twiddle its stack frame
to look like a syscall instruction, and jump to the syscall entry.
This way there's little code duplication. (Is it safe to sysret back
to userspace from an interrupt gate? I don't see why not, but it
seems to violate the spirit of the thing.)

2. Duplicate the syscall entry. Ugly.

(int 0x80 is ia32_syscall, which is unworkable because it's not there
on !COMPAT and because it calls the compat wrapper, which would make
the whole thing a mess.)
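
A rough user-space model of option 1's constraint check and frame
twiddling, only to make the idea concrete. Everything here is an
assumption for illustration: struct frame is a cut-down stand-in for
the real saved register frame, the names check_vsyscall_entry() and
fake_syscall_frame() are hypothetical, and the constants are the x86_64
values for the vsyscall page (0xffffffffff600000) and the gettimeofday
syscall number (96), assuming that is the only number allowed. The
check tests the address of the 2-byte int instruction (saved ip minus
2), and the twiddle puts the return address in RCX and RFLAGS in R11,
which is where the SYSCALL instruction leaves them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants for this sketch (x86_64 values). */
#define VSYSCALL_PAGE_START 0xffffffffff600000ULL
#define VSYSCALL_PAGE_SIZE  4096ULL
#define NR_GETTIMEOFDAY     96ULL
#define INT_INSN_LEN        2ULL   /* "int imm8" encodes as CD ib */

/* Hypothetical, cut-down model of the saved register frame. */
struct frame {
	uint64_t ip;     /* saved RIP: address after the int instruction */
	uint64_t flags;  /* saved RFLAGS */
	uint64_t ax;     /* syscall number */
	uint64_t cx;     /* SYSCALL leaves the return RIP here */
	uint64_t r11;    /* SYSCALL leaves RFLAGS here */
	bool user_mode;  /* true if the trap came from CPL 3 */
};

enum entry_action { ENTRY_DISPATCH, ENTRY_SIGKILL, ENTRY_BUG };

/* Apply the Idea 1 constraints before treating the trap as a syscall. */
static enum entry_action check_vsyscall_entry(const struct frame *f)
{
	uint64_t insn = f->ip - INT_INSN_LEN;

	if (!f->user_mode)
		return ENTRY_BUG;        /* never legitimate from kernel mode */
	if (insn - VSYSCALL_PAGE_START >= VSYSCALL_PAGE_SIZE)
		return ENTRY_SIGKILL;    /* not issued from the vsyscall page */
	if (f->ax != NR_GETTIMEOFDAY)
		return ENTRY_SIGKILL;    /* only the one fallback is allowed */
	return ENTRY_DISPATCH;
}

/* Make the frame look as if SYSCALL had been executed. */
static void fake_syscall_frame(struct frame *f)
{
	f->cx = f->ip;
	f->r11 = f->flags;
}
```

With this shape, everything after fake_syscall_frame() could reuse the
existing syscall path, which is the point of option 1.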

Pros:
- ptrace, audit, seccomp, etc. still work. (Although what happens if
ptrace changes the syscall number?)

Cons:
- If we ever want to emulate the whole vsyscall instead of just the
fallback (i.e. stick the int 0xcc instruction at the vsyscall entry),
then it's back to the drawing board.

*** Idea 2 ***

Write the whole thing in C.

Pros: Easy to write and easy to maintain.

Cons:
- We'd have to actually think about ptrace, audit, and seccomp semantics.
- A touch slow. Probably doesn't matter.
- If we let ptrace see the entry and think it's a syscall, then
ptrace might think it can emulate the syscall and things will break
unless we're very careful.

I'm inclined to go with idea 2 with these elaborations:
- If seccomp is enabled, SIGKILL. Might as well match vDSO behavior.
- Don't audit or call ptrace. These things aren't real syscalls and
that would just be confusing. In any case, audit will never see the
non-fallback paths for the vDSO.
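
A similarly hedged model of the decision logic for idea 2 with the two
elaborations above: SIGSEGV if the trap did not come from the vsyscall
page (as in the quoted proposal), SIGKILL if seccomp is enabled, and
otherwise just perform gettimeofday with no audit or ptrace hooks.
struct task_model, vsyscall_emulation_action() and the constants are
hypothetical; a real handler would also have to do the gettimeofday
work itself and set the return value, which is left out here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants for this sketch (x86_64 values). */
#define VSYSCALL_PAGE_START 0xffffffffff600000ULL
#define VSYSCALL_PAGE_SIZE  4096ULL
#define INT_INSN_LEN        2ULL   /* "int imm8" encodes as CD ib */

/* Hypothetical stand-in for the per-task state the handler consults. */
struct task_model {
	bool seccomp_enabled;
};

enum emu_action { EMU_CALL_GETTIMEOFDAY, EMU_SIGKILL, EMU_SIGSEGV };

/* Decide what the C handler does for a trap whose saved RIP is ip. */
static enum emu_action vsyscall_emulation_action(const struct task_model *t,
						 uint64_t ip)
{
	uint64_t insn = ip - INT_INSN_LEN;

	/* Issued from an ordinary user address: never becomes ABI. */
	if (insn - VSYSCALL_PAGE_START >= VSYSCALL_PAGE_SIZE)
		return EMU_SIGSEGV;
	/* Match vDSO behavior under seccomp. */
	if (t->seccomp_enabled)
		return EMU_SIGKILL;
	/* Deliberately no audit or ptrace: this is not a real syscall. */
	return EMU_CALL_GETTIMEOFDAY;
}
```

Keeping the policy in one small C function like this is what makes
idea 2 easy to read and maintain, at the cost of deciding the ptrace,
audit and seccomp semantics explicitly.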

--Andy