Re: [PATCH 1/5] x86: Refactor syscall_wrapper.h

From: Brian Gerst
Date: Fri Feb 21 2020 - 09:06:11 EST


On Fri, Feb 21, 2020 at 2:07 AM Dominik Brodowski
<linux@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> > + * Instead of the generic __SYSCALL_DEFINEx() definition, this macro takes
>
> If you move the description to the top (the reason for that is
> understandable), you can't talk about "this macro" any more --
> as "this macro" is "__SYSCALL_DEFINEx()" which is defined far further
> below.
>
> > + * struct pt_regs *regs as the only argument of the syscall stub named
> > + * __x64_sys_*(). It decodes just the registers it needs and passes them on to
> > + * the __se_sys_*() wrapper performing sign extension and then to the
> > + * __do_sys_*() function doing the actual job. These wrappers and functions
> > + * are inlined (at least in very most cases), meaning that the assembly looks
> > + * as follows (slightly re-ordered for better readability):
> > + *
> > + * <__x64_sys_recv>: <-- syscall with 4 parameters
> > + * callq <__fentry__>
> > + *
> > + * mov 0x70(%rdi),%rdi <-- decode regs->di
> > + * mov 0x68(%rdi),%rsi <-- decode regs->si
> > + * mov 0x60(%rdi),%rdx <-- decode regs->dx
> > + * mov 0x38(%rdi),%rcx <-- decode regs->r10
> > + *
> > + * xor %r9d,%r9d <-- clear %r9
> > + * xor %r8d,%r8d <-- clear %r8
> > + *
> > + * callq __sys_recvfrom <-- do the actual work in __sys_recvfrom()
> > + * which takes 6 arguments
> > + *
> > + * cltq <-- extend return value to 64-bit
> > + * retq <-- return
> > + *
> > + * This approach avoids leaking random user-provided register content down
> > + * the call chain.
>
> ... maybe add the rationale for x86-32 here (in patch 3/5)?
>
> > + * If IA32_EMULATION is enabled, this macro generates an additional wrapper
> > + * named __ia32_sys_*() which decodes the struct pt_regs *regs according
> > + * to the i386 calling convention (bx, cx, dx, si, di, bp).
>
> ... and this likely needs an update in patch 3/5 as well

I will update that comment section. The code for 32-bit will follow
the same pattern, just with different register names.

> > -#define __IA32_COMPAT_SYS_STUBx(x, name, ...) \
> > - asmlinkage long __ia32_compat_sys##name(const struct pt_regs *regs);\
> > - ALLOW_ERROR_INJECTION(__ia32_compat_sys##name, ERRNO); \
> > - asmlinkage long __ia32_compat_sys##name(const struct pt_regs *regs)\
> > +#define __SYS_STUB0(abi, name) \
> > + asmlinkage long __##abi##_##name(const struct pt_regs *regs); \
> > + ALLOW_ERROR_INJECTION(__##abi##_##name, ERRNO); \
> > + asmlinkage long __##abi##_##name(const struct pt_regs *regs) \
> > { \
> > - return __se_compat_sys##name(SC_IA32_REGS_TO_ARGS(x,__VA_ARGS__));\
> > + return __do_##name(); \
> > }
>
> Instead of __se_compat_sys##name, now __do_comapt_sys##name is called. This
> should be equivalent for zero-argument syscalls, but maybe make that change
> explicit.

The __se_* layer (sign-extension) is unnecessary when there are no
arguments to extend. But I can make that a separate patch if that
makes it clearer.

> In general terms, I am not sure that the new code is more readable (but I
> may be biased) as the conversion to the proper calling convention for the
> ABI has to be done by the callee and is not done in the caller. Also, there
> are now three levels of macros instead of two.

I added the __STUBx layer to avoid 4 copies of the same boilerplate
code. The middle layer is needed because you can't have #ifdefs
inside a macro.

>
>
> > +#ifdef CONFIG_X86_64
> > +#define __X64_SYS_STUBx(x, name, ...) \
> > + __SYS_STUBx(x64, name, SC_X86_64_REGS_TO_ARGS(x, __VA_ARGS__))
>
> s/name/sys_name/g please -- this isn't the name of the syscall any more, but
> appended by sys (applies to a number of macros)
>
> > - * named __ia32_sys_*()
> > + * As the generic SYSCALL_DEFINE0() macro does not decode any parameters for
> > + * obvious reasons, and passing struct pt_regs *regs to it in %rdi does not
> > + * hurt, we only need to re-define it here to keep the naming congruent to
> > + * SYSCALL_DEFINEx() -- which is essential for the COND_SYSCALL() and SYS_NI()
> > + * macros to work correctly.
> > */
> > +#define SYSCALL_DEFINE0(name) \
> > + SYSCALL_METADATA(_##name, 0); \
> > + static inline long __do_sys_##name(void); \
> > + __X64_SYS_STUB0(sys_##name) \
> > + __IA32_SYS_STUB0(sys_##name) \
> > + static inline long __do_sys_##name(void)
> >
> > -#define SYSCALL_DEFINE0(sname) \
> > - SYSCALL_METADATA(_##sname, 0); \
> > - asmlinkage long __x64_sys_##sname(const struct pt_regs *__unused);\
> > - ALLOW_ERROR_INJECTION(__x64_sys_##sname, ERRNO); \
> > - SYSCALL_ALIAS(__ia32_sys_##sname, __x64_sys_##sname); \
> > - asmlinkage long __x64_sys_##sname(const struct pt_regs *__unused)
>
> OK, this is a bit more complex change -- you don't use SYSCALL_ALIAS any
> more, but define two asmlinakge functions which then call __do_sys_##name().
> Maybe needs an explanation in the changelog...

IIRC this was because in the 32-bit native case the __x64_* variant
wasn't present to alias to. I'll see if I can come up with a
different solution.

> > +#define __IA32_COMPAT_SYS_STUBx(x, name, ...) \
> > + __SYS_STUBx(ia32, name, SC_IA32_REGS_TO_ARGS(x, __VA_ARGS__))
>
> s/name/compat_sys_name/g/ -- please make it a bit more explicit what "name"
> we are talking about here,
>
> > +#define __IA32_COMPAT_SYS_STUB0(name) \
> > + __SYS_STUB0(ia32, name)
>
> and here,
>
> > +#define __IA32_COMPAT_COND_SYSCALL(name) \
> > + __COND_SYSCALL(ia32, name)
>
> and here,
>
> > +#define __IA32_COMPAT_SYS_NI(name) \
> > + SYSCALL_ALIAS(__ia32_compat_sys_##name, sys_ni_posix_timers)
>
> ... but not here, as here "name" is actually "name".
>
> Thanks,
> Dominik

--
Brian Gerst