Re: [PATCH 1/3] x86: Create and use a TOP_OF_KERNEL_STACK_PADDING macro

From: Ingo Molnar
Date: Mon Mar 16 2015 - 04:56:44 EST



* Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:

> x86_32, unlike x86_64, pads the top of the kernel stack. Document
> this padding and give it a name.
>
> This should make no change whatsoever to the compiled kernel image.
> It also doesn't fix any of the current bugs in this area.
>
> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
> ---
> arch/x86/include/asm/processor.h | 3 ++-
> arch/x86/include/asm/thread_info.h | 30 ++++++++++++++++++++++++++++++
> 2 files changed, 32 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 48a61c1c626e..88d9aa745898 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -849,7 +849,8 @@ extern unsigned long thread_saved_pc(struct task_struct *tsk);
> #define task_pt_regs(task) \
> ({ \
> struct pt_regs *__regs__; \
> - __regs__ = (struct pt_regs *)(KSTK_TOP(task_stack_page(task))-8); \
> + __regs__ = (struct pt_regs *)(KSTK_TOP(task_stack_page(task)) - \
> + TOP_OF_KERNEL_STACK_PADDING); \
> __regs__ - 1; \
> })
>
> diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
> index 7740edd56fed..74fd74ca50d3 100644
> --- a/arch/x86/include/asm/thread_info.h
> +++ b/arch/x86/include/asm/thread_info.h
> @@ -49,6 +49,36 @@ struct thread_info {
> #define init_thread_info (init_thread_union.thread_info)
> #define init_stack (init_thread_union.stack)
>
> +#ifdef CONFIG_X86_32
> +
> +/*
> + * TOP_OF_KERNEL_STACK_PADDING is a number of unused bytes that we
> + * reserve at the top of the kernel stack. We do it because of a nasty
> + * 32-bit corner case. On x86_32, the hardware stack frame is
> + * variable-length. Except for vm86 mode, struct pt_regs assumes a
> + * maximum-length frame. If we enter from CPL 0, the top 8 bytes of
> + * pt_regs don't actually exist. Ordinarily this doesn't matter, but it
> + * does in at least one case:
> + *
> + * If we take an NMI early enough in sysenter, the we can end up with

s/the/then

I fixed this up in the commit.

> + * pt_regs that extends above sp0. On the way out, in the espfix code,
> + * we can read the saved SS value, but that value will be above sp0.
> + * Without this offset, that can result in a page fault. (We are
> + * careful that, in this case, the value we read doesn't matter.)
> + *
> + * In vm86 mode, the hardware frame is much longer still, but we neither
> + * access the extra members from NMI context, nor do we write such a
> + * frame at sp0 at all.
> + */
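
For anyone following along, the arithmetic in that comment can be sketched
in plain C. This is only an illustration under stated assumptions, not the
kernel code: THREAD_SIZE of 8192, the struct pt_regs32 layout, and the
helper names are all made up here so the snippet builds in user space.
It also assumes the short (CPL 0) hardware frame is pushed starting at
the top of the stack minus the padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define THREAD_SIZE                 8192u
#define TOP_OF_KERNEL_STACK_PADDING 8u

/*
 * Simplified stand-in for the 32-bit struct pt_regs (hypothetical name).
 * The last two fields are the ones the hardware pushes only when the
 * entry involves a privilege change.
 */
struct pt_regs32 {
	uint32_t bx, cx, dx, si, di, bp, ax;
	uint32_t ds, es, fs, gs;
	uint32_t orig_ax;
	uint32_t ip, cs, flags;	/* always part of the hardware frame */
	uint32_t sp, ss;	/* absent when entering from CPL 0 */
};

/* One past the last byte of the stack allocation. */
static uintptr_t stack_top(uintptr_t stack_base)
{
	return stack_base + THREAD_SIZE;
}

/* Mirrors the shape of task_pt_regs(): top, minus padding, minus one pt_regs. */
static uintptr_t task_pt_regs_addr(uintptr_t stack_base)
{
	return stack_top(stack_base) - TOP_OF_KERNEL_STACK_PADDING
		- sizeof(struct pt_regs32);
}

/*
 * If a short frame (no sp/ss) is pushed starting at top - padding, the
 * pt_regs overlaying it ends this far up: 8 bytes above the frame start.
 */
static uintptr_t short_frame_regs_end(uintptr_t stack_base)
{
	uintptr_t frame_top = stack_top(stack_base) - TOP_OF_KERNEL_STACK_PADDING;
	uintptr_t regs = frame_top - offsetof(struct pt_regs32, sp);

	return regs + sizeof(struct pt_regs32);
}
```

With these assumed numbers, the sp and ss slots of a short frame land
exactly in the 8 padding bytes, so reading the saved SS stays inside the
stack allocation instead of touching the page above it.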

Thanks,

Ingo