Re: [PATCH v3 04/11] x86/kthread,dumpstack: Set task_pt_regs->cs.RPL=3 for kernel threads

From: Ira Weiny
Date: Thu Mar 04 2021 - 15:20:52 EST


On Thu, Mar 04, 2021 at 11:05:57AM -0800, Andy Lutomirski wrote:
> For kernel threads, task_pt_regs is currently all zeros, a valid user state
> (if kernel_execve() has been called), or some combination thereof during
> execution of kernel_execve(). If a stack trace is printed, the unwinder
> might get confused and treat task_pt_regs as a kernel state. Indeed,
> forcing a stack dump results in an attempt to display _kernel_ code bytes
> from a bogus address at the very top of kernel memory.
>
> Fix the confusion by setting cs=3 so that user_mode(task_pt_regs(...)) ==
> true for kernel threads.
>
> Also improve the error when failing to fetch code bytes to show which type
> of fetch failed. This makes it much easier to understand what's happening.
>
> Cc: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
> ---
> arch/x86/kernel/dumpstack.c | 4 ++--
> arch/x86/kernel/process.c | 13 +++++++++++++
> 2 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
> index 55cf3c8325c6..9b7b69bb12e5 100644
> --- a/arch/x86/kernel/dumpstack.c
> +++ b/arch/x86/kernel/dumpstack.c
> @@ -128,8 +128,8 @@ void show_opcodes(struct pt_regs *regs, const char *loglvl)
> /* No access to the user space stack of other tasks. Ignore. */
> break;
> default:
> - printk("%sCode: Unable to access opcode bytes at RIP 0x%lx.\n",
> - loglvl, prologue);
> + printk("%sCode: Unable to access %s opcode bytes at RIP 0x%lx.\n",
> + loglvl, user_mode(regs) ? "user" : "kernel", prologue);
> break;
> }
> }
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 145a7ac0c19a..f6f16df04cb9 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -163,6 +163,19 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
> /* Kernel thread ? */
> if (unlikely(p->flags & PF_KTHREAD)) {
> memset(childregs, 0, sizeof(struct pt_regs));
> +
> + /*
> + * Even though there is no real user state here, these
> + * are were user regs belong, and kernel_execve() will
^^^^^
where?

Ira

> + * overwrite them with user regs. Put an obviously
> + * invalid value that nonetheless causes user_mode(regs)
> + * to return true in CS.
> + *
> + * This also prevents the unwinder from thinking that there
> + * is invalid kernel state at the top of the stack.
> + */
> + childregs->cs = 3;
> +
> kthread_frame_init(frame, sp, arg);
> return 0;
> }
> --
> 2.29.2
>