Re: [PATCH V2 2/7] x86/entry: Switch the stack after error_entry() returns

From: Peter Zijlstra
Date: Thu Mar 03 2022 - 03:00:52 EST


On Thu, Mar 03, 2022 at 11:54:29AM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@xxxxxxxxxxxx>
>
> error_entry() calls sync_regs() to settle/copy the pt_regs and switches
> the stack directly after sync_regs(). But error_entry() itself is also
> a function call, the switching has to handle the return address of it
> together, which causes the work complicated and tangly.
>
> Switching to the stack after error_entry() makes the code simpler and
> intuitive.
>
> Signed-off-by: Lai Jiangshan <jiangshan.ljs@xxxxxxxxxxxx>
> ---
> arch/x86/entry/entry_64.S | 16 ++++++----------
> 1 file changed, 6 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 24846284eda5..a51cad2b7fff 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -323,6 +323,8 @@ SYM_CODE_END(ret_from_fork)
> .macro idtentry_body cfunc has_error_code:req
>
> call error_entry
> + movq %rax, %rsp /* switch stack settled by sync_regs() */
> + ENCODE_FRAME_POINTER
> UNWIND_HINT_REGS
>
> movq %rsp, %rdi /* pt_regs pointer into 1st argument*/
> @@ -980,14 +982,10 @@ SYM_CODE_START_LOCAL(error_entry)
> /* We have user CR3. Change to kernel CR3. */
> SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
>
> + leaq 8(%rsp), %rdi /* arg0 = pt_regs pointer */
> .Lerror_entry_from_usermode_after_swapgs:
> /* Put us onto the real thread stack. */
> - popq %r12 /* save return addr in %12 */
> - movq %rsp, %rdi /* arg0 = pt_regs pointer */
> call sync_regs
> - movq %rax, %rsp /* switch stack */
> - ENCODE_FRAME_POINTER
> - pushq %r12
> RET
>
> /*
> @@ -1019,6 +1017,7 @@ SYM_CODE_START_LOCAL(error_entry)
> */
> .Lerror_entry_done_lfence:
> FENCE_SWAPGS_KERNEL_ENTRY
> + leaq 8(%rsp), %rax /* return pt_regs pointer */
> RET
>
> .Lbstep_iret:
> @@ -1039,12 +1038,9 @@ SYM_CODE_START_LOCAL(error_entry)
> * Pretend that the exception came from user mode: set up pt_regs
> * as if we faulted immediately after IRET.
> */
> - popq %r12 /* save return addr in %12 */
> - movq %rsp, %rdi /* arg0 = pt_regs pointer */
> + leaq 8(%rsp), %rdi /* arg0 = pt_regs pointer */
> call fixup_bad_iret
> - mov %rax, %rsp
> - ENCODE_FRAME_POINTER
> - pushq %r12
> + mov %rax, %rdi
> jmp .Lerror_entry_from_usermode_after_swapgs
> SYM_CODE_END(error_entry)

I can't seem to follow; perhaps I need more wake-up-juice?

Suppose we have .Lerror_bad_iret, then the code flow is something like:


old new

.Lerror_bad_iret
SWAWPGS business

popq %r12 leaq 8(%rsp), %rsi
movq %rsp, %rdi
call fixup_bad_iret call fixup_bad_iret
mov %rax, %rsp mov %rax, %rdi
ENCODE_FRAME_POINTER
pushq %r12

jmp .Lerror_entry_from_usermode_after_swapgs

.Lerror_entry_from_usermode_after_swapgs
pop %r12
movq %rsp, %rdi
call sync_regs call sync_regs
mov %rax, %rsp
ENCODE_FRAME_POINTER
push %r12
RET RET


The thing that appears to me is that syn_regs is called one stack switch
short. This isn't mentioned in the Changelog nor in the comments. Why is
this OK?