Re: [PATCH v2 10/11] x86/alternatives: Simplify ALTERNATIVE_n()

From: Peter Zijlstra
Date: Thu Sep 07 2023 - 11:40:44 EST


On Thu, Sep 07, 2023 at 10:31:58AM +0200, Borislav Petkov wrote:
> On Mon, Aug 14, 2023 at 01:44:36PM +0200, Peter Zijlstra wrote:
> > Instead of making increasingly complicated ALTERNATIVE_n()
> > implementations, use a nested alternative expression.
> >
> > The only difference between:
> >
> > ALTERNATIVE_2(oldinst, newinst1, flag1, newinst2, flag2)
> >
> > and
> >
> > ALTERNATIVE(ALTERNATIVE(oldinst, newinst1, flag1),
> > newinst2, flag2)
>
> Hmm, one more problem I see with this. You're handling it, it seems, but
> the whole thing doesn't feel clean to me.
>
> Here's an exemplary eval:
>
> > #APP
> > # 53 "./arch/x86/include/asm/page_64.h" 1
> > # ALT: oldnstr
> > 661:
> > # ALT: oldnstr
> > 661:
>
> <--- X
>
> > call clear_page_orig #
> > 662:
> > # ALT: padding
> > .skip -(((665f-664f)-(662b-661b)) > 0) * ((665f-664f)-(662b-661b)),0x90

665f-664f = 5 (rep)
662b-661b = 5 (orig)

5-5 > 0 = 0

so no padding
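
For anyone who wants to poke at the arithmetic outside the assembler, here is a minimal C sketch of the .skip expression. The names gas_gt() and alt_pad_len() are made up for illustration and do not exist in the kernel. The only thing being modelled is that gas evaluates a true comparison to -1 rather than 1, which is why the expression carries a leading minus.

```c
#include <assert.h>

/*
 * Hypothetical helper: model a gas comparison, which yields -1 for
 * true and 0 for false (unlike C, where true is 1).
 */
int gas_gt(int a, int b)
{
	return a > b ? -1 : 0;
}

/*
 * Hypothetical helper mirroring:
 *
 *   .skip -(((665f-664f)-(662b-661b)) > 0) * ((665f-664f)-(662b-661b)),0x90
 *
 * repl_len stands in for 665f-664f (length of the replacement) and
 * orig_len for 662b-661b (length of what is already in place).
 * Returns the number of 0x90 bytes the .skip would emit.
 */
int alt_pad_len(int repl_len, int orig_len)
{
	int diff = repl_len - orig_len;

	assert(repl_len >= 0 && orig_len >= 0);

	/* -(-1) * diff == diff when diff > 0, otherwise 0 * diff == 0 */
	return -gas_gt(diff, 0) * diff;
}
```

With both lengths at 5, as in the clear_page case, alt_pad_len(5, 5) comes out as 0.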

> > 663:
> > .pushsection .altinstructions,"a"
> > .long 661b - .
> > .long 664f - .
> > .4byte ( 3*32+16)
> > .byte 663b-661b
> > .byte 665f-664f
> > .popsection
> > .pushsection .altinstr_replacement, "ax"
> > # ALT: replacement
> > 664:
> > call clear_page_rep #
> > 665:
> > .popsection
> >
> > 662:
> > # ALT: padding
> > .skip -(((665f-664f)-(662b-661b)) > 0) * ((665f-664f)-(662b-661b)),0x90
> > 663:
>
> <--- Z
>
> So here it would add the padding again, unnecessarily.

665f-664f = 5 (erms)
662b-661b = 5 (orig + padding)

5-5 > 0 = 0

No padding. Also, since 661b is, as you note, the first label, 662b-661b
includes all previous padding, and the .skip will only add additional
padding if the new sequence is longer still.

So no, I'm not seeing it. Doubly not with this example where all 3
variants are 5 bytes.

Notably, the following nonsense alternative with 1-, 2- and 3-byte
instructions:

asm volatile (
ALTERNATIVE_2("push %rbp",
"push %r12", X86_FEATURE_ALWAYS,
"mov %rsp,%rbp", X86_FEATURE_ALWAYS));

ends up as:

0004 204: 55 push %rbp
0005 205: 90 nop
0006 206: 90 nop

If you flip the 3- and 2-byte instructions the result is the same. No
extra padding.
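
The order independence can be sketched in C as well. This is a toy model under the assumption that each nesting level sees the original instruction plus all padding emitted so far (because 661b always resolves to the start of the original). The helpers alt_extra_pad() and alt_nested_len() are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: padding one .skip adds, i.e.
 * max(repl_len - cur_len, 0).
 */
int alt_extra_pad(int repl_len, int cur_len)
{
	int diff = repl_len - cur_len;

	return diff > 0 ? diff : 0;
}

/*
 * Hypothetical helper: walk the nesting from the innermost
 * ALTERNATIVE() outwards and return the final size of the patch site.
 * cur_len plays the role of 662b-661b at each level: the original
 * instruction plus every nop emitted by the levels before it.
 */
int alt_nested_len(int orig_len, const int *repl_lens, size_t n)
{
	int cur_len = orig_len;
	size_t i;

	assert(orig_len >= 0);

	for (i = 0; i < n; i++)
		cur_len += alt_extra_pad(repl_lens[i], cur_len);

	return cur_len;
}
```

Feeding it a 1-byte original with replacement lengths {2, 3} or {3, 2} gives 3 either way, which matches the push/nop/nop result above: the site simply grows to the longest variant.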

And no, I had not actually tested this before, because clearly this is
all obvious ;-)

Anyway, the 1,3,2 variant spelled out reads like:

#APP
# 1563 "../arch/x86/kernel/alternative.c" 1
# ALT: oldnstr
661:
# ALT: oldnstr
661:
push %rbp
662:
# ALT: padding
.skip -(((665f-664f)-(662b-661b)) > 0) * ((665f-664f)-(662b-661b)),0x90

# Which evaluates like:
# 665f-664f = 3
# 662b-661b = 1
# 3-1 > 0 = -1
# --1 * (3-1) = 2
#
# so two single byte nops get emitted here.

663:
.pushsection .altinstructions,"a"
.long 661b - .
.long 664f - .
.4byte ( 3*32+21)
.byte 663b-661b
.byte 665f-664f
.popsection
.pushsection .altinstr_replacement, "ax"
# ALT: replacement
664:
mov %rsp,%rbp
665:
.popsection

662:
# ALT: padding
.skip -(((665f-664f)-(662b-661b)) > 0) * ((665f-664f)-(662b-661b)),0x90

# And this evaluates to:
# 665f-664f = 2
# 662b-661b = 3 (because it includes the original 1 byte instruction and 2 bytes padding)
# 2-3 > 0 = 0
# -0 * (2-3) = 0
#
# so no extra padding

663:
.pushsection .altinstructions,"a"
.long 661b - .
.long 664f - .
.4byte ( 3*32+21)
.byte 663b-661b
.byte 665f-664f
.popsection
.pushsection .altinstr_replacement, "ax"
# ALT: replacement
664:
push %r12
665:
.popsection

# 0 "" 2
# ../arch/x86/kernel/alternative.c:1569: int3_selftest();
#NO_APP