RE: [RFC PATCH] x86/retpolines: Prevent speculation after RET

From: David Laight
Date: Fri Feb 19 2021 - 04:30:08 EST


From: Peter Zijlstra
> Sent: 18 February 2021 19:03
>
> On Thu, Feb 18, 2021 at 07:46:39PM +0100, Borislav Petkov wrote:
> > Both vendors speculate after a near RET in some way:
> >
> > Intel:
> >
> > "Unlike near indirect CALL and near indirect JMP, the processor will not
> > speculatively execute the next sequential instruction after a near RET
> > unless that instruction is also the target of a jump or is a target in a
> > branch predictor."
>
> Right, the way I read that means it's not a problem for us here.

They got a lawyer to write that sentence :-)
What on earth is that 'unless' clause about?
Either:
1) The instructions might be speculatively executed for some entirely
different reason.
or:
2) The cpu might use the BTB to predict the target of the RET - and that
prediction might happen to be the instruction that sequentially follows it.

I can't manage to read it in any way that suggests that the cpu will
ignore the fact it is a RET and start executing the instruction that
follows.
(Unlike some ARM cpus which do seem to do that.)

> > AMD:
> >
> > "Some AMD processors when they first encounter a branch do not stall
> > dispatch and use the branches dynamic execution to determine the target.
> > Therefore, they will speculatively dispatch the sequential instructions
> > after the branch. This happens for near return instructions where it is
> > not clear what code may exist sequentially after the return instruction.

Sounds like the conditional branch predictor (and the BTB?) gets used for RET
instructions when the 'return address stack' is invalid.

> > This behavior also occurs with jmp/call instructions with indirect
> > targets. Software should place a LFENCE or another dispatch serializing
> > instruction after the return or jmp/call indirect instruction to prevent
> > this sequential speculation."
> >
> > The AMD side doesn't really need the LFENCE because it'll do LFENCE;
> > JMP/CALL <target> due to X86_FEATURE_RETPOLINE_AMD, before it reaches
> > the RET.
>
> It never reached the RET.
>
> So all in all, I really don't see why we'd need this.

I read that as implying that some AMD cpus can sometimes treat the RET as
a conditional branch and so speculatively assume it isn't taken.
So you need an LFENCE (or ???) following the RET at the end of every function.
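
For illustration only, a minimal user-space sketch of what 'an LFENCE
following the RET' would look like. The names add_one_fenced and
fenced_ret_selfcheck are made up for this example, and it assumes x86-64,
an ELF target and GNU as (AT&T) syntax. It is not the patch under
discussion. The LFENCE is never architecturally executed; it only sits on
the sequential path that a cpu might speculate down past the RET.

```c
#include <assert.h>

#if defined(__x86_64__)
/*
 * Hypothetical demo function: returns its argument plus one.
 * The LFENCE after the RET is unreachable in normal execution; it is
 * placed there only so that sequential speculation past the RET runs
 * into a dispatch-serializing instruction.
 */
__asm__(
    ".pushsection .text\n"
    ".globl add_one_fenced\n"
    ".type add_one_fenced, @function\n"
    "add_one_fenced:\n"
    "    leal 1(%rdi), %eax\n"   /* eax = (int)arg + 1 */
    "    ret\n"
    "    lfence\n"               /* speculation barrier after the RET */
    ".size add_one_fenced, .-add_one_fenced\n"
    ".popsection\n"
);
int add_one_fenced(int x);
#else
/* Fallback for other architectures: plain C, no barrier (assumption). */
int add_one_fenced(int x) { return x + 1; }
#endif

/* Checks that the barrier does not change the architectural result. */
static inline void fenced_ret_selfcheck(void)
{
    assert(add_one_fenced(41) == 42);
    assert(add_one_fenced(-1) == 0);
}
```

Calling fenced_ret_selfcheck() from main() only confirms that the function
still returns normally; it says nothing about whether any particular cpu
actually speculates past the RET.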

David
