RE: the x86 sysret_rip test fails on the Intel FRED architecture

From: H. Peter Anvin
Date: Sat Jan 21 2023 - 22:29:27 EST


On January 21, 2023 7:01:53 PM PST, "Li, Xin3" <xin3.li@xxxxxxxxx> wrote:
>> > >> If not intentional, it might be something that can still be fixed.
>> > >> If it is intentional and is going to be with us for a while we have
>> > >> a few options. If userspace is _really_ depending on this
>> > >> behavior, we could just clobber r11 ourselves in the FRED entry
>> > >> path. If not, we can remove the assertion in the selftest.
>> > > We can't clobber it in the FRED entry path, since it is common for
>> > > all events, but we could do it in the syscall dispatch.
>> > >
>> > > However, it doesn't seem to make sense to do so to me. The current
>> > > behavior is much more of an artifact than desired behavior.
>> > I guess the SDM statements really are for the kernel's benefit and not
>> > for userspace. Userspace _should_ be treating SYSCALL like a CALL and
>> > r11 like any old register that can be clobbered. Right now, the
>> > kernel just happens to clobber it with RFLAGS.
>> >
>> > I do think the odds of anyone relying on this behavior are pretty small.
>> > Let's just zap the check from the selftest, document what we did in
>> > the FRED docs and changelog and move on.
>>
>> Keep the selftest check, but also accept preserved RCX/R11. What really matters is
>> that the kernel isn't leaking data.
>
>I feel the same way; it looks to me that the check is to make sure
>R11 doesn't leak any kernel data because the Linux kernel deliberately
>overwrites R11 with the value of user level flags just before returning
>to user level.
>
>I wanted to zap the check, but as HPA said, this is an artifact to not leak
>any kernel data. I guess it doesn't make a difference if the kernel sets
>R11 to 0.
>
>Maybe it's still reasonable to keep such a check for IDT. However, it makes
>no sense for FRED systems, because all GP registers are saved/restored upon
>event delivery/return.
>
>Thanks!
> Xin
>
>>
>> --
>> Brian Gerst
>

The big thing is that on IDT systems, system calls that return with
SYSRET and those that return with IRET need to behave consistently, so
that no kernel state is leaked.
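
To make the "keep the check, but also accept preserved RCX/R11" idea
concrete, here is a minimal sketch of what a relaxed check could look
like. It is not the code in the sysret_rip selftest: struct
regs_snapshot and syscall_regs_ok() are hypothetical names made up for
illustration, and the sketch assumes the test can capture RCX/R11 before
the syscall and RIP/RFLAGS/RCX/R11 on return to user mode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the registers the check cares about. */
struct regs_snapshot {
	uint64_t rip;		/* user RIP */
	uint64_t rflags;	/* user RFLAGS */
	uint64_t rcx;
	uint64_t r11;
};

/*
 * Accept exactly two outcomes after a system call:
 *
 *  - SYSRET-style: RCX == RIP and R11 == RFLAGS, the values the IDT
 *    syscall path leaves in those registers.
 *  - preserved: RCX and R11 still hold their pre-syscall values, as
 *    when all GP registers are saved/restored.
 *
 * Anything else, including a mix of the two, is rejected: that is
 * where inconsistent behavior or leaked kernel state would show up.
 */
static bool syscall_regs_ok(const struct regs_snapshot *before,
			    const struct regs_snapshot *after)
{
	bool clobbered = after->rcx == after->rip &&
			 after->r11 == after->rflags;
	bool preserved = after->rcx == before->rcx &&
			 after->r11 == before->r11;

	return clobbered || preserved;
}
```

Requiring both registers to agree on one of the two outcomes is a
deliberate choice in this sketch: it keeps the consistency property
while no longer depending on which return path the kernel used.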