Re: Wake-up from suspend to RAM broken under `retbleed=stuff`

From: Andrew Cooper
Date: Wed Jan 11 2023 - 06:42:42 EST


On 11/01/2023 11:20 am, Peter Zijlstra wrote:
> On Mon, Jan 09, 2023 at 04:05:31AM +0000, Joan Bruguera wrote:
>> This fixes wakeup for me on both QEMU and real HW
>> (just a proof of concept, don't merge)
>>
>> diff --git a/arch/x86/kernel/callthunks.c b/arch/x86/kernel/callthunks.c
>> index ffea98f9064b..8704bcc0ce32 100644
>> --- a/arch/x86/kernel/callthunks.c
>> +++ b/arch/x86/kernel/callthunks.c
>> @@ -7,6 +7,7 @@
>> #include <linux/memory.h>
>> #include <linux/moduleloader.h>
>> #include <linux/static_call.h>
>> +#include <linux/suspend.h>
>>
>> #include <asm/alternative.h>
>> #include <asm/asm-offsets.h>
>> @@ -150,6 +151,10 @@ static bool skip_addr(void *dest)
>> if (dest >= (void *)hypercall_page &&
>> dest < (void*)hypercall_page + PAGE_SIZE)
>> return true;
>> +#endif
>> +#ifdef CONFIG_PM_SLEEP
>> + if (dest == restore_processor_state)
>> + return true;
>> #endif
>> return false;
>> }
>> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
>> index 236447ee9beb..e667894936f7 100644
>> --- a/arch/x86/power/cpu.c
>> +++ b/arch/x86/power/cpu.c
>> @@ -281,6 +281,9 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
>> /* Needed by apm.c */
>> void notrace restore_processor_state(void)
>> {
>> + /* Restore GS before calling anything to avoid crash on call depth accounting */
>> + native_wrmsrl(MSR_GS_BASE, saved_context.kernelmode_gs_base);
>> +
>> __restore_processor_state(&saved_context);
>> }
> Yeah, I can see why, but I'm not really comfortable with this. TBH, I
> don't see how the whole resume code is correct to begin with. At the
> very least it needs a heavy dose of noinstr.
>
> Rafael, what cr3 is active when we call restore_processor_state()?
>
> Specifically, the problem is that I don't feel comfortable doing any
> sort of weird code until all the CR and segment registers have been
> restored, however, write_cr*() are paravirt functions that result in
> CALL, which then gives us a bit of a checken and egg problem.
>
> I'm also wondering how well retbleed=stuff works on Xen, if at all. If
> we can ignore Xen, things are a little earier perhaps.

I really would like retbleed=stuff to work under Xen PV, because then we
can arrange to start turning off some even more expensive mitigations
that Xen does on behalf of guests.

I have a suspicion that these paths will be used under Xen PV, even if
only for dom0.  The abstraction for host S3/4/5 are not great.  That
said, at all points that guest code is executing, even after a logical
S3 resume, it will have a good GS_BASE  (Assuming the teardown logic
doesn't self-clobber.)

The bigger issue with stuff accounting is that nothing AFAICT accounts
for the fact that any hypercall potentially empties the RSB in otherwise
synchronous program flow.

~Andrew