Re: Wake-up from suspend to RAM broken under `retbleed=stuff`

From: Juergen Gross
Date: Wed Jan 11 2023 - 07:22:47 EST


On 11.01.23 12:20, Peter Zijlstra wrote:
On Mon, Jan 09, 2023 at 04:05:31AM +0000, Joan Bruguera wrote:
This fixes wakeup for me on both QEMU and real HW
(just a proof of concept, don't merge)

diff --git a/arch/x86/kernel/callthunks.c b/arch/x86/kernel/callthunks.c
index ffea98f9064b..8704bcc0ce32 100644
--- a/arch/x86/kernel/callthunks.c
+++ b/arch/x86/kernel/callthunks.c
@@ -7,6 +7,7 @@
#include <linux/memory.h>
#include <linux/moduleloader.h>
#include <linux/static_call.h>
+#include <linux/suspend.h>
#include <asm/alternative.h>
#include <asm/asm-offsets.h>
@@ -150,6 +151,10 @@ static bool skip_addr(void *dest)
if (dest >= (void *)hypercall_page &&
dest < (void*)hypercall_page + PAGE_SIZE)
return true;
+#endif
+#ifdef CONFIG_PM_SLEEP
+ if (dest == restore_processor_state)
+ return true;
#endif
return false;
}
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 236447ee9beb..e667894936f7 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -281,6 +281,9 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
/* Needed by apm.c */
void notrace restore_processor_state(void)
{
+ /* Restore GS before calling anything to avoid crash on call depth accounting */
+ native_wrmsrl(MSR_GS_BASE, saved_context.kernelmode_gs_base);
+
__restore_processor_state(&saved_context);
}

Yeah, I can see why, but I'm not really comfortable with this. TBH, I
don't see how the whole resume code is correct to begin with. At the
very least it needs a heavy dose of noinstr.

Rafael, what cr3 is active when we call restore_processor_state()?

Specifically, the problem is that I don't feel comfortable doing any
sort of weird code until all the CR and segment registers have been
restored, however, write_cr*() are paravirt functions that result in
CALL, which then gives us a bit of a checken and egg problem.

Under Xen the hypervisor should do all needed low level stuff when
resuming. Even dom0 is just a guest, so I don't see a problem with
PV guests. CR and segment registers should be in the vcpu structure(s)
in Xen and they should be restored by Xen when activating the vcpus
after suspend. There shouldn't be any need to do this again in the
kernel.

I have verified that the suspend path when running as Xen PV guest
is NOT calling restore_processor_state().

I'm also wondering how well retbleed=stuff works on Xen, if at all. If
we can ignore Xen, things are a little earier perhaps.

In restore_processor_state() you should be able to ignore Xen, unless
there are other code paths calling it (kexec can be ignored, too, as
that isn't supported in Xen PV, and suspend to disk shouldn't work
either).


Juergen

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature
Description: OpenPGP digital signature