Re: [PATCH 3/7] x86/cpu: Disable kernel LASS when patching kernel alternatives

From: Chen, Yian
Date: Tue Jan 10 2023 - 20:02:18 EST




On 1/10/2023 1:04 PM, Peter Zijlstra wrote:
On Mon, Jan 09, 2023 at 09:52:00PM -0800, Yian Chen wrote:
Most of the kernel is mapped at virtual addresses
in the upper half of the address range. But kernel
deliberately initialized a temporary mm area
within the lower half of the address range
for text poking, see commit 4fc19708b165
("x86/alternatives: Initialize temporary mm
for patching").

LASS stops access to a lower half address in kernel,
and this can be deactivated if AC bit in EFLAGS
register is set. Hence use stac and clac instructions
around access to the address to avoid triggering a
LASS #GP fault.

Kernel objtool validation warns if the binary calls
to a non-whitelisted function that exists outside of
the stac/clac guard, or references any function with a
dynamic function pointer inside the guard; see section
9 in the document tools/objtool/Documentation/objtool.txt.

For these reasons, also considering text poking size is
usually small, simple modifications have been done
in function text_poke_memcpy() and text_poke_memset() to
avoid non-whitelisted function calls inside the stac/clac
guard.

Gcc may detect and replace the target with its built-in
functions. However, the replacement would break the
objtool validation criteria. Hence, add compiler option
-fno-builtin for the file.

Please reflow to 72 characters consistently, this is silly.

Sure. I will reformat the commit message to follow the guideline.

Co-developed-by: Tony Luck <tony.luck@xxxxxxxxx>
Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
Signed-off-by: Yian Chen <yian.chen@xxxxxxxxx>
---
arch/x86/include/asm/smap.h | 13 +++++++++++++
arch/x86/kernel/Makefile | 2 ++
arch/x86/kernel/alternative.c | 21 +++++++++++++++++++--
tools/objtool/arch/x86/special.c | 2 ++
4 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
index bab490379c65..6f7ac0839b10 100644
--- a/arch/x86/include/asm/smap.h
+++ b/arch/x86/include/asm/smap.h
@@ -39,6 +39,19 @@ static __always_inline void stac(void)
alternative("", __ASM_STAC, X86_FEATURE_SMAP);
}
+/* Deactivate/activate LASS via AC bit in EFLAGS register */
+static __always_inline void low_addr_access_begin(void)
+{
+ /* Note: a barrier is implicit in alternative() */
+ alternative("", __ASM_STAC, X86_FEATURE_LASS);
+}
+
+static __always_inline void low_addr_access_end(void)
+{
+ /* Note: a barrier is implicit in alternative() */
+ alternative("", __ASM_CLAC, X86_FEATURE_LASS);
+}

Can't say I like the name.
Indeed, there are other ways to name these functions, for example
enable_kernel_lass()/disable_kernel_lass(), or we could simply keep using stac()/clac() unchanged.

I chose this name because it states the purpose directly and helps readers understand when to use the functions.

Also if you look at bit 63 as a sign bit,
it's actively wrong since -1 is lower than 0.
This is a trade-off. For address manipulation and arithmetic, an address
is usually treated as unsigned. I would be happy to get suggestions for a better name.
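
To make the two readings concrete, here is a minimal userspace sketch (not
kernel code). The helper names are hypothetical and exist only for this
illustration. It assumes a 64-bit two's complement target, which is what the
signed cast relies on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not a kernel API: bit 63 clear means lower half. */
static inline bool addr_in_lower_half(uint64_t addr)
{
	return (addr >> 63) == 0;
}

/* Unsigned view: an address with bit 63 set compares as the larger one. */
static inline bool unsigned_less(uint64_t a, uint64_t b)
{
	return a < b;
}

/*
 * Signed view: the same bit patterns read as int64_t. An address with
 * bit 63 set is negative, so it compares as the smaller one. The cast
 * assumes the usual two's complement conversion.
 */
static inline bool signed_less(uint64_t a, uint64_t b)
{
	return (int64_t)a < (int64_t)b;
}
```

With a user-style address such as 0x1000 and a kernel-style address such as
0xffffffff81000000, unsigned_less() orders the user address first, while
signed_less() orders the kernel address first. That is the ambiguity in
calling the user half "low".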

+
static __always_inline unsigned long smap_save(void)
{
unsigned long flags;
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 96d51bbc2bd4..f8a455fc56a2 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -7,6 +7,8 @@ extra-y += vmlinux.lds
CPPFLAGS_vmlinux.lds += -U$(UTS_MACHINE)
+CFLAGS_alternative.o += -fno-builtin
+
ifdef CONFIG_FUNCTION_TRACER
# Do not profile debug and lowlevel utilities
CFLAGS_REMOVE_tsc.o = -pg
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 7d8c3cbde368..4de8b54fb5f2 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1530,14 +1530,31 @@ __ro_after_init unsigned long poking_addr;
static void text_poke_memcpy(void *dst, const void *src, size_t len)
{
- memcpy(dst, src, len);
+ const char *s = src;
+ char *d = dst;
+
+ /* The parameter dst ends up referencing to the global variable
+ * poking_addr, which is mapped to the low half address space.
+ * In kernel, accessing the low half address range is prevented
+ * by LASS. So relax LASS prevention while accessing the memory
+ * range.
+ */
+ low_addr_access_begin();
+ while (len-- > 0)
+ *d++ = *s++;
+ low_addr_access_end();
}
static void text_poke_memset(void *dst, const void *src, size_t len)
{
int c = *(const int *)src;
+ char *d = dst;
- memset(dst, c, len);
+ /* The same comment as it is in function text_poke_memcpy */
+ low_addr_access_begin();
+ while (len-- > 0)
+ *d++ = c;
+ low_addr_access_end();
}

This is horrific tinkering :/

It seems difficult to find a perfect solution for this part, since a function call or a function pointer inside the stac/clac guard triggers an objtool warning (the reasons are stated in the commit message).

To avoid the warning, I thought this might be acceptable since the poked text is usually only a few bytes.
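
For reference, the pattern can be sketched in userspace as below. This is only
an illustration under stated assumptions: guard_begin()/guard_end() are
hypothetical stand-ins that track a counter, whereas the patch's
low_addr_access_begin()/low_addr_access_end() emit STAC/CLAC. The names
poke_copy() and poke_fill() are also made up for this sketch.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the STAC/CLAC guard; they only count depth. */
static int guard_depth;

static inline void guard_begin(void) { guard_depth++; }
static inline void guard_end(void)   { guard_depth--; }

/* Byte-wise copy with no function call between guard_begin/guard_end. */
static void poke_copy(void *dst, const void *src, size_t len)
{
	const char *s = src;
	char *d = dst;

	guard_begin();
	while (len-- > 0)
		*d++ = *s++;
	guard_end();
}

/* Byte-wise fill, again with no call inside the guarded region. */
static void poke_fill(void *dst, int c, size_t len)
{
	char *d = dst;

	guard_begin();
	while (len-- > 0)
		*d++ = (char)c;
	guard_end();
}
```

Note that an optimizing compiler may still recognize such loops and emit calls
to memcpy()/memset(), which is why the patch also adds -fno-builtin for the
file.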

Also, what about the EFI mm? IIRC EFI also lives in the user address
space.

I didn't encounter any EFI mm related problems while testing the implementation. I will update you after I investigate the EFI mm further.

Thanks,
Yian