Re: [PATCH v1 RFC Zisslpcfi 11/20] mmu: maybe_mkwrite updated to manufacture shadow stack PTEs

From: David Hildenbrand
Date: Tue Feb 14 2023 - 07:11:04 EST


On 13.02.23 21:01, Deepak Gupta wrote:
On Mon, Feb 13, 2023 at 03:56:22PM +0100, David Hildenbrand wrote:
On 13.02.23 15:37, Deepak Gupta wrote:
On Mon, Feb 13, 2023 at 01:05:16PM +0100, David Hildenbrand wrote:
On 13.02.23 05:53, Deepak Gupta wrote:
maybe_mkwrite creates PTEs with WRITE encodings for underlying arch if
VM_WRITE is turned on in vma->vm_flags. Shadow stack memory is a write-
able memory except it can only be written by certain specific
instructions. This patch allows maybe_mkwrite to create shadow stack PTEs
if vma is shadow stack VMA. Each arch can define which combination of VMA
flags means a shadow stack.

Additionally pte_mkshdwstk must be provided by arch specific PTE
construction headers to create shadow stack PTEs. (in arch specific
pgtable.h).

This patch provides dummy/stub pte_mkshdwstk if CONFIG_USER_SHADOW_STACK
is not selected.

Signed-off-by: Deepak Gupta <debug@xxxxxxxxxxxx>
---
include/linux/mm.h | 23 +++++++++++++++++++++--
include/linux/pgtable.h | 4 ++++
2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8f857163ac89..a7705bc49bfe 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1093,6 +1093,21 @@ static inline unsigned long thp_size(struct page *page)
void free_compound_page(struct page *page);
#ifdef CONFIG_MMU
+
+#ifdef CONFIG_USER_SHADOW_STACK
+bool arch_is_shadow_stack_vma(struct vm_area_struct *vma);
+#endif
+
+static inline bool
+is_shadow_stack_vma(struct vm_area_struct *vma)
+{
+#ifdef CONFIG_USER_SHADOW_STACK
+ return arch_is_shadow_stack_vma(vma);
+#else
+ return false;
+#endif
+}
+
/*
* Do pte_mkwrite, but only if the vma says VM_WRITE. We do this when
* servicing faults for write access. In the normal case, do always want
@@ -1101,8 +1116,12 @@ void free_compound_page(struct page *page);
*/
static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
{
- if (likely(vma->vm_flags & VM_WRITE))
- pte = pte_mkwrite(pte);
+ if (likely(vma->vm_flags & VM_WRITE)) {
+ if (unlikely(is_shadow_stack_vma(vma)))
+ pte = pte_mkshdwstk(pte);
+ else
+ pte = pte_mkwrite(pte);
+ }
return pte;

Exactly what we are trying to avoid in the x86 approach right now.
Please see the x86 series on details, we shouldn't try reinventing the
wheel but finding a core-mm approach that fits multiple architectures.

https://lkml.kernel.org/r/20230119212317.8324-1-rick.p.edgecombe@xxxxxxxxx

Thanks David for comment here. I looked at x86 approach. This patch
actually written in a way which is not re-inventing wheel and is following
a core-mm approach that fits multiple architectures.

Change above checks `is_shadow_stack_vma` and if it returns true then only
it manufactures shadow stack pte else it'll make a regular writeable mapping.

Now if we look at `is_shadow_stack_vma` implementation, it returns false if
`CONFIG_USER_SHADOW_STACK` is not defined. If `CONFIG_USER_SHADOW_STACK is
defined then it calls `arch_is_shadow_stack_vma` which should be implemented
by arch specific code. This allows each architecture to define their own vma
flag encodings for shadow stack (riscv chooses presence of only `VM_WRITE`
which is analogous to choosen PTE encodings on riscv W=1,R=0,X=0)

Additionally pte_mkshdwstk will be nop if not implemented by architecture.

Let me know if this make sense. If I am missing something here, let me know.

See the discussion in that thread. The idea is to pass a VMA to
pte_mkwrite() and let it handle how to actually set it writable.


Thanks. I see. Instances where `pte_mkwrite` is directly invoked by checking
VM_WRITE and thus instead of fixing all those instance, make pte_mkwrite itself
take vma flag or vma.

I'll revise.

Thanks, it would be great to discuss in the other threads what else you would need to make it work for you. I assume Rick will have something to play with soonish (Right, Rick? :) ).

--
Thanks,

David / dhildenb