Re: [PATCH v6 14/41] x86/mm: Introduce _PAGE_SAVED_DIRTY

From: David Hildenbrand
Date: Tue Feb 21 2023 - 03:40:02 EST


On 20.02.23 22:38, Edgecombe, Rick P wrote:
On Mon, 2023-02-20 at 12:32 +0100, David Hildenbrand wrote:
On 18.02.23 22:14, Rick Edgecombe wrote:
Some OSes have a greater dependence on software available bits in PTEs than
Linux. That left the hardware architects looking for a way to represent a
new memory type (shadow stack) within the existing bits. They chose to
repurpose a lightly-used state: Write=0,Dirty=1. So in order to support
shadow stack memory, Linux should avoid creating memory with this PTE bit
combination unless it intends for it to be shadow stack.

The reason it's lightly used is that Dirty=1 is normally set by HW
_before_ a write. A write with a Write=0 PTE would typically only generate
a fault, not set Dirty=1. Hardware can (rarely) both set Dirty=1 *and*
generate the fault, resulting in a Write=0,Dirty=1 PTE. Hardware which
supports shadow stacks will no longer exhibit this oddity.

So that leaves Write=0,Dirty=1 PTEs created in software. To achieve this,
in places where Linux normally creates Write=0,Dirty=1, it can use the
software-defined _PAGE_SAVED_DIRTY in place of the hardware _PAGE_DIRTY.
In other words, whenever Linux needs to create Write=0,Dirty=1, it instead
creates Write=0,SavedDirty=1 except for shadow stack, which is
Write=0,Dirty=1. Further differentiated by VMA flags, these PTE bit
combinations would be set as follows for various types of memory:

I would simplify (see below) and not repeat what the patch contains as
comments already that detailed.

This verbiage has had quite a bit of x86 maintainer attention already.
I hear what you are saying, but I'm a bit hesitant to take style
suggestions at this point for fear of the situation where people ask
for changes back and forth across different versions. Unless any x86
maintainers want to chime in again? More responses below.

Sure, for my taste this is (1) too repetitive, (2) too verbose, (3) too specialized. But whatever x86 maintainers prefer.

[...]

"
However, there are valid cases where the kernel might create read-only
PTEs that are dirty (e.g., fork(), mprotect(), uffd-wp(), soft-dirty
tracking). In this case, the _PAGE_SAVED_DIRTY bit is used instead of
the HW-dirty bit, to avoid creating wrong "shadow stack" PTEs. Such
PTEs have (Write=0,SavedDirty=1,Dirty=0) set.

Note that on processors without shadow stack support, the
_PAGE_SAVED_DIRTY remains unused.
"

Then I would simply drop below (which is also too COW-specific I think).

COW is the main situation where shadow stacks become read-only. So, as
an example it is nice in that COW covers all the scenarios discussed.
Again, do any x86 maintainers want to weigh in here?

Again, I'd not specialize on COW in all patches too much (IMHO, it creates more confusion than it actually helps for understanding what's happening) and just call it a read-only PTE that is dirty. Simple as that. And it's easy to see why that's problematic, because read-only PTEs that are dirty would be identified as shadow stack PTEs, which we want to work around.
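
To make that concrete, here is a minimal userspace sketch of the idea, not the code from the patch. Write (bit 1) and Dirty (bit 6) are the architectural x86 PTE bit positions. The position of the saved-dirty software bit (58) and all names here (PTE_*, wrprotect_naive(), wrprotect_saved_dirty(), and so on) are assumptions made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hardware-defined x86 PTE bits. */
#define PTE_WRITE       (UINT64_C(1) << 1)
#define PTE_DIRTY       (UINT64_C(1) << 6)
/* Software-defined bit; position 58 is an assumption for this sketch. */
#define PTE_SAVED_DIRTY (UINT64_C(1) << 58)

/* Write=0,Dirty=1 is the combination repurposed for shadow stack. */
static inline bool looks_like_shadow_stack(uint64_t pte)
{
	return (pte & (PTE_WRITE | PTE_DIRTY)) == PTE_DIRTY;
}

/* Naive write-protect: a dirty PTE ends up as Write=0,Dirty=1. */
static inline uint64_t wrprotect_naive(uint64_t pte)
{
	return pte & ~PTE_WRITE;
}

/* Write-protect that moves Dirty into SavedDirty: Write=0,SavedDirty=1,Dirty=0. */
static inline uint64_t wrprotect_saved_dirty(uint64_t pte)
{
	pte &= ~PTE_WRITE;
	if (pte & PTE_DIRTY) {
		pte &= ~PTE_DIRTY;
		pte |= PTE_SAVED_DIRTY;
	}
	return pte;
}

/* Software's view of "dirty" has to consider both bits. */
static inline bool is_dirty(uint64_t pte)
{
	return (pte & (PTE_DIRTY | PTE_SAVED_DIRTY)) != 0;
}
```

Starting from a writable, dirty PTE, wrprotect_naive() yields something that looks_like_shadow_stack() accepts, while wrprotect_saved_dirty() yields a read-only PTE that is still reported dirty by is_dirty() but is not mistaken for shadow stack.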

Again, just my 2 cents. I'm not an x86 maintainer ;)

--
Thanks,

David / dhildenb