[PATCH 0/9] [REVIEW-REQUEST] [v4] System Calls for Memory Protection Keys

From: Dave Hansen
Date: Thu Jul 07 2016 - 08:47:41 EST


I'm resending these because Ingo has said that he'd "love to have
some high level MM review & ack for these syscall ABI extensions."
The only changes to the code in months have been in the selftests.
So, if anyone has been putting off taking a look at these, I'd
appreciate a look now.

I also feel compelled to mention this, since I haven't before and
it gives me confidence that these interfaces are good enough:

Among other things, this feature was designed to help fix a class
of bugs in long-running applications where data corruption is
detected long after it occurs. Today, applications either live
with the corruption, or eat a huge performance penalty from
calling mprotect() frequently. The developers of these
applications are already running *this* *code* and are very eager
to see this feature merged and picked up in future distributions
where their customers can use it.

Other than this message, a good place to start with a review
is in the pkey(7) manpage, which I've published in HTML form here:

https://www.sr71.net/~dave/intel/manpages/

--

Memory Protection Keys for User pages (pkeys) is a CPU feature
which will first appear on Skylake Servers, but will also be
supported on future non-server parts. It provides a mechanism
for enforcing page-based protections, but without requiring
modification of the page tables when an application changes
wishes to change permissions.

Among other things, this feature was designed to help fix a class
of bugs in long-running applications where data corruption is
detected long after it occurs. Applications today either live
with the corruption, or eat a huge performance penalty from
calling mprotect() frequently. The developers of these
applications are already running this code and are very eager to
see this feature merged and picked up in future distributions
where their customers can use it.

Patches to implement execute-only mapping support using pkeys
were merged in to 4.6. But, to do anything else useful with
pkeys, an application needs to be able to set the pkey field in
the PTE (obviously has to be done in-kernel) and make changes to
the "rights" register (using unprivileged instructions).

An application also needs to have an an allocator for the keys
themselves. If two different parts of an application both want
to protect their data with pkeys, they first need to know which
key to use for their individual purposes.

This set introduces 5 system calls, in 3 logical groups:

1. PTE pkey setting (sys_pkey_mprotect(), patches #1-3)
2. Key allocation (sys_pkey_alloc() / sys_pkey_free(), patch #4)
3. Rights register manipulation (sys_pkey_set/get(), patch #5)

I have manpages written for some of these syscalls, and have
had multiple rounds of reviews on the manpages list.

This set is also available here:

git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-pkeys.git pkeys-v040

I've written a set of unit tests for these interfaces, which is
available as the last patch in the series and integrated in to
kselftests.

=== diffstat ===

Dave Hansen (9):
x86, pkeys: add fault handling for PF_PK page fault bit
mm: implement new pkey_mprotect() system call
x86, pkeys: make mprotect_key() mask off additional vm_flags
x86: wire up mprotect_key() system call
x86, pkeys: allocation/free syscalls
x86, pkeys: add pkey set/get syscalls
generic syscalls: wire up memory protection keys syscalls
pkeys: add details of system call use to Documentation/
x86, pkeys: add self-tests

Documentation/x86/protection-keys.txt | 63 +
arch/alpha/include/uapi/asm/mman.h | 5 +
arch/mips/include/uapi/asm/mman.h | 5 +
arch/parisc/include/uapi/asm/mman.h | 5 +
arch/x86/entry/syscalls/syscall_32.tbl | 5 +
arch/x86/entry/syscalls/syscall_64.tbl | 5 +
arch/x86/include/asm/mmu.h | 8 +
arch/x86/include/asm/mmu_context.h | 25 +-
arch/x86/include/asm/pgtable.h | 13 +-
arch/x86/include/asm/pgtable_64.h | 26 +-
arch/x86/include/asm/pgtable_types.h | 6 -
arch/x86/include/asm/pkeys.h | 80 +-
arch/x86/kernel/fpu/xstate.c | 73 +-
arch/x86/mm/fault.c | 9 +
arch/x86/mm/pkeys.c | 38 +-
arch/xtensa/include/uapi/asm/mman.h | 5 +
include/linux/pkeys.h | 39 +-
include/linux/syscalls.h | 8 +
include/uapi/asm-generic/mman-common.h | 5 +
include/uapi/asm-generic/unistd.h | 12 +-
mm/mprotect.c | 134 +-
tools/testing/selftests/x86/Makefile | 3 +-
tools/testing/selftests/x86/pkey-helpers.h | 191 +++
tools/testing/selftests/x86/protection_keys.c | 1316 +++++++++++++++++
24 files changed, 2012 insertions(+), 67 deletions(-)

=== changelog ===

Changes from v3:
* added generic syscalls declarations to include/linux/syscalls.h
to fix arm64 compile issue.

Changes from v2:
* selftest updates:
* formatting changes like what Ingo asked for with MPX
* actually call WRPKRU in __wrpkru()
* once __wrpkru() was fixed, revealed a bug in the ptrace
test where we were testing against the wrong pointer during
the "baseline" test
* Man-pages that match this set are here:
http://marc.info/?l=linux-man&m=146540723525616&w=2

Changes from v1:
* updates to alloc/free patch description calling out that
"in-use" pkeys may still be pkey_free()'d successfully.
* Fixed a bug in the selftest where the 'flags' argument was
not passed to pkey_get().
* Added all syscalls to generic syscalls header
* Added extra checking to selftests so it doesn't fall over
when 1G pages are made the hugetlbfs default.

Cc: linux-api@xxxxxxxxxxxxxxx
Cc: linux-arch@xxxxxxxxxxxxxxx
Cc: linux-mm@xxxxxxxxx
Cc: x86@xxxxxxxxxx
Cc: torvalds@xxxxxxxxxxxxxxxxxxxx
Cc: akpm@xxxxxxxxxxxxxxxxxxxx
Cc: Arnd Bergmann <arnd@xxxxxxxx>
Cc: mgorman@xxxxxxxxxxxxxxxxxxx
Cc: hughd@xxxxxxxxxx
Cc: viro@xxxxxxxxxxxxxxxxxx