[PATCH v2 2/2] Add user-mode only option to unprivileged_userfaultfd sysctl knob

From: Lokesh Gidra
Date: Fri Aug 21 2020 - 21:40:43 EST


Add a third option to the 'unprivileged_userfaultfd' sysctl knob.
When the knob is set to 2, unprivileged users may call userfaultfd,
as when it is set to 1, but only page faults originating in user mode
can be handled. In this mode, an unprivileged user (one without the
CAP_SYS_PTRACE capability) must pass UFFD_USER_MODE_ONLY to
userfaultfd or the call fails with EPERM.

This facility allows administrators to reduce the likelihood that
an attacker with access to userfaultfd can delay faulting kernel
code to widen timing windows for other exploits.

Signed-off-by: Daniel Colascione <dancol@xxxxxxxxxx>
Signed-off-by: Lokesh Gidra <lokeshgidra@xxxxxxxxxx>
---
Documentation/admin-guide/sysctl/vm.rst | 10 +++++++---
fs/userfaultfd.c | 10 ++++++++--
kernel/sysctl.c | 2 +-
3 files changed, 16 insertions(+), 6 deletions(-)
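
For reviewers, the permission check described in the commit message can be
modeled in user space. The sketch below is illustrative only and is not part
of the patch: uffd_check_permission() and its has_cap_sys_ptrace parameter are
hypothetical stand-ins for the in-kernel capable(CAP_SYS_PTRACE) test, and the
value given to UFFD_USER_MODE_ONLY here is an assumption, since the flag is
defined outside this patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed value for illustration; the real flag is defined outside this patch. */
#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1
#endif

/*
 * Hypothetical user-space model of the check at the top of the
 * userfaultfd syscall. Returns 0 if the call may proceed and
 * -EPERM otherwise. "knob" models sysctl_unprivileged_userfaultfd.
 */
static int uffd_check_permission(int knob, int flags, bool has_cap_sys_ptrace)
{
	switch (knob) {
	case 2:
		/* User-mode-only requests are allowed for every caller. */
		if (flags & UFFD_USER_MODE_ONLY)
			break;
		/* Otherwise the caller is treated as if the knob were 0. */
		/* fall through */
	case 0:
		if (!has_cap_sys_ptrace)
			return -EPERM;
		break;
	default:
		/* knob == 1: no restriction on unprivileged callers. */
		break;
	}
	return 0;
}
```

With the knob at 2, an unprivileged caller that passes UFFD_USER_MODE_ONLY is
admitted, while one that omits the flag gets -EPERM, which matches the
behavior the commit message describes.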

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 4b9d2e8e9142..23d6feb79f5c 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -872,9 +872,13 @@ unprivileged_userfaultfd
========================

This flag controls whether unprivileged users can use the userfaultfd
-system calls. Set this to 1 to allow unprivileged users to use the
-userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
-privileged users (with SYS_CAP_PTRACE capability).
+system calls. Set this to 0 to restrict userfaultfd to only privileged
+users (with CAP_SYS_PTRACE capability), set this to 1 to allow unprivileged
+users to use the userfaultfd system calls, or set this to 2 to restrict
+unprivileged users to handling page faults from user mode only. In the last
+case, users without CAP_SYS_PTRACE must pass UFFD_USER_MODE_ONLY in order
+for userfaultfd to succeed. Prohibiting use of userfaultfd for handling faults
+from kernel mode may make certain vulnerabilities more difficult to exploit.

The default value is 1.

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 3e4ae6145112..2fcdeb28c960 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1973,8 +1973,14 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	struct userfaultfd_ctx *ctx;
 	int fd;

-	if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE))
-		return -EPERM;
+	switch (sysctl_unprivileged_userfaultfd) {
+	case 2:
+		if (flags & UFFD_USER_MODE_ONLY)
+			break;
+	case 0:
+		if (!capable(CAP_SYS_PTRACE))
+			return -EPERM;
+	}

 	BUG_ON(!current->mm);

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 287862f91717..7e94215dfff5 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -3119,7 +3119,7 @@ static struct ctl_table vm_table[] = {
 		.mode = 0644,
 		.proc_handler = proc_dointvec_minmax,
 		.extra1 = SYSCTL_ZERO,
-		.extra2 = SYSCTL_ONE,
+		.extra2 = &two,
 	},
 #endif
 	{ }
--
2.28.0.297.g1956fa8f8d-goog