[PATCH v1] mm, sysctl: Add sysctl for controlling VM_MAYEXEC taint

From: robert . foss
Date: Fri Aug 26 2016 - 12:30:19 EST


From: Will Drewry <wad@xxxxxxxxxxxx>

This patch proposes a sysctl knob that allows a privileged user to
disable ~VM_MAYEXEC tainting when mapping in a vma from a MNT_NOEXEC
mountpoint. It does not alter the normal behavior resulting from
attempting to directly mmap(PROT_EXEC) a vma (-EPERM) nor the behavior
of any other subsystems checking MNT_NOEXEC.

It is motivated by a common /dev/shm, /tmp usecase. There are few
facilities for creating a shared memory segment that can be remapped in
the same process address space with different permissions. Often, a
file in /tmp provides this functionality. However, on distributions
that are more restrictive/paranoid, world-writeable directories are
often mounted "noexec". The only workaround to support software that
needs this behavior is to either not use that software or remount /tmp
exec. (E.g., https://bugs.gentoo.org/350336?id=350336) Given that
the only recourse is using SysV IPC, the application programmer loses
many of the useful ABI features that they get using a mmap'd file.

With this patch, it would be possible to change the sysctl variable
such that mprotect(PROT_EXEC) would succeed. In cases like the example
above, an additional userspace mmap-wrapper would be needed, but in
other cases, like how code.google.com/p/nativeclient mmap()s then
mprotect()s, the behavior would be unaffected.

The tradeoff is a loss of defense in depth, but it seems reasonable when
the alternative is frequently to disable the defense entirely.

(There are many other ways to approach this problem, but this seemed to
be the most practical and feel the least like a hack or a major change.)

Signed-off-by: Will Drewry <wad@xxxxxxxxxxxx>
Signed-off-by: Robert Foss <robert.foss@xxxxxxxxxxxxx>
Tested-by: Robert Foss <robert.foss@xxxxxxxxxxxxx>
---
include/linux/mm.h | 2 ++
kernel/sysctl.c | 9 +++++++++
mm/Kconfig | 17 +++++++++++++++++
mm/mmap.c | 3 ++-
mm/util.c | 1 +
5 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 08ed53e..e2090c5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -108,6 +108,8 @@ extern int mmap_rnd_compat_bits __read_mostly;

extern int sysctl_max_map_count;

+extern int sysctl_mmap_noexec_taint;
+
extern unsigned long sysctl_user_reserve_kbytes;
extern unsigned long sysctl_admin_reserve_kbytes;

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index b43d0b2..ab1d714 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1564,6 +1564,15 @@ static struct ctl_table vm_table[] = {
.mode = 0644,
.proc_handler = mmap_min_addr_handler,
},
+ {
+ .procname = "mmap_noexec_taint",
+ .data = &sysctl_mmap_noexec_taint,
+ .maxlen = sizeof(sysctl_mmap_noexec_taint),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &zero,
+ .extra2 = &one,
+ },
#endif
#ifdef CONFIG_NUMA
{
diff --git a/mm/Kconfig b/mm/Kconfig
index 78a23c5..08d9bc8 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -353,6 +353,23 @@ config DEFAULT_MMAP_MIN_ADDR
This value can be changed after boot using the
/proc/sys/vm/mmap_min_addr tunable.

+config MMAP_NOEXEC_TAINT
+ int "Turns on tainting of mmap()d files from noexec mountpoints"
+ depends on MMU
+ default 1
+ help
+ By default, the ability to change the protections of a virtual
+ memory area to allow execution depend on if the vma has the
+ VM_MAYEXEC flag. When mapping regions from files, VM_MAYEXEC
+ will be unset if the containing mountpoint is mounted MNT_NOEXEC.
+ By setting the value to 0, any mmap()d region may be later
+ mprotect()d with PROT_EXEC.
+
+ If unsure, keep the value set to 1.
+
+ This value can be changed after boot using the
+ /proc/sys/vm/mmap_noexec_taint tunable.
+
config ARCH_SUPPORTS_MEMORY_FAILURE
bool

diff --git a/mm/mmap.c b/mm/mmap.c
index ca9d91b..b8be093 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1246,7 +1246,8 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
if (path_noexec(&file->f_path)) {
if (vm_flags & VM_EXEC)
return -EPERM;
- vm_flags &= ~VM_MAYEXEC;
+ if (sysctl_mmap_noexec_taint)
+ vm_flags &= ~VM_MAYEXEC;
}

if (!file->f_op->mmap)
diff --git a/mm/util.c b/mm/util.c
index 662cddf..701f0a3 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -430,6 +430,7 @@ int sysctl_overcommit_memory __read_mostly = OVERCOMMIT_GUESS;
int sysctl_overcommit_ratio __read_mostly = 50;
unsigned long sysctl_overcommit_kbytes __read_mostly;
int sysctl_max_map_count __read_mostly = DEFAULT_MAX_MAP_COUNT;
+int sysctl_mmap_noexec_taint __read_mostly = CONFIG_MMAP_NOEXEC_TAINT;
unsigned long sysctl_user_reserve_kbytes __read_mostly = 1UL << 17; /* 128MB */
unsigned long sysctl_admin_reserve_kbytes __read_mostly = 1UL << 13; /* 8MB */

--
2.7.4