Re: [PATCH v27 12/12] landlock: Add user and kernel documentation

From: Mickaël Salaün
Date: Fri Jan 22 2021 - 05:51:14 EST



On 22/01/2021 09:33, Michael Kerrisk (man-pages) wrote:
> Hello Mickaël,

Hi Michael,

>
> It would be great to have some manual pages for these system calls
> before release... Can you prepare something?

Yes, I will start some pages based on the current documentation, but it
will be some time before seeing Landlock in a (stable) release.

Could you please take a look at the UAPI (patch 8/12)?

>
> Thanks,
>
> Michael
>
> On Thu, 21 Jan 2021 at 21:51, Mickaël Salaün <mic@xxxxxxxxxxx> wrote:
>>
>> From: Mickaël Salaün <mic@xxxxxxxxxxxxxxxxxxx>
>>
>> This documentation can be built with the Sphinx framework.
>>
>> Cc: James Morris <jmorris@xxxxxxxxx>
>> Cc: Jann Horn <jannh@xxxxxxxxxx>
>> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
>> Cc: Serge E. Hallyn <serge@xxxxxxxxxx>
>> Signed-off-by: Mickaël Salaün <mic@xxxxxxxxxxxxxxxxxxx>
>> Reviewed-by: Vincent Dagonneau <vincent.dagonneau@xxxxxxxxxxx>
>> ---
>>
>> Changes since v25:
>> * Explain the behavior of layered access rights.
>> * Explain how bind mounts and overayfs mounts are handled by Landlock:
>> merged overlayfs mount points have their own inodes, which makes these
>> hierarchies independent from its upper and lower layers, unlike bind
>> mounts which share the same inodes between the source hierarchy and
>> the mount point hierarchy.
>> New overlayfs mount and bind mount tests check these behaviors.
>> * Synchronize with the new syscalls.c file and update syscall names.
>> * Fix spelling.
>> * Remove Reviewed-by Jann Horn because of the above changes.
>>
>> Changes since v24:
>> * Add Reviewed-by Jann Horn.
>> * Add a paragraph to explain how the ruleset layers work.
>> * Bump date.
>>
>> Changes since v23:
>> * Explain limitations for the maximum number of stacked ruleset, and the
>> memory usage restrictions.
>>
>> Changes since v22:
>> * Fix spelling and remove obsolete sentence (spotted by Jann Horn).
>> * Bump date.
>>
>> Changes since v21:
>> * Move the user space documentation to userspace-api/landlock.rst and
>> the kernel documentation to security/landlock.rst .
>> * Add license headers.
>> * Add last update dates.
>> * Update MAINTAINERS file.
>> * Add (back) links to git.kernel.org .
>> * Fix spelling.
>>
>> Changes since v20:
>> * Update examples and documentation with the new syscalls.
>>
>> Changes since v19:
>> * Update examples and documentation with the new syscalls.
>>
>> Changes since v15:
>> * Add current limitations.
>>
>> Changes since v14:
>> * Fix spelling (contributed by Randy Dunlap).
>> * Extend documentation about inheritance and explain layer levels.
>> * Remove the use of now-removed access rights.
>> * Use GitHub links.
>> * Improve kernel documentation.
>> * Add section for tests.
>> * Update example.
>>
>> Changes since v13:
>> * Rewrote the documentation according to the major revamp.
>>
>> Previous changes:
>> https://lore.kernel.org/lkml/20191104172146.30797-8-mic@xxxxxxxxxxx/
>> ---
>> Documentation/security/index.rst | 1 +
>> Documentation/security/landlock.rst | 79 ++++++
>> Documentation/userspace-api/index.rst | 1 +
>> Documentation/userspace-api/landlock.rst | 306 +++++++++++++++++++++++
>> MAINTAINERS | 2 +
>> 5 files changed, 389 insertions(+)
>> create mode 100644 Documentation/security/landlock.rst
>> create mode 100644 Documentation/userspace-api/landlock.rst
>>
>> diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst
>> index 8129405eb2cc..16335de04e8c 100644
>> --- a/Documentation/security/index.rst
>> +++ b/Documentation/security/index.rst
>> @@ -16,3 +16,4 @@ Security Documentation
>> siphash
>> tpm/index
>> digsig
>> + landlock
>> diff --git a/Documentation/security/landlock.rst b/Documentation/security/landlock.rst
>> new file mode 100644
>> index 000000000000..244e616d3d7a
>> --- /dev/null
>> +++ b/Documentation/security/landlock.rst
>> @@ -0,0 +1,79 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +.. Copyright © 2017-2020 Mickaël Salaün <mic@xxxxxxxxxxx>
>> +.. Copyright © 2019-2020 ANSSI
>> +
>> +==================================
>> +Landlock LSM: kernel documentation
>> +==================================
>> +
>> +:Author: Mickaël Salaün
>> +:Date: January 2021
>> +
>> +Landlock's goal is to create scoped access-control (i.e. sandboxing). To
>> +harden a whole system, this feature should be available to any process,
>> +including unprivileged ones. Because such process may be compromised or
>> +backdoored (i.e. untrusted), Landlock's features must be safe to use from the
>> +kernel and other processes point of view. Landlock's interface must therefore
>> +expose a minimal attack surface.
>> +
>> +Landlock is designed to be usable by unprivileged processes while following the
>> +system security policy enforced by other access control mechanisms (e.g. DAC,
>> +LSM). Indeed, a Landlock rule shall not interfere with other access-controls
>> +enforced on the system, only add more restrictions.
>> +
>> +Any user can enforce Landlock rulesets on their processes. They are merged and
>> +evaluated according to the inherited ones in a way that ensures that only more
>> +constraints can be added.
>> +
>> +User space documentation can be found here: :doc:`/userspace-api/landlock`.
>> +
>> +Guiding principles for safe access controls
>> +===========================================
>> +
>> +* A Landlock rule shall be focused on access control on kernel objects instead
>> + of syscall filtering (i.e. syscall arguments), which is the purpose of
>> + seccomp-bpf.
>> +* To avoid multiple kinds of side-channel attacks (e.g. leak of security
>> + policies, CPU-based attacks), Landlock rules shall not be able to
>> + programmatically communicate with user space.
>> +* Kernel access check shall not slow down access request from unsandboxed
>> + processes.
>> +* Computation related to Landlock operations (e.g. enforcing a ruleset) shall
>> + only impact the processes requesting them.
>> +
>> +Tests
>> +=====
>> +
>> +Userspace tests for backward compatibility, ptrace restrictions and filesystem
>> +support can be found here: `tools/testing/selftests/landlock/`_.
>> +
>> +Kernel structures
>> +=================
>> +
>> +Object
>> +------
>> +
>> +.. kernel-doc:: security/landlock/object.h
>> + :identifiers:
>> +
>> +Ruleset and domain
>> +------------------
>> +
>> +A domain is a read-only ruleset tied to a set of subjects (i.e. tasks'
>> +credentials). Each time a ruleset is enforced on a task, the current domain is
>> +duplicated and the ruleset is imported as a new layer of rules in the new
>> +domain. Indeed, once in a domain, each rule is tied to a layer level. To
>> +grant access to an object, at least one rule of each layer must allow the
>> +requested action on the object. A task can then only transit to a new domain
>> +that is the intersection of the constraints from the current domain and those
>> +of a ruleset provided by the task.
>> +
>> +The definition of a subject is implicit for a task sandboxing itself, which
>> +makes the reasoning much easier and helps avoid pitfalls.
>> +
>> +.. kernel-doc:: security/landlock/ruleset.h
>> + :identifiers:
>> +
>> +.. Links
>> +.. _tools/testing/selftests/landlock/:
>> + https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/tools/testing/selftests/landlock/
>> diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
>> index acd2cc2a538d..01f1748ab569 100644
>> --- a/Documentation/userspace-api/index.rst
>> +++ b/Documentation/userspace-api/index.rst
>> @@ -18,6 +18,7 @@ place where this information is gathered.
>>
>> no_new_privs
>> seccomp_filter
>> + landlock
>> unshare
>> spec_ctrl
>> accelerators/ocxl
>> diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst
>> new file mode 100644
>> index 000000000000..06c16f2b038b
>> --- /dev/null
>> +++ b/Documentation/userspace-api/landlock.rst
>> @@ -0,0 +1,306 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +.. Copyright © 2017-2020 Mickaël Salaün <mic@xxxxxxxxxxx>
>> +.. Copyright © 2019-2020 ANSSI
>> +
>> +=====================================
>> +Landlock: unprivileged access control
>> +=====================================
>> +
>> +:Author: Mickaël Salaün
>> +:Date: January 2021
>> +
>> +The goal of Landlock is to enable to restrict ambient rights (e.g. global
>> +filesystem access) for a set of processes. Because Landlock is a stackable
>> +LSM, it makes possible to create safe security sandboxes as new security layers
>> +in addition to the existing system-wide access-controls. This kind of sandbox
>> +is expected to help mitigate the security impact of bugs or
>> +unexpected/malicious behaviors in user space applications. Landlock empowers
>> +any process, including unprivileged ones, to securely restrict themselves.
>> +
>> +Landlock rules
>> +==============
>> +
>> +A Landlock rule describes an action on an object. An object is currently a
>> +file hierarchy, and the related filesystem actions are defined in `Access
>> +rights`_. A set of rules is aggregated in a ruleset, which can then restrict
>> +the thread enforcing it, and its future children.
>> +
>> +Defining and enforcing a security policy
>> +----------------------------------------
>> +
>> +We first need to create the ruleset that will contain our rules. For this
>> +example, the ruleset will contain rules that only allow read actions, but write
>> +actions will be denied. The ruleset then needs to handle both of these kind of
>> +actions.
>> +
>> +.. code-block:: c
>> +
>> + int ruleset_fd;
>> + struct landlock_ruleset_attr ruleset_attr = {
>> + .handled_access_fs =
>> + LANDLOCK_ACCESS_FS_EXECUTE |
>> + LANDLOCK_ACCESS_FS_WRITE_FILE |
>> + LANDLOCK_ACCESS_FS_READ_FILE |
>> + LANDLOCK_ACCESS_FS_READ_DIR |
>> + LANDLOCK_ACCESS_FS_REMOVE_DIR |
>> + LANDLOCK_ACCESS_FS_REMOVE_FILE |
>> + LANDLOCK_ACCESS_FS_MAKE_CHAR |
>> + LANDLOCK_ACCESS_FS_MAKE_DIR |
>> + LANDLOCK_ACCESS_FS_MAKE_REG |
>> + LANDLOCK_ACCESS_FS_MAKE_SOCK |
>> + LANDLOCK_ACCESS_FS_MAKE_FIFO |
>> + LANDLOCK_ACCESS_FS_MAKE_BLOCK |
>> + LANDLOCK_ACCESS_FS_MAKE_SYM,
>> + };
>> +
>> + ruleset_fd = landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0);
>> + if (ruleset_fd < 0) {
>> + perror("Failed to create a ruleset");
>> + return 1;
>> + }
>> +
>> +We can now add a new rule to this ruleset thanks to the returned file
>> +descriptor referring to this ruleset. The rule will only allow reading the
>> +file hierarchy ``/usr``. Without another rule, write actions would then be
>> +denied by the ruleset. To add ``/usr`` to the ruleset, we open it with the
>> +``O_PATH`` flag and fill the &struct landlock_path_beneath_attr with this file
>> +descriptor.
>> +
>> +.. code-block:: c
>> +
>> + int err;
>> + struct landlock_path_beneath_attr path_beneath = {
>> + .allowed_access =
>> + LANDLOCK_ACCESS_FS_EXECUTE |
>> + LANDLOCK_ACCESS_FS_READ_FILE |
>> + LANDLOCK_ACCESS_FS_READ_DIR,
>> + };
>> +
>> + path_beneath.parent_fd = open("/usr", O_PATH | O_CLOEXEC);
>> + if (path_beneath.parent_fd < 0) {
>> + perror("Failed to open file");
>> + close(ruleset_fd);
>> + return 1;
>> + }
>> + err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
>> + &path_beneath, 0);
>> + close(path_beneath.parent_fd);
>> + if (err) {
>> + perror("Failed to update ruleset");
>> + close(ruleset_fd);
>> + return 1;
>> + }
>> +
>> +We now have a ruleset with one rule allowing read access to ``/usr`` while
>> +denying all other handled accesses for the filesystem. The next step is to
>> +restrict the current thread from gaining more privileges (e.g. thanks to a SUID
>> +binary).
>> +
>> +.. code-block:: c
>> +
>> + if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
>> + perror("Failed to restrict privileges");
>> + close(ruleset_fd);
>> + return 1;
>> + }
>> +
>> +The current thread is now ready to sandbox itself with the ruleset.
>> +
>> +.. code-block:: c
>> +
>> + if (landlock_enforce_ruleset_self(ruleset_fd, 0)) {
>> + perror("Failed to enforce ruleset");
>> + close(ruleset_fd);
>> + return 1;
>> + }
>> + close(ruleset_fd);
>> +
>> +If the `landlock_enforce_ruleset_self` system call succeeds, the current thread
>> +is now restricted and this policy will be enforced on all its subsequently
>> +created children as well. Once a thread is landlocked, there is no way to
>> +remove its security policy; only adding more restrictions is allowed. These
>> +threads are now in a new Landlock domain, merge of their parent one (if any)
>> +with the new ruleset.
>> +
>> +Full working code can be found in `samples/landlock/sandboxer.c`_.
>> +
>> +Layers of file path access rights
>> +---------------------------------
>> +
>> +Each time a thread enforces a ruleset on itself, it updates its Landlock domain
>> +with a new layer of policy. Indeed, this complementary policy is stacked with
>> +the potentially other rulesets already restricting this thread. A sandboxed
>> +thread can then safely add more constraints to itself with a new enforced
>> +ruleset.
>> +
>> +One policy layer grants access to a file path if at least one of its rules
>> +encountered on the path grants the access. A sandboxed thread can only access
>> +a file path if all its enforced policy layers grant the access as well as all
>> +the other system access controls (e.g. filesystem DAC, other LSM policies,
>> +etc.).
>> +
>> +Bind mounts and OverlayFS
>> +-------------------------
>> +
>> +Landlock enables to restrict access to file hierarchies, which means that these
>> +access rights can be propagated with bind mounts (cf.
>> +:doc:`/filesystems/sharedsubtree`) but not with :doc:`/filesystems/overlayfs`.
>> +
>> +A bind mount mirrors a source file hierarchy to a destination. The destination
>> +hierarchy is then composed of the exact same files, on which Landlock rules can
>> +be tied, either via the source or the destination path. These rules restrict
>> +access when they are encountered on a path, which means that they can restrict
>> +access to multiple file hierarchies at the same time, whether these hierarchies
>> +are the result of bind mounts or not.
>> +
>> +An OverlayFS mount point consists of upper and lower layers. These layers are
>> +combined in a merge directory, result of the mount point. This merge hierarchy
>> +may include files from the upper and lower layers, but modifications performed
>> +on the merge hierarchy only reflects on the upper layer. From a Landlock
>> +policy point of view, each OverlayFS layers and merge hierarchies are
>> +standalone and contains their own set of files and directories, which is
>> +different from bind mounts. A policy restricting an OverlayFS layer will not
>> +restrict the resulted merged hierarchy, and vice versa.
>> +
>> +Inheritance
>> +-----------
>> +
>> +Every new thread resulting from a :manpage:`clone(2)` inherits Landlock domain
>> +restrictions from its parent. This is similar to the seccomp inheritance (cf.
>> +:doc:`/userspace-api/seccomp_filter`) or any other LSM dealing with task's
>> +:manpage:`credentials(7)`. For instance, one process's thread may apply
>> +Landlock rules to itself, but they will not be automatically applied to other
>> +sibling threads (unlike POSIX thread credential changes, cf.
>> +:manpage:`nptl(7)`).
>> +
>> +When a thread sandboxes itself, we have the guarantee that the related security
>> +policy will stay enforced on all this thread's descendants. This allows
>> +creating standalone and modular security policies per application, which will
>> +automatically be composed between themselves according to their runtime parent
>> +policies.
>> +
>> +Ptrace restrictions
>> +-------------------
>> +
>> +A sandboxed process has less privileges than a non-sandboxed process and must
>> +then be subject to additional restrictions when manipulating another process.
>> +To be allowed to use :manpage:`ptrace(2)` and related syscalls on a target
>> +process, a sandboxed process should have a subset of the target process rules,
>> +which means the tracee must be in a sub-domain of the tracer.
>> +
>> +Kernel interface
>> +================
>> +
>> +Access rights
>> +-------------
>> +
>> +.. kernel-doc:: include/uapi/linux/landlock.h
>> + :identifiers: fs_access
>> +
>> +Creating a new ruleset
>> +----------------------
>> +
>> +.. kernel-doc:: security/landlock/syscalls.c
>> + :identifiers: sys_landlock_create_ruleset
>> +
>> +.. kernel-doc:: include/uapi/linux/landlock.h
>> + :identifiers: landlock_ruleset_attr
>> +
>> +Extending a ruleset
>> +-------------------
>> +
>> +.. kernel-doc:: security/landlock/syscalls.c
>> + :identifiers: sys_landlock_add_rule
>> +
>> +.. kernel-doc:: include/uapi/linux/landlock.h
>> + :identifiers: landlock_rule_type landlock_path_beneath_attr
>> +
>> +Enforcing a ruleset
>> +-------------------
>> +
>> +.. kernel-doc:: security/landlock/syscalls.c
>> + :identifiers: sys_landlock_enforce_ruleset_self
>> +
>> +Current limitations
>> +===================
>> +
>> +Ruleset layers
>> +--------------
>> +
>> +There is a limit of 64 layers of stacked rulesets. This can be an issue for a
>> +task willing to enforce a new ruleset in complement to its 64 inherited
>> +rulesets. Once this limit is reached, sys_landlock_enforce_ruleset_self()
>> +returns E2BIG. It is then strongly suggested to carefully build rulesets once
>> +in the life of a thread, especially for applications able to launch other
>> +applications that may also want to sandbox themselves (e.g. shells, container
>> +managers, etc.).
>> +
>> +Memory usage
>> +------------
>> +
>> +Kernel memory allocated to create rulesets is accounted and can be restricted
>> +by the :doc:`/admin-guide/cgroup-v1/memory`.
>> +
>> +File renaming and linking
>> +-------------------------
>> +
>> +Because Landlock targets unprivileged access controls, it is needed to properly
>> +handle composition of rules. Such property also implies rules nesting.
>> +Properly handling multiple layers of ruleset, each one of them able to restrict
>> +access to files, also implies to inherit the ruleset restrictions from a parent
>> +to its hierarchy. Because files are identified and restricted by their
>> +hierarchy, moving or linking a file from one directory to another implies to
>> +propagate the hierarchy constraints. To protect against privilege escalations
>> +through renaming or linking, and for the sack of simplicity, Landlock currently
>> +limits linking and renaming to the same directory. Future Landlock evolutions
>> +will enable more flexibility for renaming and linking, with dedicated ruleset
>> +flags.
>> +
>> +Filesystem layout modification
>> +------------------------------
>> +
>> +As for file renaming and linking, a sandboxed thread cannot modify its
>> +filesystem layout, whether via :manpage:`mount(2)` or :manpage:`pivot_root(2)`.
>> +However, :manpage:`chroot(2)` calls are not denied.
>> +
>> +Special filesystems
>> +-------------------
>> +
>> +Access to regular files and directories can be restricted by Landlock,
>> +according to the handled accesses of a ruleset. However, files that do not
>> +come from a user-visible filesystem (e.g. pipe, socket), but can still be
>> +accessed through /proc/self/fd/, cannot currently be restricted. Likewise,
>> +some special kernel filesystems such as nsfs, which can be accessed through
>> +/proc/self/ns/, cannot currently be restricted. For now, these kind of special
>> +paths are then always allowed. Future Landlock evolutions will enable to
>> +restrict such paths with dedicated ruleset flags.
>> +
>> +Questions and answers
>> +=====================
>> +
>> +What about user space sandbox managers?
>> +---------------------------------------
>> +
>> +Using user space process to enforce restrictions on kernel resources can lead
>> +to race conditions or inconsistent evaluations (i.e. `Incorrect mirroring of
>> +the OS code and state
>> +<https://www.ndss-symposium.org/ndss2003/traps-and-pitfalls-practical-problems-system-call-interposition-based-security-tools/>`_).
>> +
>> +What about namespaces and containers?
>> +-------------------------------------
>> +
>> +Namespaces can help create sandboxes but they are not designed for
>> +access-control and then miss useful features for such use case (e.g. no
>> +fine-grained restrictions). Moreover, their complexity can lead to security
>> +issues, especially when untrusted processes can manipulate them (cf.
>> +`Controlling access to user namespaces <https://lwn.net/Articles/673597/>`_).
>> +
>> +Additional documentation
>> +========================
>> +
>> +* :doc:`/security/landlock`
>> +* https://landlock.io
>> +
>> +.. Links
>> +.. _samples/landlock/sandboxer.c:
>> + https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/samples/landlock/sandboxer.c
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 572e4288c60f..68276c73c33e 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -9942,6 +9942,8 @@ L: linux-security-module@xxxxxxxxxxxxxxx
>> S: Supported
>> W: https://landlock.io
>> T: git https://github.com/landlock-lsm/linux.git
>> +F: Documentation/security/landlock.rst
>> +F: Documentation/userspace-api/landlock.rst
>> F: include/uapi/linux/landlock.h
>> F: security/landlock/
>> K: landlock
>> --
>> 2.30.0
>>
>
>