Re: [PATCH v2] docs: add system-state document to admin-guide

From: Kate Stewart
Date: Thu Mar 23 2023 - 10:54:02 EST


Looks good Shuah.

On Wed, Mar 22, 2023 at 10:20 AM Shuah Khan <skhan@xxxxxxxxxxxxxxxxxxx> wrote:
>
> Add a new system state document to the admin-guide. This document is
> intended to be used as a guide on how to gather higher level information
> about a system and its run-time activity.
>
> Signed-off-by: Shuah Khan <skhan@xxxxxxxxxxxxxxxxxxx>

Reviewed-by: Kate Stewart <kstewart@xxxxxxxxxxxxxxxxxxx>

> ---
> Changes since v1:
> -- Addressed review comments
>
> Documentation/admin-guide/index.rst | 1 +
> Documentation/admin-guide/system-state.rst | 350 +++++++++++++++++++++
> 2 files changed, 351 insertions(+)
> create mode 100644 Documentation/admin-guide/system-state.rst
>
> diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
> index f475554382e2..541372672c55 100644
> --- a/Documentation/admin-guide/index.rst
> +++ b/Documentation/admin-guide/index.rst
> @@ -66,6 +66,7 @@ subsystems expectations will be found here.
> :maxdepth: 1
>
> workload-tracing
> + system-state
>
> The rest of this manual consists of various unordered guides on how to
> configure specific aspects of kernel behavior to your liking.
> diff --git a/Documentation/admin-guide/system-state.rst b/Documentation/admin-guide/system-state.rst
> new file mode 100644
> index 000000000000..2a6fdf85c35c
> --- /dev/null
> +++ b/Documentation/admin-guide/system-state.rst
> @@ -0,0 +1,350 @@
> +.. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0)
> +
> +===========================================================
> +Discovering system calls and features supported on a system
> +===========================================================
> +
> +:Author: Shuah Khan <skhan@xxxxxxxxxxxxxxxxxxx>
> +:maintained-by: Shuah Khan <skhan@xxxxxxxxxxxxxxxxxxx>
> +
> +Key Points
> +==========
> +
> + * System state includes system calls, features, static and dynamic
> + modules enabled in the kernel configuration.
> + * Supported system calls and Kernel features are architecture dependent.
> + * auditd, checksyscalls.sh, and get_feat.pl tools can be used to discover
> + static system state.
> + * Understanding Linux kernel hardening configurations options and making
> + sure they are enabled will make a system more secure.
> + * Employing run-time tracing can shed light on the dynamic system state.
> + * Workloads could change the system state by loading and unloading dynamic
> + modules and tuning system parameters.
> +
> +System State Visualization
> +==========================
> +
> +The kernel system state can be viewed as a combination of static and
> +dynamic features and modules. Let’s first define what static and dynamic
> +system states are and then explore how we can visualize the static and
> +dynamic system parts of the kernel.
> +
> +Static System View comprises system calls, features, static and dynamic
> +modules enabled in the kernel configuration. Supported system calls
> +and Kernel features are architecture dependent. System call numbering is
> +different on different architectures. We can get the supported system call
> +information using auditd utilities.
> +
> +ausyscall –dump prints out the supported system calls on a system and allows
> +mapping syscall names and numbers. You can install the auditd package on
> +Debian based systems::
> +
> + sudo apt-get install auditd
> +
> +scripts/checksyscalls.sh can be used to check if current architecture is
> +missing any system calls compared to i386.
> +
> +scripts/get_feat.pl can be used to list the Kernel feature support matrix
> +for an architecture.
> +
> +Dynamic System View comprises system calls, ioctls invoked, and subsystems
> +used during the runtime. A workload could load and unload modules and also
> +change the dynamic system configuration to suit its needs by tuning system
> +parameters.
> +
> +What is the methodology?
> +========================
> +
> +The first step is gathering the default system state such as the dynamic
> +and static modules loaded on the system. lsmod command prints out the
> +dynamically loaded modules on a system. Statically configured modules can
> +be found in the kernel configuration file.
> +
> +The next step is discovering system activity during run-time. You can do so
> +by enabling event tracing and then running your favorite application. After
> +a period of time, gather the event logs, and kernel messages.
> +
> +Once you have the necessary information, you can extract the system call
> +numbers from the event trace log and map them to the supported system calls.
> +
> +Finding supported system calls
> +==============================
> +
> +As mentioned earlier, ausyscall prints out supported system calls
> +on a system and allows mapping syscalls names and numbers::
> +
> + ausyscall --dump
> +
> +You can look for specific system calls as shown in the below::
> +
> + ausyscall open
> + open 2
> + mq_open 240
> + openat 257
> + perf_event_open 298
> + open_by_handle_at 304
> + open_tree 428
> + fsopen 430
> + pidfd_open 434
> + openat2 437
> +
> + ausyscall time
> +
> + getitimer 36
> + setitimer 38
> + gettimeofday 96
> + times 100
> + rt_sigtimedwait 128
> + utime 132
> + adjtimex 159
> + settimeofday 164
> + time 201
> + semtimedop 220
> + timer_create 222
> + timer_settime 223
> + timer_gettime 224
> + timer_getoverrun 225
> + timer_delete 226
> + clock_settime 227
> + clock_gettime 228
> + utimes 235
> + mq_timedsend 242
> + mq_timedreceive 243
> + futimesat 261
> + utimensat 280
> + timerfd_create 283
> + timerfd_settime 286
> + timerfd_gettime 287
> + clock_adjtime 305
> +
> +Finding unsupported system calls
> +================================
> +
> +As mentioned earlier, scripts/checksyscalls.sh checks missing system calls
> +on current architecture compared to i386. Example run::
> +
> + checksyscalls.sh gcc
> + warning: #warning syscall mmap2 not implemented [-Wcpp]
> + warning: #warning syscall truncate64 not implemented [-Wcpp]
> + warning: #warning syscall ftruncate64 not implemented [-Wcpp]
> + warning: #warning syscall fcntl64 not implemented [-Wcpp]
> + warning: #warning syscall sendfile64 not implemented [-Wcpp]
> + warning: #warning syscall statfs64 not implemented [-Wcpp]
> + warning: #warning syscall fstatfs64 not implemented [-Wcpp]
> + warning: #warning syscall fadvise64_64 not implemented [-Wcpp]
> +
> +Let's check this against ausyscall now::
> +
> + ausyscall map
> + mmap 9
> + munmap 11
> + mremap 25
> + remap_file_pages 216
> +
> + ausyscall trunc
> + truncate 76
> + ftruncate 77
> +
> +As you can see, ausyscall shows mmap2, truncate64, and ftruncate64 aren't
> +implemented on this system. This matches what checksyscalls.sh shows.
> +
> +Finding supported features
> +==========================
> +
> +scripts/get_feat.pl can be used to list the Kernel feature support matrix
> +for an architecture::
> +
> + get_feat.pl list
> + get_feat.pl list –arch=arm64 lists
> +
> +This scripts parses Documentation/features to find the support status
> +information. It can be used to validate the contents of the files under
> +Documentation/features or simply list them::
> +
> + --arch Outputs features for an specific architecture, optionally filtering
> + for a single specific feature.
> + --feat or --feature Output features for a single specific feature.
> +
> +Here is how you can find if stackprotector and hread-info-in-task features
> +are supported::
> +
> + scripts/get_feat.pl --arch=arm64 --feat=stackprotector list
> + #
> + # Kernel feature support matrix of the 'arm64' architecture:
> + #
> + debug/ stackprotector : ok | HAVE_STACKPROTECTOR #
> + arch supports compiler driven stack overflow protection
> +
> + scripts/get_feat.pl --feat=thread-info-in-task list
> + #
> + # Kernel feature support matrix of the 'x86' architecture:
> + #
> + core/ thread-info-in-task : ok | THREAD_INFO_IN_TASK #
> + arch makes use of the core kernel facility to embed thread_info in
> + task_struct
> +
> +Finding kernel module status
> +============================
> +
> +lsmod command shows the kernel modules that are currently loaded. This
> +program displays the contents of /proc/modules. Let's pick uvcvideo
> +module which is found on most laptops::
> +
> + lsmod | grep uvc
> + uvcvideo 126976 0
> + videobuf2_vmalloc 20480 1 uvcvideo
> + uvc 16384 1 uvcvideo
> + videobuf2_v4l2 36864 1 uvcvideo
> + videodev 315392 2 videobuf2_v4l2,uvcvideo
> + videobuf2_common 65536 4 videobuf2_vmalloc,videobuf2_v4l2,uvcvideo,videobuf2_memops
> + mc 77824 4 videodev,videobuf2_v4l2,uvcvideo,videobuf2_common
> +
> +You can see that lsmod shows uvcvideo and the modules it depends on and how
> +many modules are using them. videobuf2_common is in use by 4 other modules.
> +In other words, this is the reference count for this module and rmmod will
> +refuse to unload it as long as the reference count is > 0.
> +
> +You can get the same information from /proc.modules::
> +
> + less /proc/modules | grep uvc
> + uvcvideo 126976 0 - Live 0x0000000000000000
> + videobuf2_vmalloc 20480 1 uvcvideo, Live 0x0000000000000000
> + uvc 16384 1 uvcvideo, Live 0x0000000000000000
> + videobuf2_v4l2 36864 1 uvcvideo, Live 0x0000000000000000
> + videodev 315392 2 uvcvideo,videobuf2_v4l2, Live 0x0000000000000000
> + videobuf2_common 65536 4 uvcvideo,videobuf2_vmalloc,videobuf2_memops,videobuf2_v4l2, Live 0x0000000000000000
> + mc 77824 4 uvcvideo,videobuf2_v4l2,videodev,videobuf2_common, Live 0x0000000000000000
> +
> +The information is similar with a few more extra fields. The address is the
> +base address for the module in kernel virtual memory space. When run as a
> +normal user, the address is all zeros. The same command when run as root will
> +be as follows::
> +
> + sudo less /proc/modules | grep uvc
> + uvcvideo 126976 0 - Live 0xffffffffc1c8b000
> + videobuf2_vmalloc 20480 1 uvcvideo, Live 0xffffffffc167f000
> + uvc 16384 1 uvcvideo, Live 0xffffffffc0ab0000
> + videobuf2_v4l2 36864 1 uvcvideo, Live 0xffffffffc0a28000
> + videodev 315392 2 uvcvideo,videobuf2_v4l2, Live 0xffffffffc16e9000
> + videobuf2_common 65536 4 uvcvideo,videobuf2_vmalloc,videobuf2_memops,videobuf2_v4l2, Live 0xffffffffc094d000
> + mc 77824 4 uvcvideo,videobuf2_v4l2,videodev,videobuf2_common, Live 0xffffffffc15eb000
> +
> +Let's check what modinfo shows that is important for us::
> +
> + /sbin/modinfo uvcvideo
> + filename: /lib/modules/6.3.0-rc2/kernel/drivers/media/usb/uvc/uvcvideo.ko
> + license: GPL
> + description: USB Video Class driver
> + depends: videobuf2-v4l2,videodev,mc,uvc,videobuf2-common,videobuf2-vmalloc
> + retpoline: Y
> + intree: Y
> + name: uvcvideo
> + vermagic: 6.3.0-rc2 SMP preempt mod_unload modversions
> + sig_id: PKCS#7
> + signer: Build time autogenerated kernel key
> +
> +This tells us that this module is built intree and the signed with a build
> +time autogenerated key.
> +
> +Let's do one last sanity check on the system to see if the following two
> +command outputs match::
> +
> + ps ax | wc -l
> + ls -d /proc/* | grep [0-9]|wc -l
> +
> +If they don't match, examine your system closely. kernel rootkits install
> +their own ps, find, etc. utilities to mask their activity. The outputs
> +match on my system. Do they on yours?
> +
> +Is my system as secure as it could be?
> +======================================
> +
> +Linux kernel supports several hardening options to make system secure.
> +kconfig-hardened-check tool sanity checks kernel configuration for
> +security. You can clone the latest kconfig-hardened-check repository::
> +
> + git clone https://github.com/a13xp0p0v/kconfig-hardened-check.git
> + cd kconfig-hardened-check
> + bin/kconfig-hardened-check --config <config file> --cmdline /proc/cmdline
> +
> +This will generate detailed report of kernel security configuration and
> +command line options that are enabled (OK) and the ones that aren't (FAIL)
> +and a summary line at the end::
> +
> + [+] Config check is finished: 'OK' - 100 / 'FAIL' - 100
> +
> +You will have to analyze the information to determine which options make
> +sense to enable on your system.
> +
> +Understanding system run-time activity
> +======================================
> +
> +Enabling event tracing gives insight into system run-time activity. This is
> +a good way to identify which parts of the kernel are used at a higher level
> +while system is in and/or while a specific workload/process is running.
> +
> +Event tracing depends on the CONFIG_EVENT_TRACING option enabled. You can
> +enable event tracing before starting workload/process. Event tracing allows
> +you to dynamically enable and disable tracing on supported/available events.
> +You can find available events, tracers, and filter functions in the following
> +files::
> +
> + /sys/kernel/debug/tracing/available_events
> + /sys/kernel/debug/tracing/available_filter_functions
> + /sys/kernel/debug/tracing/available_tracers
> +
> +Now this is how you can enable tracing::
> +
> + sudo echo 1 > /sys/kernel/debug/tracing/events/enable
> +
> +Once the workload/process stops or when you decide you have the status you
> +need, you can disable event tracing::
> +
> + sudo echo 0 > /sys/kernel/debug/tracing/events/enable
> +
> +You can find the tracing information in the file::
> +
> + /sys/kernel/debug/tracing
> +
> +Here is the information shown in this file::
> +
> + cat trace
> + # tracer: nop
> + #
> + # entries-in-buffer/entries-written: 0/0 #P:16
> + #
> + # _-----=> irqs-off/BH-disabled
> + # / _----=> need-resched
> + # | / _---=> hardirq/softirq
> + # || / _--=> preempt-depth
> + # ||| / _-=> migrate-disable
> + # |||| / delay
> + # TASK-PID CPU# ||||| TIMESTAMP FUNCTION
> + # | | | ||||| | |
> +
> +
> +Analyzing traces
> +================
> +
> +You will be able map the functions to system calls and other kernel features
> +to get insight into the overall system activity while a workload/process is
> +running.
> +
> +Map the NR (syscal) numbers from the trace to syscalls from the syscalls dump.
> +Categorize system calls and map them to Linux subsystems.
> +
> +Conclusion
> +==========
> +
> +This document is intended to be used as a guide on how to gather higher level
> +information about a system and its run-time activity. The approach described
> +in this document helps us get insight into supported system calls, features,
> +assess how secure a system is, and its run-time activity.
> +
> +References
> +==========
> +
> + * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/checksyscalls.sh
> + * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/get_feat.pl
> + * https://github.com/a13xp0p0v/kconfig-hardened-check
> + * https://docs.kernel.org/trace/index.html
> --
> 2.34.1
>