On Mon, Mar 11, 2024 at 04:00:54PM +0100, Vegard Nossum wrote:
+==================================
+Assessing security vulnerabilities
+==================================
+
+:Author: Vegard Nossum <vegard.nossum@xxxxxxxxxx>
+
+This document is intended for distributions and others who want to assess
+the severity of the bugs fixed by Linux kernel patches.
Perhaps add, "... when it is infeasible to track a stable Linux
release."
+We could consider *everything* a security issue until proven otherwise, or we
Who is "we" here (and through-out)?
+What is a vulnerability?
+========================
+
+For the purposes of this document, we consider all bugfixes to be
+potential vulnerabilities. This is because, as stated in
The CVE definition makes a distinction here, instead calling a
software flaw with security considerations a "weakness" rather than
"vulnerability". I find "weakness" more in line with people's thinking
about attack chains.
+Documentation/process/cve.rst, whether a bug is exploitable or not
+depends highly on the context (threat model, platform/hardware,
+kernel configuration, boot parameters, runtime configuration,
+connected peripherals, etc.).
Exploitability is an even higher bar, and the absence of exploitability
tends to be impossible to prove.
+2. **Common configurations**: assuming kernel defaults, taking into
+ account hardware prevalence, etc.
I'm not sure I'd call this "Common"; I'd say "Kernel default configuration".
+3. **Distro-specific configuration** and defaults: This assessment of a
+ bugfix takes into account a specific kernel configuration and the
+ distro's own assumptions about how the system is configured and
+ intended to be used.
And this just "Distro default configuration".
+4. **Specific use case** for a single user or deployment: Here we can make
+ precise assumptions about what kernel features are in use and whether
+ physical access is possible.
i.e. a configuration that differs from the distro default.
+Latent vulnerabilities
+----------------------
+
+It is worth mentioning the possibility of latent vulnerabilities:
+These are code "defects" which technically cannot be exploited on any
+current hardware, configuration, or scenario, but which should nevertheless
+be fixed since they represent a possible future vulnerability if other
+parts of the code change.
I take pedantic issue with "cannot be exploited". Again, "exploit" is a
high bar.
Also, why should hardware limit this? If a "latent vulnerability"
becomes part of an attack chain on some future hardware, and we saw it
was a weakness at the time it landed in stable, it should have gotten
a CVE then, yes?
+An example of latent vulnerabilities is the failure to check the return
+value of kmalloc() for small memory allocations: as of early 2024, these
+are `well-known to never fail in practice <https://lwn.net/Articles/627419/>`_
+and are thus not exploitable and not technically vulnerabilities. If this
+rule were to change (because of changes to the memory allocator), then these
+would become true vulnerabilities.
But for kernels prior to that, it IS an issue, yes? And what does "in
practice" mean? Does that include a system under attack that is being
actively manipulated?
+We recommend that a "worst-case scenario" assessment not consider latent
+vulnerabilities as actual vulnerabilities since this is a slippery slope
I wouldn't use the language "actual", but rather reword this from the
perspective of severity. Triage of severity is what is at issue, yes?
+where eventually all changes can be considered a vulnerability in some sense
+or another; in that case, we've thrown the baby out with the bath water and
+rendered assessment completely useless.
I don't find this to be true at all. Distro triage of kernel bug fixes
isn't binary: it'll always be severity-based. Many will be 0, yes, but
it is up to the specific deployment to figure out where their cut line
is if they're not just taking all fixes.
+Types of bugs
+=============
+
+There are many ways to classify types of bugs into broad categories. Two
+ways that we'll cover here are in terms of the outcome (i.e. what an
+attacker could do in the worst case) and in terms of the source defect.
Before breaking this down into examples, I would start with a summary of
the more basic security impact categories: Confidentiality, Integrity,
and Availability, as mapping examples back to these can be useful in
understanding what a bug is, or what it can be expanded to.
+
+In terms of outcome:
+
+- **local DoS** (Denial of Service): a local user is able to crash the
+ machine or make it unusable in some way
+
+- **remote DoS**: another machine on the network is able to crash the
+ machine or make it unusable in some way
+
+- **local privilege escalation**: a local user is able to become root,
+ gain capabilities, or more generally become another user with more
+ privileges than the original user
+
+- **kernel code execution**: the attacker is able to run arbitrary code
+ in a kernel context; this is largely equivalent to a privilege escalation
+ to root
Yes, uid 0 and kernel context are distinct. I don't think I'd say
"largely equivalent" though. Perhaps "Note that root access in many
configurations is equivalent to kernel code execution".
+- **information leak**: the attacker is able to obtain sensitive information
Instead of "leak", please use the less ambiguous word for this, which is
"exposure". The word "leak" is often confused with resource leaks. This
is especially true for language like "memory leak" (... is this content
exposure or resource drain?)
+ (secret keys, access to another user's memory or processes, etc.)
+
+- **kernel address leak**: a subset of information leaks; this can lead to
+ KASLR bypass, usually just one step as part of an exploit chain.
Again, "exposure".
+
+In terms of source defect:
These are also very specific. Perhaps a summary of higher-level issues:
spatial safety, temporal safety, arithmetic safety, logic errors, etc.
+A useful rule of thumb is that anything that can cause invalid memory
+dereferences is a potential privilege escalation bug.
Even an "unexpected" dereference. :)
+To calculate a final CVSS score (value from 0 to 10), use a calculator
+such as `<https://www.first.org/cvss/calculator/>`_ (also includes detailed
+explanations of each metric and its possible values).
Why not NIST's website directly?
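For double-checking a calculator's output offline, the CVSS v3.1 base and
temporal equations are small enough to sketch in Python. This is a minimal
sketch assuming the metric weights and Roundup() definition from the FIRST
v3.1 specification; environmental metrics are not handled, and the vector
parsing does no validation beyond the prefix.

```python
import math

# Metric weights from the CVSS v3.1 specification (FIRST).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}  # PR weighs more if Scope Changed
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
E = {"X": 1.0, "H": 1.0, "F": 0.97, "P": 0.94, "U": 0.91}
RL = {"X": 1.0, "U": 1.0, "W": 0.97, "T": 0.96, "O": 0.95}
RC = {"X": 1.0, "C": 1.0, "R": 0.96, "U": 0.92}


def roundup(value):
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    as_int = round(value * 100000)  # guard against float noise
    if as_int % 10000 == 0:
        return as_int / 100000.0
    return (math.floor(as_int / 10000) + 1) / 10.0


def score(vector):
    """Return (base, temporal) scores for a CVSS:3.1 vector string."""
    parts = vector.split("/")
    if parts[0] != "CVSS:3.1":
        raise ValueError("expected a CVSS:3.1 vector")
    m = dict(p.split(":") for p in parts[1:])
    changed = m["S"] == "C"

    # Impact Sub-Score, then the scope-dependent Impact term.
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]

    if impact <= 0:
        base = 0.0
    elif changed:
        base = roundup(min(1.08 * (impact + exploitability), 10))
    else:
        base = roundup(min(impact + exploitability, 10))

    # Temporal metrics default to "X" (Not Defined, weight 1.0).
    temporal = roundup(base * E[m.get("E", "X")] * RL[m.get("RL", "X")]
                       * RC[m.get("RC", "X")])
    return base, temporal


if __name__ == "__main__":
    # The jfs example vector from the examples below.
    print(score("CVSS:3.1/AV:P/AC:H/PR:H/UI:R/S:C/C:H/I:H/A:H/E:U"))  # (6.8, 6.2)
```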
+A distro may wish to start by checking whether the file(s) being patched
+are even compiled into their kernel; if not, congrats! You're not vulnerable
+and don't really need to carry out a more detailed analysis.
+
+For things like loadable modules (e.g. device drivers for obscure hardware)
+and runtime parameters you might have a large segment of users who are not
+vulnerable by default.
These 2 paragraphs seem more suited for the Reachability section?
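As a first filter for the two paragraphs above, something like the
following could classify a Kconfig symbol as built-in, modular, or not
built, given a kernel .config (e.g. the /boot/config-* file many distros
ship). It assumes you have already mapped the patched file to its symbol by
reading the directory's Makefile (for jfs, the obj-$(CONFIG_JFS_FS) line in
fs/jfs/Makefile). Files can also be gated by a parent directory's Makefile
or by #ifdef blocks, so treat this as a rough sketch, not a final answer.

```python
import re

# e.g. CONFIG_FOO=y, CONFIG_FOO=m, CONFIG_FOO="string"
_SET_RE = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
# Disabled options are written by kconfig as "# CONFIG_FOO is not set".
_UNSET_RE = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_kconfig(text):
    """Map each CONFIG_ symbol in a .config to its value (None if unset)."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET_RE.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET_RE.match(line)
        if m:
            options[m.group(1)] = None
    return options


def build_status(options, symbol):
    """Classify how code guarded by `symbol` ends up in this kernel."""
    value = options.get(symbol)
    if value == "y":
        return "built-in"
    if value == "m":
        return "module"  # only reachable once the module is loaded
    return "not built"   # unset or absent: the patched code isn't there


if __name__ == "__main__":
    sample = "CONFIG_JFS_FS=m\n# CONFIG_ACPI_EXTLOG is not set\n"
    opts = parse_kconfig(sample)
    print(build_status(opts, "CONFIG_JFS_FS"))       # module
    print(build_status(opts, "CONFIG_ACPI_EXTLOG"))  # not built
```

A "module" result is where the second paragraph kicks in: the bug may only
matter on systems where that module actually gets loaded.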
+Reachability analysis
+=====================
+
+One of the most frequent tasks for evaluating a security issue will be to
+figure out how the buggy code can be triggered. Usually this will mean
+starting with the function(s) being patched and working backwards through
+callers to figure out where the code is ultimately called from. Sometimes
+this will be a system call, but may also be timer callbacks, workqueue
+items, interrupt handlers, etc. Tools like `cscope <https://en.wikipedia.org/wiki/Cscope>`_
+(or just plain ``git grep``) can be used to help untangle these callchains.
Even before this, there's simply checking whether it was built,
whether it was shipped, whether a CONFIG option exposed the feature, etc.
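The backwards walk through callers described in the quoted paragraph can be
sketched as a breadth-first search. This assumes you have already collected
a caller map by hand or from cscope's "functions calling this function"
query; the map format and all function names below are hypothetical.

```python
from collections import deque


def find_entry_chains(callers, start, max_depth=10):
    """Walk backwards from `start` through a caller map.

    `callers` maps each function name to the functions that call it.
    A chain stops when it reaches a function with no (non-recursive)
    recorded callers, i.e. a candidate entry point such as a system
    call, timer callback, workqueue item or interrupt handler, or when
    it reaches `max_depth` functions. Each returned chain is ordered
    from the candidate entry point down to `start`.
    """
    chains = []
    queue = deque([[start]])
    while queue:
        chain = queue.popleft()
        # Ignore callers already on this chain to avoid looping on recursion.
        parents = [p for p in callers.get(chain[0], []) if p not in chain]
        if not parents or len(chain) >= max_depth:
            chains.append(chain)
            continue
        for parent in parents:
            queue.append([parent] + chain)
    return chains


if __name__ == "__main__":
    # Hypothetical caller map; all names are made up for illustration.
    callers = {
        "buggy_helper": ["foo_do_cmd", "foo_timer_fn"],
        "foo_do_cmd": ["foo_ioctl"],
    }
    for chain in find_entry_chains(callers, "buggy_helper"):
        print(" -> ".join(chain))
    # foo_timer_fn -> buggy_helper
    # foo_ioctl -> foo_do_cmd -> buggy_helper
```

A chain that dead-ends is only a candidate: calls through function pointers
(e.g. an unlocked_ioctl handler in a struct file_operations) won't show up in
a cscope-style caller map, so foo_ioctl would still need checking for how it
is wired up.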
+Examples
+========
+
+In the following examples, we give scores from a "worst case" context,
...for a generic distro...
+i.e. assuming the hardware/platform is in use, the driver is compiled
+in, mitigations are disabled or otherwise ineffective, etc.
+
+**Commit 72d9b9747e78 ("ACPI: extlog: fix NULL pointer dereference check")**:
+
+ The first thing to notice is that the code here is in an ``__exit``
+ function, meaning that it can only run when the module is unloaded
+ (the ``mod->exit()`` call in the delete_module() system call) --
+ inspecting this function reveals that it is restricted to processes
+ with the ``CAP_SYS_MODULE`` capability, meaning you already need
+ quite high privileges to trigger the bug.
+
+ The bug itself is that a pointer is dereferenced before it has been
+ checked to be non-NULL. Without deeper analysis we can't really know
+ whether it is even possible for the pointer to be NULL at this point,
+ although the presence of a check is a good indication that it may be.
+ By grepping for ``extlog_l1_addr``, we see that it is assigned in the
+ corresponding module_init() function and moreover that the only way
+ it can be NULL is if the module failed to load in the first place.
+ Since module_exit() functions are not called on module_init() failure
+ we can conclude that this is not a vulnerability.
Sounds right.
+**Commit 27e56f59bab5 ("UBSAN: array-index-out-of-bounds in dtSplitRoot")**:
+
+ Right away we notice that this is a filesystem bug in jfs. There is a
+ stack trace showing that the code is coming from the mkdirat() system
+ call. This means you can likely trigger this as a regular user, given
+ that a suitable jfs filesystem has been mounted. Since this is a bug
+ found by syzkaller, we can follow the link in the changelog and find
+ the reproducer. By looking at the reproducer we can see that it almost
+ certainly mounts a corrupted filesystem image.
+
+ When filesystems are involved, the most common scenario is probably
+ when a user has privileges to mount filesystem images in the context
+ of a desktop environment that allows the logged-in user to mount
+ attached USB drives, for example. In this case, physical access would
+ also be necessary, which would make this Attack Vector **Physical**
+ and User Interaction **Required**.
+
+ Another scenario is where a malicious filesystem image is passed to a
+ legitimate user who then unwittingly mounts it and runs
+ mkdir()/mkdirat() to trigger the bug. This would clearly be User
+ Interaction **Required**, but it's not so clear what the Attack Vector
+ would be -- let's call it **Physical**, which is the least severe of
+ the options given to us by CVSS, even though it's not a true physical
+ attack.
"let's call it" -> "For a distro that doesn't have tools that will mount
filesystem images"... I'm not sure if "Physical" is "worst case" :)
+ This is an out-of-bounds memory access, so without doing a much deeper
+ analysis we should assume it could potentially lead to privilege
+ escalation, so Scope **Changed**, Confidentiality **High**, Integrity
+ **High**, and Availability **High**.
+
+ Since regular users can't normally mount arbitrary filesystems, we can
+ set Attack Complexity **High** and Privileges Required **High**.
Why not? Many distros ship without automounters for inserted media. Some
docker tooling will mount filesystem images.
+ If we also set Exploit Code Maturity **Unproven**, we end up with the
+ following CVSSv3.1 vector:
+
+ - CVSS:3.1/AV:P/AC:H/PR:H/UI:R/S:C/C:H/I:H/A:H/E:U (6.2 - Medium)
+
+ If this score seems high, keep in mind that this is a worst case
+ scenario. In a more specific scenario, jfs might be disabled in the
+ kernel config or there is no way for non-root users to mount any
+ filesystem.
Your worst and mine are very different. ;)
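For anyone checking the 6.2 by hand, this is how the CVSS v3.1 equations
(weights from the FIRST v3.1 specification) produce it. Note that with
Scope Changed, PR:H uses the weight 0.5 rather than 0.27, and E:U multiplies
the base score by 0.91.

```latex
\begin{aligned}
\mathrm{ISS} &= 1 - (1 - 0.56)^3 = 0.914816 \\
\mathrm{Impact} &= 7.52\,(\mathrm{ISS} - 0.029) - 3.25\,(\mathrm{ISS} - 0.02)^{15}
    \approx 6.6613 - 0.6136 = 6.0477 \\
\mathrm{Exploitability} &= 8.22 \times \underbrace{0.2}_{\text{AV:P}} \times \underbrace{0.44}_{\text{AC:H}}
    \times \underbrace{0.5}_{\text{PR:H}} \times \underbrace{0.62}_{\text{UI:R}} \approx 0.2242 \\
\mathrm{Base} &= \operatorname{Roundup}\bigl(\min(1.08\,(6.0477 + 0.2242),\ 10)\bigr)
    = \operatorname{Roundup}(6.7737) = 6.8 \\
\mathrm{Temporal} &= \operatorname{Roundup}(6.8 \times 0.91) = \operatorname{Roundup}(6.188) = 6.2
\end{aligned}
```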
+**Commit b988b1bb0053 ("KVM: s390: fix setting of fpc register")**:
+
+ From the changelog: "corruption of the fpc register of the host process"
+ and "the host process will incorrectly continue to run with the value
+ that was supposed to be used for a guest cpu".
+
+ This makes it clear that a guest can partially take control of the
+ host process (presumably the host process running the KVM guest), which would
+ be a privilege escalation of sorts -- however, since this is corruption
+ of floating-point registers and not a memory error, it is highly
+ unlikely to be exploitable beyond DoS in practice (even then, it is
+ questionable whether the DoS impacts anything beyond the KVM process
+ itself).
+
+ Because an attack would be difficult to pull off, we propose Attack
+ Complexity **High**, and because there isn't a clear or likely path to
+ anything beyond DoS, we'll select Confidentiality **None**, Integrity
+ **Low** and Availability **Low**.
+
+ We suggest the following CVSSv3.1 vector:
+
+ - CVSS:3.1/AV:L/AC:H/PR:N/UI:N/S:U/C:N/I:L/A:L/E:U (3.7 - Low)
Though for many distros this will be a non-issue unless they ship
s390 kernels...
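The same exercise for this vector: with Scope Unchanged, the Impact term is
simply linear in ISS, and E:U again multiplies the base score by 0.91.

```latex
\begin{aligned}
\mathrm{ISS} &= 1 - (1 - 0)(1 - 0.22)(1 - 0.22) = 0.3916 \\
\mathrm{Impact} &= 6.42 \times \mathrm{ISS} \approx 2.5141 \\
\mathrm{Exploitability} &= 8.22 \times \underbrace{0.55}_{\text{AV:L}} \times \underbrace{0.44}_{\text{AC:H}}
    \times \underbrace{0.85}_{\text{PR:N}} \times \underbrace{0.85}_{\text{UI:N}} \approx 1.4372 \\
\mathrm{Base} &= \operatorname{Roundup}\bigl(\min(2.5141 + 1.4372,\ 10)\bigr)
    = \operatorname{Roundup}(3.9513) = 4.0 \\
\mathrm{Temporal} &= \operatorname{Roundup}(4.0 \times 0.91) = \operatorname{Roundup}(3.64) = 3.7
\end{aligned}
```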