[PATCH 1/1] docs: Revamp tainted-kernels.rst to make it more comprehensible

From: Thorsten Leemhuis
Date: Mon Dec 17 2018 - 10:20:48 EST


Add a section about /proc/sys/kernel/tainted and a command that decodes it to
Documentation/admin-guide/tainted-kernels.rst. While at it introduce a table
that shows the various bits as well as the letters used in oops and panic
messages. Make the document more user focused and easier to understand, too.

Backstory: While working updating reporting-bugs.rst I noticed there is no easy
comprehensible document showing how to check if or why the running kernel might
be tainted. That's why I wrote a section with a small python command to decodes
/proc/sys/kernel/tainted. I suspect there is a more elegant and shorter command
to archive the same, which still works on common machines out of the box;
please let me know if you know such a command.

While putting that section in place I ended up writing an easier understandable
intro and a hopefully better explanation for the tainted flags in bugs, oops or
panics messages. Only thing missing then was a table that quickly describes the
various bits and the taint flags before going into more detail, so I added that
as well.

Signed-off-by: Thorsten Leemhuis <linux@xxxxxxxxxxxxx>
---
Documentation/admin-guide/tainted-kernels.rst | 105 ++++++++++++++++--
1 file changed, 96 insertions(+), 9 deletions(-)

diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
index 28a869c509a0..aabd307a178a 100644
--- a/Documentation/admin-guide/tainted-kernels.rst
+++ b/Documentation/admin-guide/tainted-kernels.rst
@@ -1,10 +1,102 @@
Tainted kernels
---------------

-Some oops reports contain the string **'Tainted: '** after the program
-counter. This indicates that the kernel has been tainted by some
-mechanism. The string is followed by a series of position-sensitive
-characters, each representing a particular tainted value.
+The kernel will mark itself as 'tainted' when something occurs that
+might be relevant later when investigating problems. Don't worry
+yourself too much about this, most of the time it's not a problem to run
+a tainted kernel; the information is mainly of interest once someone
+wants to investigate some problem, as its real cause might be the event
+that got the kernel tainted. That's why the kernel will remain tainted
+even after you undo what caused the taint (i.e. unload a proprietary
+kernel module), to indicate the kernel remains not trustworthy. That's
+also why the kernel will print the tainted state when it noticed
+ainternal problem (a 'kernel bug'), a recoverable error ('kernel oops')
+or a nonrecoverable error ('kernel panic') and writes debug information
+about this to the logs ``dmesg`` outputs. It's also possible to check
+the tainted state at runtime through a file in ``/proc/``.
+
+
+Tainted flag in bugs, oops or panics messages
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You find the tainted state near the top after the list of loaded
+modules. The state is part of the line that begins with mentioning CPU
+('CPU:'), Process ID ('PID:'), and a shorted name of the executed
+command ('Comm:') that triggered the event. When followed by **'Not
+tainted: '** the kernel was not tainted at the time of the event; if it
+was, then it will print **'Tainted: '** and characters either letters or
+blanks. The meaning of those characters is explained in below table. The
+output for example might state '``Tainted: P WO``' when the kernel got
+tainted earlier because a proprietary Module (``P``) was loaded, a
+warning occurred (``W``), and an externally-built module was loaded
+(``O``). To decode other letters use below table.
+
+
+Decoding tainted state at runtime
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+At runtime, you can query the tainted state by reading
+``/proc/sys/kernel/tainted``. If that returns ``0``, the kernel is not
+tainted; any other number indicates the reasons why it is. You might
+find that number in below table if there was only one reason that got
+the kernel tainted. If there were multiple reasons you need to decode
+the number, as it is a bitfield, where each bit indicates the absence or
+presence of a particular type of taint. You can use the following python
+command to decode::
+
+ $ python3 -c 'from pprint import pprint; from itertools import zip_longest; pprint(list(zip_longest(range(1,17), reversed(bin(int(open("/proc/sys/kernel/tainted").read()))[2:]),fillvalue="0")))'
+ [(1, '1'),
+ (2, '0'),
+ (3, '0'),
+ (4, '0'),
+ (5, '0'),
+ (6, '0'),
+ (7, '0'),
+ (8, '0'),
+ (9, '0'),
+ (10, '1'),
+ (11, '0'),
+ (12, '0'),
+ (13, '1'),
+ (14, '0'),
+ (15, '0'),
+ (16, '0')]
+
+In this case ``/proc/sys/kernel/tainted`` contained ``4609``, as the
+kernel got tainted because a proprietary Module (Bit 1) got loaded, a
+warning occurred (Bit 10), and an externally-built module got loaded
+(Bit 13). To decode other bits use below table.
+
+
+Table for decoding tainted state
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+=== === ====== ========================================================
+Bit Log Int Reason that got the kernel tainted
+=== === ====== ========================================================
+ 1) G/P 0 proprietary module got loaded
+ 2) _/F 2 module was force loaded
+ 3) _/S 4 SMP kernel oops on a officially SMP incapable processor
+ 4) _/R 8 module was force unloaded
+ 5) _/M 16 processor reported a Machine Check Exception (MCE)
+ 6) _/B 32 bad page referenced or some unexpected page flags
+ 7) _/U 64 taint requested by userspace application
+ 8) _/D 128 kernel died recently, i.e. there was an OOPS or BUG
+ 9) _/A 256 ACPI table overridden by user
+10) _/W 512 kernel issued warning
+11) _/C 1024 staging driver got loaded
+12) _/I 2048 workaround for bug in platform firmware in use
+13) _/O 4096 externally-built ("out-of-tree") module got loaded
+14) _/E 8192 unsigned module was loaded
+15) _/L 16384 soft lockup occurred
+16) _/K 32768 Kernel live patched
+=== === ====== ========================================================
+
+Note: To make reading easier ``_`` is representing a blank in this
+table.
+
+More detailed explanation for tainting
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1) ``G`` if all modules loaded have a GPL or compatible license, ``P`` if
any proprietary module has been loaded. Modules without a
@@ -52,8 +144,3 @@ characters, each representing a particular tainted value.

16) ``K`` if the kernel has been live patched.

-The primary reason for the **'Tainted: '** string is to tell kernel
-debuggers if this is a clean kernel or if anything unusual has
-occurred. Tainting is permanent: even if an offending module is
-unloaded, the tainted value remains to indicate that the kernel is not
-trustworthy.
--
2.18.1