Re: [PATCH v2] docs: document linked lists
From: Jonathan Corbet
Date: Thu Jul 03 2025 - 10:12:44 EST
Thanks for doing this!
I have a few comments, most of which are just nits. I think we should
be able to get this in for 6.17.
Nicolas Frattaroli <nicolas.frattaroli@xxxxxxxxxxxxx> writes:
> The kernel contains various generic data structures that should ideally
> not be reinvented. However, it often fails to document the usage of
> these in the in-tree kernel documentation beyond just a listing of
> header symbols in the very lengthy kernel-api docs page. This is fine
> for things that have simple invocations, but occasionally things devolve
> into several layers of concatenating macros, which are subpar for humans
> to parse.
We don't really need the above in the changelog - just say what you're
doing.
> Begin making a small impact by adding some rudimentary example-driven
> documentation for the linked list functions. Many aspects are covered,
> though it is not an exhaustive listing of the entire set of list
> operations. We also direct readers towards further documentation should
> they be interested in concurrency.
>
> Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@xxxxxxxxxxxxx>
> ---
> This series started out with a much more ambitious goal of documenting
> every undocumented generic kernel data structure. About 6 hours in I had
> to tell myself that maybe it's better to do this piece-meal. So here is
> the first of what will hopefully be many similar series: documenting
> linked lists.
>
> As an aside note, I didn't set out to exclusively use Swiss clowns in my
> examples, but as I reached for more names to illustrate longer lists, I
> discovered that the tiny nation of Switzerland has a rich history of
> clowning. You learn something new every day.
>
> For future en-beautification, we could add integration for svgbob
> (<https://github.com/ivanceras/svgbob>) into the kernel docs sphinx in
> the future, which would let us generate drawn diagrams from the text
> doodles, therefore serving both audiences a satisfactory viewing
> experience.
>
> Based against lwn/docs-next, but b4 should let you know already.
> ---
> Changes in v2:
> - drop unrelated sphinx theme patch.
> - replace graphviz diagrams with simpler ascii art in literal blocks,
> which omit the back edges for clarity and are readable as plain text.
> - add explanatory paragraphs after every diagram that describe what is
> depicted in plain English. This is to help users that use assistive
> technologies like screen readers.
> - add examples illustrating the use of list_cut_position and
> list_cut_before.
> - add examples illustrating the use of list_move, list_move_tail and
> list_bulk_move_tail.
> - add examples illustrating the use of list_rotate_left and
> list_rotate_to_front.
> - add example illustrating the use of list_swap.
> - add example illustrating the use of list_splice, and its caveats with
> regards to stale pointers.
> - Link to v1: https://lore.kernel.org/r/20250520-linked-list-docs-v1-0-db74f7449785@xxxxxxxxxxxxx
> ---
> Documentation/core-api/index.rst | 1 +
> Documentation/core-api/list.rst | 847 +++++++++++++++++++++++++++++++++++++++
> 2 files changed, 848 insertions(+)
>
> diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
> index 7a4ca18ca6e2d37644553cfcb0c41c9f05513294..a03a99c2cac56f4ade418f43da723e49216b1062 100644
> --- a/Documentation/core-api/index.rst
> +++ b/Documentation/core-api/index.rst
> @@ -54,6 +54,7 @@ Library functionality that is used throughout the kernel.
> union_find
> min_heap
> parser
> + list
>
> Low level entry and exit
> ========================
> diff --git a/Documentation/core-api/list.rst b/Documentation/core-api/list.rst
> new file mode 100644
> index 0000000000000000000000000000000000000000..b0586056abb04d2bcc4518f7238ff9a94d3dd774
> --- /dev/null
> +++ b/Documentation/core-api/list.rst
> @@ -0,0 +1,847 @@
> +.. SPDX-License-Identifier: GPL-2.0+
> +
> +=====================
> +Linked Lists in Linux
> +=====================
> +
> +:Author: Nicolas Frattaroli <nicolas.frattaroli@xxxxxxxxxxxxx>
> +
> +.. contents::
> +
> +Introduction
> +============
> +
> +Linked lists are one of the most basic data structures used in many programs.
> +The Linux kernel implements several different flavours of linked lists. The
> +purpose of this document is not to explain linked lists in general, but to show
> +new kernel developers how to use the Linux kernel implementations of linked
> +lists.
> +
> +Please note that while linked lists certainly are ubiquitous, they may not
> +always be the best data structure to use in cases where a simple array doesn't
> +already suffice. Familiarizing oneself with other in-kernel generic data
> +structures, especially for concurrent accesses, is highly encouraged.
I wonder if this should be made just a bit stronger, given that linked
lists are positively out of favor for situations where cache behavior
matters.
> +Linux implementation of doubly linked lists
> +===========================================
> +
> +Linux's linked list implementations can be used by including the header file
> +``<linux/list.h>``.
> +
> +The doubly-linked list will likely be the most familiar to many readers. It's a
> +list that can efficiently be traversed forwards and backwards.
> +
> +The Linux kernel's doubly-linked list is circular in nature. This means that to
> +get from the head node to the tail, we can just travel one edge backwards.
> +Similarly, to get from the tail node to the head, we can simply travel forwards
> +"beyond" the tail and arrive back at the head.
> +
> +Declaring a node
> +----------------
> +
> +A node in a doubly-linked list is declared by adding a ``struct list_head``
> +member to the struct you wish to be contained in the list:
> +
> +.. code-block:: c
> +
> + struct clown {
> + unsigned long long shoe_size;
> + const char *name;
> + struct list_head node; /* the aforementioned member */
> + };
> +
> +This may be an unfamiliar approach to some, as the classical explanation of a
> +linked list is a list node struct with pointers to the previous and next list
> +node, as well the payload data. Linux chooses this approach because it allows
> +for generic list modification code regardless of what data struct is contained
data *structure*
> +within the list. Since the ``struct list_head`` member is not a pointer but part
Better to just say "struct list_head", and the automarkup logic should
take care of the rest.
> +of the struct proper, the ``container_of`` pattern can be used by the list
Here too - just container_of(). I note we don't have documentation for
that macro, which seems like a big oversight.
> +implementation to access the payload data regardless of its type, while staying
> +oblivious to what said type actually is.
I do wonder if you should start by showing the list_head structure
itself? It is simple enough and not a secret that needs to be kept.
> +Declaring and initializing a list
> +---------------------------------
> +
> +A doubly-linked list can then be declared as just another ``struct list_head``,
> +and initialised with the LIST_HEAD_INIT() macro during initial assignment, or
> +with the INIT_LIST_HEAD() function later:
> +
> +.. code-block:: c
> +
> + struct clown_car {
> + int tyre_pressure[4];
> + struct list_head clowns; /* Looks like a node! */
> + };
> +
> + /* ... Somewhere later in our driver ... */
> +
> + static int circus_init(struct circus_priv *circus)
> + {
> + struct clown_car other_car = {
> + .tyre_pressure = {10, 12, 11, 9},
> + .clowns = LIST_HEAD_INIT(other_car.clowns)
> + };
> +
> + circus->car.clowns = INIT_LIST_HEAD(&circus->car.clowns);
> +
> + return 0;
> + }
> +
> +A further point of confusion to some may be that the list itself doesn't really
> +have its own type. The concept of the entire linked list and a
> +``struct list_head`` member that points to other entries in the list are one and
> +the same.
> +
> +Adding nodes to the list
> +------------------------
> +
> +Adding a node to the linked list is done through the list_add() function.
Nitly, but list_add() is a macro
> +We'll return to our clown car example to illustrate how nodes get added to the
> +list:
> +
> +.. code-block:: c
> +
> + static int circus_fill_car(struct circus_priv *circus)
> + {
> + struct clown_car *car = &circus->car;
> + struct clown *grock;
> + struct clown *dimitri;
> +
> + /* State 1 */
> +
> + grock = kzalloc(sizeof(*grock), GFP_KERNEL);
> + if (!grock)
> + return -ENOMEM;
> + grock->name = "Grock";
> + grock->shoe_size = 1000;
> +
> + /* Note that we're adding the "node" member */
> + list_add(&grock->node, &car->clowns);
> +
> + /* State 2 */
> +
> + dimitri = kzalloc(sizeof(*dimitri), GFP_KERNEL);
> + if (!dimitri)
> + return -ENOMEM;
> + dimitri->name = "Dimitri";
> + dimitri->shoe_size = 50;
> +
> + list_add(&dimitri->node, &car->clowns);
> +
> + /* State 3 */
> +
> + return 0;
> + }
> +
> +In State 1, our list of clowns is still empty::
> +
> + .------.
> + v |
> + .--------. |
> + | clowns |--'
> + '--------'
> +
> +This diagram shows the singular "clowns" node pointing at itself. In this
> +diagram, and all following diagrams, only the forward edges are shown, to aid in
> +clarity.
> +
> +In State 2, we've added Grock after the list head::
> +
> + .--------------------.
> + v |
> + .--------. .-------. |
> + | clowns |---->| Grock |--'
> + '--------' '-------'
> +
> +This diagram shows the "clowns" node pointing at a new node labeled "Grock".
> +The Grock node is pointing back at the "clowns" node.
> +
> +In State 3, we've added Dimitri after the list head, resulting in the following::
> +
> + .------------------------------------.
> + v |
> + .--------. .---------. .-------. |
> + | clowns |---->| Dimitri |---->| Grock |--'
> + '--------' '---------' '-------'
> +
> +This diagram shows the "clowns" node pointing at a new node labeled "Dimitri",
> +which then points at the node labeled "Grock". The "Grock" node still points
> +back at the "clowns" node.
> +
> +If we wanted to have Dimitri inserted at the end of the list instead, we'd use
> +list_add_tail(). Our code would then look like this:
> +
> +.. code-block:: c
> +
> + static int circus_fill_car(struct circus_priv *circus)
> + {
> + /* ... */
> +
> + list_add_tail(&dimitri->node, &car->clowns);
> +
> + /* State 3b */
> +
> + return 0;
> + }
> +
> +This results in the following list::
> +
> + .------------------------------------.
> + v |
> + .--------. .-------. .---------. |
> + | clowns |---->| Grock |---->| Dimitri |--'
> + '--------' '-------' '---------'
> +
> +This diagram shows the "clowns" node pointing at the node labeled "Grock",
> +which points at the new node labeled "Dimitri". The node labeled "Dimitri"
> +points back at the "clowns" node.
> +
> +Traversing the list
> +-------------------
> +
> +To iterate the list, we can loop through all nodes within the list with
> +list_for_each().
> +
> +In our clown example, this results in the following somewhat awkward code:
> +
> +.. code-block:: c
> +
> + static unsigned long long circus_get_max_shoe_size(struct circus_priv *circus)
> + {
> + unsigned long long res = 0;
> + struct clown *e;
> + struct list_head *cur;
> +
> + list_for_each(cur, &circus->car.clowns) {
> + e = list_entry(cur, struct clown, node);
> + if (e->shoe_size > res)
> + res = e->shoe_size;
> + }
> +
> + return res;
> + }
> +
> +Note how the additional ``list_entry`` call is a little awkward here. It's only
> +there because we're iterating through the ``node`` members, but we really want
> +to iterate through the payload, i.e. the ``struct clown`` that contains each
> +node's ``struct list_head``. For this reason, there is a second macro:
> +list_for_each_entry()
You don't say what list_entry() is actually doing - that seems worth a
sentence.
> +Using it would change our code to something like this:
> +
> +.. code-block:: c
> +
> + static unsigned long long circus_get_max_shoe_size(struct circus_priv *circus)
> + {
> + unsigned long long res = 0;
> + struct clown *e;
> +
> + list_for_each_entry(e, &circus->car.clowns, node) {
> + if (e->shoe_size > res)
> + res = e->shoe_size;
> + }
> +
> + return res;
> + }
> +
> +This eliminates the need for the ``list_entry`` step, and our loop cursor is now
> +of the type of our payload. The macro is given the member name that corresponds
> +to the list's ``struct list_head`` within the clown struct so that it can still
> +walk the list.
> +
> +Removing nodes from the list
> +----------------------------
> +
> +The list_del() function can be used to remove entries from the list. It not only
> +removes the given entry from the list, but poisons the entry's ``prev`` and
> +``next`` pointers, so that unintended use of the entry after removal does not
> +go unnoticed.
> +
> +We can extend our previous example to remove one of the entries:
> +
> +.. code-block:: c
> +
> + static int circus_fill_car(struct circus_priv *circus)
> + {
> + /* ... */
> +
> + list_add(&dimitri->node, &car->clowns);
> +
> + /* State 3 */
> +
> + list_del(&dimitri->node);
> +
> + /* State 4 */
> +
> + return 0;
> + }
> +
> +The result of this would be this::
> +
> + .--------------------.
> + v |
> + .--------. .-------. | .---------.
> + | clowns |---->| Grock |--' | Dimitri |
> + '--------' '-------' '---------'
> +
> +This diagram shows the "clowns" node pointing at the node labeled "Grock",
> +which points back at the "clowns" node. Off to the side is a lone node labeled
> +"Dimitri", which has no arrows pointing anywhere.
> +
> +Note how the Dimitri node does not point to itself; its pointers are
> +intentionally set to a "poison" value that the list code refuses to traverse.
> +
> +If we wanted to reinitialize the removed node instead to make it point at itself
> +again like an empty list head, we can use list_del_init() instead:
> +
> +.. code-block:: c
> +
> + static int circus_fill_car(struct circus_priv *circus)
> + {
> + /* ... */
> +
> + list_add(&dimitri->node, &car->clowns);
> +
> + /* State 3 */
> +
> + list_del_init(&dimitri->node);
> +
> + /* State 4b */
> +
> + return 0;
> + }
> +
> +This results in the deleted node pointing to itself again::
> +
> + .--------------------. .-------.
> + v | v |
> + .--------. .-------. | .---------. |
> + | clowns |---->| Grock |--' | Dimitri |--'
> + '--------' '-------' '---------'
> +
> +This diagram shows the "clowns" node pointing at the node labeled "Grock",
> +which points back at the "clowns" node. Off to the side is a lone node labeled
> +"Dimitri", which points to itself.
> +
> +Traversing whilst removing nodes
> +--------------------------------
> +
> +Deleting entries while we're traversing the list will cause problems if we use
> +list_for_each() and list_for_each_entry(), as deleting the current entry would
> +modify the ``next`` pointer of it, which means the traversal can't properly
> +advance to the next list entry.
> +
> +There is a solution to this however: list_for_each_safe() and
> +list_for_each_entry_safe(). These take an additional parameter of a pointer to
> +a ``struct list_head`` to use as temporary storage for the next entry during,
> +iteration, solving the issue.
> +
> +An example of how to use it:
> +
> +.. code-block:: c
> +
> + static void circus_eject_insufficient_clowns(struct circus_priv *circus)
> + {
> + struct clown *e;
> + struct clown *n; /* temporary storage for safe iteration */
> +
> + list_for_each_entry_safe(e, n, &circus->car.clowns, node) {
> + if (e->shoe_size < 500)
> + list_del(&e->node);
> + }
> + }
> +
> +Proper memory management (i.e. freeing the deleted node while making sure
> +nothing still references it) in this case is left up as an exercise to the
> +reader.
> +
> +Cutting a list
> +--------------
> +
> +There are two helper functions to cut lists with. Both take elements from the
> +list ``head``, and replace the contents of the list ``list``.
> +
> +The first such function is list_cut_position(). It removes all list entries from
> +``head`` up to and including ``entry``, placing them in ``list`` instead.
> +
> +In this example, it's assumed we start with the following list::
> +
> + .----------------------------------------------------------------.
> + v |
> + .--------. .-------. .---------. .-----. .---------. |
> + | clowns |---->| Grock |---->| Dimitri |---->| Pic |---->| Alfredo |--'
> + '--------' '-------' '---------' '-----' '---------'
> +
> +This diagram depicts the list head "clowns" pointing to a node labeled "Grock",
> +which points to a node labeled "Dimitri", which points to a node labeled
> +"Pic", which points to a node labeled "Alfredo", which points back to the
> +"clowns" list head.
There comes a point, I think, where the (very nice) diagrams speak for
themselves; I'm not entirely sure it needs to be spelled out in this
much detail. By this point, I would expect readers to be just skipping
over these descriptions.
> +With the following code, every clown up to and including "Pic" is moved from
> +the "clowns" list head to a separate ``struct list_head`` initialized at local
> +stack variable ``retirement``:
[...]
> +Further reading
> +---------------
> +
> +* `How does the kernel implements Linked Lists? - KernelNewbies <https://kernelnewbies.org/FAQ/LinkedLists>`_
I do still think you should move the kerneldoc for lists over from
kernel-api.rst; just tack it onto the end here.
Thanks,
jon