Re: [PATCH v4 00/30] Live Update Orchestrator

From: Pasha Tatashin

Date: Fri Oct 10 2025 - 08:46:46 EST


On Thu, Oct 9, 2025 at 6:58 PM Pratyush Yadav <pratyush@xxxxxxxxxx> wrote:
>
> On Tue, Oct 07 2025, Pasha Tatashin wrote:
>
> > On Sun, Sep 28, 2025 at 9:03 PM Pasha Tatashin
> > <pasha.tatashin@xxxxxxxxxx> wrote:
> >>
> [...]
> > 4. New File-Lifecycle-Bound Global State
> > ----------------------------------------
> > A new mechanism for managing global state was proposed, designed to be
> > tied to the lifecycle of the preserved files themselves. This would
> > allow a file owner (e.g., the IOMMU subsystem) to save and retrieve
> > global state that is only relevant when one or more of its FDs are
> > being managed by LUO.
>
> Is this going to replace LUO subsystems? If yes, then why? The global
> state will likely need to have its own lifecycle just like the FDs, and
> subsystems are a simple and clean abstraction to control that. I get the
> idea of only "activating" a subsystem when one or more of its FDs are
> participating in LUO, but we can do that while keeping subsystems
> around.
>
> >
> > The key characteristics of this new mechanism are:
> > - The global state is optionally created on the first preserve() call
> >   for a given file handler.
> > - The state can be updated on subsequent preserve() calls.
> > - The state is destroyed when the last corresponding file is
> >   unpreserved or finished.
> > - The data can be accessed during boot.
> >
> > I am thinking of an API like this.
> >
> > 1. Add three more callbacks to liveupdate_file_ops:
> >
> > /*
> >  * Optional. Called by LUO on the first global-state get call.
> >  * The handler should allocate/KHO-preserve its global state object
> >  * and return a pointer to it via 'obj'. It must also provide a u64
> >  * handle (e.g., a physical address of preserved memory) via
> >  * 'data_handle' that LUO will save.
> >  * Return: 0 on success.
> >  */
> > int (*global_state_create)(struct liveupdate_file_handler *h,
> >                            void **obj, u64 *data_handle);
> >
> > /*
> >  * Optional. Called by LUO in the new kernel before the first access
> >  * to the global state. The handler receives the preserved u64
> >  * data_handle and should use it to reconstruct its global state
> >  * object, returning a pointer to it via 'obj'.
> >  * Return: 0 on success.
> >  */
> > int (*global_state_restore)(struct liveupdate_file_handler *h,
> >                             u64 data_handle, void **obj);
> >
> > /*
> >  * Optional. Called by LUO after the last file for this handler is
> >  * unpreserved or finished. The handler must free its global state
> >  * object and any associated resources.
> >  */
> > void (*global_state_destroy)(struct liveupdate_file_handler *h,
> >                              void *obj);
> >
> > 2. The get/put accessors for the global-state data:
> >
> > /* Get the data, taking a file_handler-scoped lock. */
> > int liveupdate_fh_global_state_get(struct liveupdate_file_handler *h,
> >                                    void **obj);
> >
> > /* Unlock the data. */
> > void liveupdate_fh_global_state_put(struct liveupdate_file_handler *h);
>
> IMHO this looks clunky and overcomplicated. Each LUO FD type knows what
> its subsystem is. It should talk to it directly. I don't get why we are
> adding this intermediate step.
>
> Here is how I imagine the proposed API would compare against subsystems
> with hugetlb as an example (hugetlb support is still WIP, so I'm still
> not clear on specifics, but this is how I imagine it will work):
>
> - Hugetlb subsystem needs to track its huge page pools and which pages
> are allocated and free. This is its global state. The pools get
> reconstructed after kexec. Post-kexec, the free pages are ready for
> allocation from other "regular" files and the pages used in LUO files
> are reserved.

Thinking more about this, HugeTLB is different from iommufd/iommu-core
and vfiofd/pci because it backs many types of FDs, such as memfd and
guest_memfd (1G support is coming soon!). Also, since not all memfd or
guest_memfd instances require HugeTLB, binding their lifecycles to
HugeTLB doesn't make sense here. I agree that a subsystem is the more
appropriate abstraction for this use case.

Pasha