Re: [RFC v2 05/16] luo: luo_core: integrate with KHO

From: Pasha Tatashin
Date: Wed Jun 18 2025 - 13:02:02 EST


On Wed, Jun 18, 2025 at 12:40 PM Mike Rapoport <rppt@xxxxxxxxxx> wrote:
>
> On Wed, Jun 18, 2025 at 10:48:09AM -0400, Pasha Tatashin wrote:
> > On Wed, Jun 18, 2025 at 9:12 AM Pratyush Yadav <pratyush@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Jun 17 2025, Pasha Tatashin wrote:
> > >
> > > > On Tue, Jun 17, 2025 at 11:24 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> > > >>
> > > >> On Fri, Jun 13, 2025 at 04:58:27PM +0200, Pratyush Yadav wrote:
> > > >> > On Sat, Jun 07 2025, Pasha Tatashin wrote:
> > > >> > [...]
> > > >> > >>
> > > >> > >> This weirdness happens because luo_prepare() and luo_cancel() control
> > > >> > >> the KHO state machine, but then also get controlled by it via the
> > > >> > >> notifier callbacks. So the relationship between them is not clear.
> > > >> > >> __luo_prepare() at least needs access to struct kho_serialization, so it
> > > >> > >> needs to come from the callback. So I don't have a clear way to clean
> > > >> > >> this all up off the top of my head.
> > > >> > >
> > > >> > > On a production machine, without KHO_DEBUGFS, only LUO can control KHO
> > > >> > > state, but if debugfs is enabled, KHO can be finalized manually, and
> > > >> > > in this case LUO transitions to prepared state. In both cases, the
> > > >> > > path is identical. The KHO debugfs path is only for
> > > >> > > developers/debugging purposes.
> > > >> >
> > > >> > What I meant is that even without KHO_DEBUGFS, LUO drives KHO, but then
> > > >> > KHO calls into LUO from the notifier, which makes the control flow
> > > >> > somewhat convoluted. If LUO is supposed to be the only thing that
> > > >> > interacts directly with KHO, maybe we should get rid of the notifier and
> > > >> > only let LUO drive things.
> > > >>
> > > >> Yes, we should. I think we should consider the KHO notifiers and self
> > > >> orchestration as obsoleted by LUO. That's why it was in debugfs
> > > >> because we were not ready to commit to it.
> > > >
> > > > We could do that; however, there is one example KHO user,
> > > > `reserve_mem`, that is also not liveupdate related. So, it should
> > > > either be removed or modified to be handled by LUO.
> > >
> > > It still depends on kho_finalize() being called, so it still needs
> > > something to trigger its serialization. It is not automatic. And with
> > > your proposed patch to make the debugfs interface optional, it can't even be
> > > used with the config disabled.
> >
> > At least for now, it can still be used via LUO going into the prepared
> > state, since LUO moves KHO into the finalized state and reserve_mem is
> > registered to be called back from KHO.
> >
> > > So if it must be explicitly triggered to be preserved, why not let the
> > > trigger point be LUO instead of KHO? You can make reservemem a LUO
> > > subsystem instead.
> >
> > Yes, LUO can do that, the only concern I raised is that `reserve_mem`
> > is not really live update related.
>
> I only now realized what bothered me about "liveupdate". It's the name of
> the driving use case rather than the name of the technology it implements.
> In the end what LUO does is a (more) sophisticated control for KHO.
>
> But essentially it doesn't implement live update itself; it
> provides a kexec handover control plane that enables live update.
>
> And since the same machinery can be used regardless of live update, and I'm
> sure other use cases will appear as soon as the technology becomes more
> mature, it makes me think that we probably should just
> s/liveupdate_/kho_control/g or something along those lines.

I disagree. LUO is for live update flows and is designed specifically
around them: brownout/blackout/post-liveupdate. It should not be
generalized to anticipate other arbitrary states, and it should only
support participants that are related to live update
(iommufd/vfiofd/kvmfd/memfd/eventfd), controlled via "liveupdated", the
userspace agent.

KHO is for preserving memory, LUO uses KHO as a backbone for Live Update.

> > > Although to be honest, things like reservemem (or IMA perhaps?) don't
> > > really fit well with the explicit trigger mechanism. They can be carried
> >
> > Agreed. Another example I was thinking about is "kexec telemetry":
> > precise time information about kexec, including shutdown, purgatory,
> > boot. We are planning to propose kexec telemetry, and it could be a LUO
> > subsystem. On the other hand, it could be useful even without live
> > update, just to measure precise kexec reboot time.
> >
> > > across kexec without needing userspace explicitly driving it. Maybe we
> > > allow LUO subsystems to mark themselves as auto-preservable and LUO will
> > > preserve them regardless of state being prepared? Something to think
> > > about later down the line I suppose.
> >
> > We can start with adding `reserve_mem` as a regular subsystem, and make
> > this auto-preserve option a future extension, if needed.
> > Presumably, `luoctl prepare` would work for whoever plans to use just
> > `reserve_mem`.
>
> I think it would be nice to support auto-preserve sooner than later.

Makes sense.

> reserve_mem can already be useful for ftrace and pstore folks, and if it
> could survive a kexec without any userspace intervention, that would be great.

The pstore use case is only potential, correct? Or can it already use
reserve_mem?

Pasha