Re: [PATCH v3 2/2] livepatch: add atomic replace

From: Miroslav Benes
Date: Wed Oct 18 2017 - 05:10:19 EST


On Tue, 17 Oct 2017, Jason Baron wrote:

>
>
> On 10/17/2017 09:50 AM, Miroslav Benes wrote:
> > On Tue, 17 Oct 2017, Miroslav Benes wrote:
> >
> >> On Tue, 10 Oct 2017, Jason Baron wrote:
> >>
> >>>
> >>>
> >>> On 10/06/2017 06:32 PM, Josh Poimboeuf wrote:
> >>>> On Wed, Sep 27, 2017 at 11:41:30PM -0400, Jason Baron wrote:
> >>>>> Since 'atomic replace' has completely replaced all previous livepatch
> >>>>> modules, it explicitly disables all previous livepatch modules. However,
> >>>>> previous livepatch modules that have been replaced, can be re-enabled
> >>>>> if they have the 'replace' flag set. In this case the set of 'nop'
> >>>>> functions is re-calculated, such that it replaces all others.
> >>>>>
> >>>>> For example, if kpatch-a.ko and kpatch-b.ko have the 'replace' flag set
> >>>>> then:
> >>>>>
> >>>>> # insmod kpatch-a.ko
> >>>>> # insmod kpatch-b.ko
> >>>>>
> >>>>> At this point we have:
> >>>>>
> >>>>> # cat /sys/kernel/livepatch/kpatch-a/enabled
> >>>>> 0
> >>>>>
> >>>>> # cat /sys/kernel/livepatch/kpatch-b/enabled
> >>>>> 1
> >>>>>
> >>>>> To revert to the kpatch-a state we can do:
> >>>>>
> >>>>> echo 1 > /sys/kernel/livepatch/kpatch-a/enabled
> >>>>>
> >>>>> And now we have:
> >>>>>
> >>>>> # cat /sys/kernel/livepatch/kpatch-a/enabled
> >>>>> 1
> >>>>>
> >>>>> # cat /sys/kernel/livepatch/kpatch-b/enabled
> >>>>> 0
> >>>>
> >>>> I don't really like allowing a previously replaced patch to replace the
> >>>> current patch. It's just more unnecessary complexity. If the user
> >>>> wants to atomically revert back to kpatch-a, they should be able to:
> >>>>
> >>>> rmmod kpatch-a
> >>>> insmod kpatch-a.ko
> >>>>
> >>
> >> I agree.
> >>
> >>> Right - that's how I sent v1 (using rmmod/insmod to revert), but it
> >>> didn't account for the fact the patch or some functions may be marked
> >>> 'immediate' and thus its not possible to just do 'rmmod'. Thus, since in
> >>> some cases 'rmmod' was not feasible, I thought it would be simpler from
> >>> an operational pov to just say we always revert by re-enabling a
> >>> previously replaced patch as opposed to rmmod/insmod.
> >>>
> >>>
> >>>>> Note that it may be possible to unload (rmmod) replaced patches in some
> >>>>> cases based on the consistency model, when we know that all the functions
> >>>>> that are contained in the patch may no longer be in used, however its
> >>>>> left as future work, if this functionality is desired.
> >>>>
> >>>> If you don't allow a previously replaced patch to be enabled again, I
> >>>> think it would be trivial to let it be unloaded.
> >>>>
> >>>
> >>> The concern is around being replaced by 'immediate' functions and thus
> >>> not knowing if the code is still in use.
> >>
> >> Hm. Would it make sense to remove immediate and rely only on the
> >> consistency model? At least for the architectures where the model is
> >> implemented (x86_64)?
> >>
> >> If not, then I'd keep such modules there without a possibility to remove
> >> them ever. If its functionality was required again, it would of course
> >> mean to insmod a new module with it.
> >
> > Petr pointed out (offline) that immediate could still be useful. So let me
> > describe how I envision the whole atomic replace semantics.
> >
> > Let's have three functions - a, b, c. Patch 1 is immediate and patches
> > a().
> >
> > func a b c
> > patches 1i
> >
> > Patch 2 is not immediate and patches b()
> >
> > func a b c
> > patches 1i
> > 2
> >
> > Patch 3 is atomic replace, which patches a() and c().
> >
> > func a b c
> > patches 1i
> > 2
> > 3r 3r
> >
> > With 3 applied, 3versions of a() and c() are called, original b() is
> > called. 2 can be rmmoded. 1 cannot, because it is immediate. 1 and 2
> > cannot be reenabled.
>
> Yes, if we change back to allowing the 'atomic replace' patch to drop
> the module reference on previous modules when possible, then yes 2 can
> be rmmoded. I originally started off with that model of being able to do
> 'revert' via rmmod and insmod of previous modules, but then we moved to
> a re-enable model in v4 based on the fact that in some cases we can not
> do the rmmod and insmod thing. For example if patch 3 was an immediate
> patch, then we would replace function b 'immediately' and thus not know
> when it was safe to rmmod patch 2.

Correct.

> And in fact, in your example i think
> its safe to rmmod patch 1 as well, since in the transition to patch 3 we
> would have checked the func stack for the immediately preceding function
> for function a which would have been 1i, and thus since we know its not
> running we can rmmod patch 1 as well.

Correct as well.

> I'm happy to go back to rmmod/insmod for revert which seems to be the
> consensus here - it certainly helps clean up and avoid large stacks of
> modules (in most cases), which is nice.
>
> >
> > 3 can be disabled. Original a() and c() will be called in such case. If
> > one wants to have a() from patch 1 there, he would need to apply patch
> > 4.
> >
> > func a b c
> > patches 1i
> > 2
> > 3r 3r
> > 4i
> >
> > Does it make sense? This is what I would expect and I think it is easier
> > to implement. Whole initialization of nop functions could be done in
> > klp_init_ functions, if I am not mistaken.
> >> Thoughts?
> >
>
> yes, that makes sense to me. It might be worth clarifying the logic
> around when we can drop references on previous modules when an atomic
> revert patch is applied -
>
> If the atomic replace patch has any immediate functions or is marked as
> immediate patch, then we can not drop the reference on the previous
> module, since it may still be in use.

Yes, then we know nothing about tasks in the system.

> If the atomic replace patch does
> not contain any immediates, then we can drop the reference on the
> immediately preceding patch only. That is because there may have been
> previous transitions to immediate functions in the func stack, and the
> transition to the atomic replace patch only checks immediately preceding
> transition. It would be possible to check all of the previous immediate
> function transitions, but this adds complexity and seems like not a
> common pattern. So I would suggest that we just drop the reference on
> the previous patch if the atomic replace patch does not contain any
> immediate functions.

It is even more complicated and it is not connected only to atomic replace
patch (I realized this while reading the first part of your email and
then you confirmed it with this paragraph). The consistency model is
broken with respect to immediate patches.

func a
patches 1i
2i
3

Now, when you're applying 3, only 2i function is checked. But there might
be a task sleeping in 1i. Such task would be migrated to 3, because we do
not check 1 in klp_check_stack_func() at all.

I see three solutions.

1. Say it is an user's fault. Since it is not obvious and it is
easy-to-make mistake, I would not go this way.

2. We can fix klp_check_stack_func() in an exact way you're proposing.
We'd go back in func stack as long as there are immediate patches there.
This adds complexity and I'm not sure if all the problems would be solved
because scenarios how patches are stacked and applied to different
functions may be quite complex.

3. Drop immediate. It causes problems only and its advantages on x86_64
are theoretical. You would still need to solve the interaction with atomic
replace on other architecture with immediate preserved, but that may be
easier. Or we can be aggressive and drop immediate completely. The force
transition I proposed earlier could achieve the same.

Miroslav