Re: [RFC PATCH v2 17/18] livepatch: change to a per-task consistency model

From: Petr Mladek
Date: Thu May 05 2016 - 09:06:50 EST


On Wed 2016-05-04 10:51:21, Josh Poimboeuf wrote:
> On Wed, May 04, 2016 at 10:42:23AM +0200, Petr Mladek wrote:
> > On Thu 2016-04-28 15:44:48, Josh Poimboeuf wrote:
> > > Change livepatch to use a basic per-task consistency model. This is the
> > > foundation which will eventually enable us to patch those ~10% of
> > > security patches which change function or data semantics. This is the
> > > biggest remaining piece needed to make livepatch more generally useful.
> >
> > > --- a/kernel/fork.c
> > > +++ b/kernel/fork.c
> > > @@ -76,6 +76,7 @@
> > > #include <linux/compiler.h>
> > > #include <linux/sysctl.h>
> > > #include <linux/kcov.h>
> > > +#include <linux/livepatch.h>
> > >
> > > #include <asm/pgtable.h>
> > > #include <asm/pgalloc.h>
> > > @@ -1586,6 +1587,8 @@ static struct task_struct *copy_process(unsigned long clone_flags,
> > > p->parent_exec_id = current->self_exec_id;
> > > }
> > >
> > > + klp_copy_process(p);
> >
> > I am in doubts here. We copy the state from the parent here. It means
> > that the new process might still need to be converted. But at the same
> > point print_context_stack_reliable() returns zero without printing
> > any stack trace when TIF_FORK flag is set. It means that a freshly
> > forked task might get be converted immediately. I seems that boot
> > operations are always done when copy_process() is called. But
> > they are contradicting each other.
> >
> > I guess that print_context_stack_reliable() should either return
> > -EINVAL when TIF_FORK is set. Or it should try to print the
> > stack of the newly forked task.
> >
> > Or do I miss something, please?
>
> Ok, I admit it's confusing.
>
> A newly forked task doesn't *have* a stack (other than the pt_regs frame
> it needs for the return to user space), which is why
> print_context_stack_reliable() returns success with an empty array of
> addresses.
>
> For a little background, see the second switch_to() macro in
> arch/x86/include/asm/switch_to.h. When a newly forked task runs for the
> first time, it returns from __switch_to() with no stack. It then jumps
> straight to ret_from_fork in entry_64.S, calls a few C functions, and
> eventually returns to user space. So, assuming we aren't patching entry
> code or the switch_to() macro in __schedule(), it should be safe to
> patch the task before it does all that.

This is great explanation. Thanks for it.

> So, having said all that, I'm really not sure what the best approach is
> for print_context_stack_reliable(). Right now I'm thinking I'll change
> it back to return -EINVAL for a newly forked task, so it'll be more
> future-proof: better to have a false positive than a false negative.
> Either way it will probably need to be changed again if the
> ret_from_fork code gets cleaned up.

I would prefer the -EINVAL. It might safe some hairs when anyone
is working on patching the switch_to stuff. Also it is not that
big loss beacuse most tasks will get migrated on the return to
userspace.

It might help a bit with the newly forked kthreads. But there should
be more safe location where the new kthreads might get migrated,
e.g. right before the main function gets called.


> > > diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> > > new file mode 100644
> > > index 0000000..92819bb
> > > --- /dev/null
> > > +++ b/kernel/livepatch/transition.c
> > > +/*
> > > + * This function can be called in the middle of an existing transition to
> > > + * reverse the direction of the target patch state. This can be done to
> > > + * effectively cancel an existing enable or disable operation if there are any
> > > + * tasks which are stuck in the initial patch state.
> > > + */
> > > +void klp_reverse_transition(void)
> > > +{
> > > + struct klp_patch *patch = klp_transition_patch;
> > > +
> > > + klp_target_state = !klp_target_state;
> > > +
> > > + /*
> > > + * Ensure that if another CPU goes through the syscall barrier, sees
> > > + * the TIF_PATCH_PENDING writes in klp_start_transition(), and calls
> > > + * klp_patch_task(), it also sees the above write to the target state.
> > > + * Otherwise it can put the task in the wrong universe.
> > > + */
> > > + smp_wmb();
> > > +
> > > + klp_start_transition();
> > > + klp_try_complete_transition();
> >
> > It is a bit strange that we keep the work scheduled. It might be
> > better to use
> >
> > mod_delayed_work(system_wq, &klp_work, 0);
>
> True, I think that would be better.
>
> > Which triggers more ideas from the nitpicking deparment:
> >
> > I would move the work definition from core.c to transition.c because
> > it is closely related to klp_try_complete_transition();
>
> That could be good, but there's a slight problem: klp_work_fn() requires
> klp_mutex, which is static to core.c. It's kind of nice to keep the use
> of the mutex in core.c only.

I see and am surprised that we take the lock only in core.c ;-)

I do not have a strong opinion then. Just a small one. The lock guards
also operations from the other .c files. I think that it is only a matter
of time when we will need to access it there. But the work is clearly
transition-related. But it is a real nitpicking. I am sorry for it.

> > When on it. I would make it more clear that the work is related
> > to transition.
>
> How would you recommend doing that? How about:
>
> - rename "klp_work" -> "klp_transition_work"
> - rename "klp_work_fn" -> "klp_transition_work_fn"

Yup, sounds better.

> > Also I would call queue_delayed_work() directly
> > instead of adding the klp_schedule_work() wrapper. The delay
> > might be defined using a constant, e.g.
> >
> > #define KLP_TRANSITION_DELAY round_jiffies_relative(HZ)
> >
> > queue_delayed_work(system_wq, &klp_transition_work, KLP_TRANSITION_DELAY);
>
> Sure.
>
> > Finally, the following is always called right after
> > klp_start_transition(), so I would call it from there.
> >
> > if (!klp_try_complete_transition())
> > klp_schedule_work();
>
> Except for when it's called by klp_reverse_transition(). And it really
> depends on whether we want to allow transition.c to use the mutex. I
> don't have a strong opinion either way, I may need to think about it
> some more.

Ah, I had in mind that it could be replaced by that

mod_delayed_work(system_wq, &klp_transition_work, 0);

So, that we would never call klp_try_complete_transition()
directly. Then it could be the same in all situations. But
it might look strange and be ineffective when really starting
the transition. So, maybe forget about it.

Best Regards,
Petr