Re: [PATCH 2/3] livepatch/rcu: Warn when system consistency is broken in RCU code

From: Steven Rostedt
Date: Mon May 08 2017 - 15:13:33 EST


On Mon, 8 May 2017 11:51:08 -0500
Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:


> > --- a/kernel/livepatch/patch.c
> > +++ b/kernel/livepatch/patch.c
> > @@ -62,6 +62,14 @@ static void notrace klp_ftrace_handler(unsigned long ip,
> > /* RCU may not be watching, make it see us. */
> > rcu_irq_enter_irqson();
> >
> > + /*
> > + * RCU still might not see us if we patch a function inside
> > + * the RCU infrastructure. Then we might see wrong state of
> > + * func->stack and other flags.
> > + */
> > + if (unlikely(!rcu_is_watching()))
> > + WARN_ONCE(1, "Livepatch modified a function that can not be handled a safe way.!");
> > +
> > rcu_read_lock();
> >
> > func = list_first_or_null_rcu(&ops->func_stack, struct klp_func,
>
> So the ftrace stack tracer seems to do this check a little differently.
> It calls rcu_irq_enter_disabled() first, and then calls rcu_irq_enter().
> Any reason we're not doing it that way?

I thought the same. Also can it not return, and not do anything.
Because continuing after the warning, is dangerous.

Not to mention, why the if statement in the first place? And then pass
a 1. Makes no sense.

Although you should have:

if (WARN_ONCE(!rcu_is_watching,
"Livepatch ..."))
return;

or something to not cause any damage.

>
> The warning would be more helpful if it printed a little more
> information, like the ip and the parent_ip. And it should probably
> mention that RCU is broken.

Well, the warning would also print a stack trace. How is RCU broken? It
could simply be that you are patching a function that is in a place
that RCU doesn't "watch". Like going to idle or userspace. Or even in
RCU itself.

>
> Also I wonder if we can constrain the warning somehow. I think the
> warning only applies if a patch is in progress, right? In that case, if
> RCU is broken, would it be feasible to mutex_trylock() the klp mutex to
> try to ensure that no patches are being applied while the ftrace handler
> is running? Then it wouldn't matter if RCU were broken because the func
> stack wouldn't be changing anyway. Then it could only warn if it failed
> to get the mutex.

How would RCU be broken?

>
> Stepping back a bit, the documentation and comments describe patches to
> functions inside the RCU infrastructure. As far as I can tell, only a
> single function would be affected: rcu_dynticks_eqs_enter(). Because
> it's the only function called after the "Breaks tracing momentarily"
> comment in rcu_eqs_enter_common(). Any reason why we couldn't just
> annotate rcu_dynticks_eqs_enter() with notrace?

Note, there's places in the kernel (on the way to idle and userspace)
that rcu is not watching. Ftrace handles this differently than most
places. But anything that requires calling rcu_read_lock(), well, you
need to beware.

>
> Stepping back even further, if I'm understanding this issue correctly,
> this warning can also affect patches to functions which are called from
> NMI context. If the NMI occurs in the part of rcu_eqs_enter_common()
> where rcu_irq_enter() doesn't work, then RCU won't work here, right? If
> so, that worries me because there are a lot of functions which can be
> called (and patched) from NMI context.

Note, the "rcu_dynticks_eqs_enter()" is the only place that can't make
rcu "watch" again.

If rcu is not watching, calling rcu_enter_irq() will have it watch
again. Even in NMI context I believe.

>
> I wonder if there's some way to solve this by changing RCU code, but I'm
> not familiar enough with RCU to have any ideas there.

You don't want to go there.

>
> Another idea would be to figure out a way to stop using RCU in
> klp_ftrace_handler() altogether.
>

That may work if rcu_enter_irq() doesn't. But that's how NMIs use rcu.

-- Steve