Re: There is a Tasks RCU stall warning

From: Steven Rostedt
Date: Wed Apr 12 2017 - 12:49:32 EST


On Wed, 12 Apr 2017 09:26:10 -0700
"Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> wrote:

> On Wed, Apr 12, 2017 at 11:53:04AM -0400, Steven Rostedt wrote:
> > On Wed, 12 Apr 2017 08:18:17 -0700
> > "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> >
> >
> > > > Well the trampolines pretty much can, but they are removed before
> > > > calling synchronize_rcu_tasks(), and nothing can enter the trampoline
> > > > when that is called.
> > >
> > > Color me confused...
> > >
> > > So you can have an arbitrary function call within a trampoline?
> >
> > Sorta.
> >
> > When you do register_ftrace_function(ops), where ops has ops->func that
> > points to a function you want to have called when a function is traced,
> > the following happens (if there's no other ops registered). Let's use
> > an example where ops is filtered on just the schedule() function call:
> >
> >
> > <schedule>:
> > call trampoline ---+
> > [..] |
> > +--> <trampoline>:
> > push regs
> > call ops->func
> > pop regs
> > ret
> >
> > But that ops->func() must be very limited in what it can do. Although,
> > it may actually call an rcu_read_lock()! But if that's the case, it
> > must either check if rcu is watching (which perf does), or enable rcu
> > via the rcu_irq_enter() with a check on rcu_irq_enter_disabled(), which
> > my stack tracer does.
> >
> > Now this can be called even from NMI context! Thus what ops->func does
> > must be aware of that. The stack tracer func has an:
> >
> > if (in_nmi())
> > return;
> >
> > Because it needs to grab spin locks.
>
> But preemption is enabled within the trampoline? If so, then if

Yes it can be.

> CONFIG_RCU_BOOST is set, rcu_read_unlock() can call rt_mutex_unlock().
> Which looks OK to me, but I thought I should mention it.

The rt_mutex_unlock() shouldn't call schedule.

>
> > But one thing an op->func() is never allowed to do, is to call
> > schedule() directly, or even a cond_resched(). It may be preempted if
> > preemption was enabled when the trampoline was hit, but it must not
> > assume that it can do a voluntary schedule. That would break the
> > rcu_tasks as well if it did.
>
> OK, so it can call functions, but it is not permitted to call functions
> that voluntarily block. That should work. (Fingers firmly crossed.)

Right, calling a mutex is bad. This is why all spinlocks must be of the
raw form as well, otherwise it breaks on PREEMPT_RT.

Actually, the stack tracer uses arch_spin_lock() because it has the
ability to break lockdep on top of this.

As you pointed out one of my comments. Function tracing is rude.

>
> > > If not, agreed, no problem. Otherwise, it seems like we have a big
> > > problem remaining. Unless the functions called from a trampoline are
> > > guaranteed never to do a context switch.
> >
> > Well, they can be preempted, but they should never do a voluntary
> > schedule. If they did, that would be bad.
>
> OK, feeeling better now. ;-)
>
> > > So what exactly is the trampoline code allowed to do? ;-)
> >
> > Well, it must be able to work in an NMI context, or bail otherwise. And
> > it should never schedule on its own.
>
> Good.
>
> > > My problem is that I have no idea what can and cannot be included in
> > > trampoline code. In absence of that information, my RCU-honed reflexes
> > > jump immediately to the worst case that I can think of. ;-)
> >
> > Lets just say that it can't voluntarily sleep. Would that be good
> > enough? If someday in the future I decide to let it do so, I would add
> > a flag and force that ops not to be able to use a dynamic trampoline.
>
> That would work. Again, feeling much better now. ;-)

Great!

>
> > Currently, without the synchronize_rcu_tasks(), when a dynamic ops is
> > registered, the functions will point to a non dynamic trampoline. That
> > is, one that is never freed. It simply does:
> >
> > preempt_disable_notrace();
> >
> > do_for_each_ftrace_op(op, ftrace_ops_list) {
> > /*
> > * Check the following for each ops before calling their func:
> > * if RCU flag is set, then rcu_is_watching() must be true
> > * if PER_CPU is set, then ftrace_function_local_disable()
> > * must be false
> > * Otherwise test if the ip matches the ops filter
> > *
> > * If any of the above fails then the op->func() is not executed.
> > */
> > if ((!(op->flags & FTRACE_OPS_FL_RCU) || rcu_is_watching()) &&
> > (!(op->flags & FTRACE_OPS_FL_PER_CPU) ||
> > !ftrace_function_local_disabled(op)) &&
> > ftrace_ops_test(op, ip, regs)) {
> >
> > if (FTRACE_WARN_ON(!op->func)) {
> > pr_warn("op=%p %pS\n", op, op);
> > goto out;
> > }
> > op->func(ip, parent_ip, op, regs);
> > }
> > } while_for_each_ftrace_op(op);
> > out:
> > preempt_enable_notrace();
>
> Makes sense!

Thanks,

-- Steve