Re: [PATCH RFC] Make rcu_dereference_raw() safe for NMI etc.

From: Peter Zijlstra
Date: Tue Feb 03 2015 - 06:00:49 EST


On Mon, Feb 02, 2015 at 11:55:33AM -0800, Paul E. McKenney wrote:
> As promised/threatened on IRC.
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> rcu: Reverse rcu_dereference_check() conditions
>
> The rcu_dereference_check() family of primitives evaluates the RCU
> lockdep expression first, and only then evaluates the expression passed
> in. This works fine normally, but can potentially fail in environments
> (such as NMI handlers) where lockdep cannot be invoked. The problem is
> that even if the expression passed in is "1", the compiler would need to
> prove that the RCU lockdep expression (rcu_read_lock_held(), for example)
> is free of side effects in order to be able to elide it. Given that
> rcu_read_lock_held() is sometimes separately compiled, the compiler cannot
> always use this optimization.
>
> This commit therefore reverses the order of evaluation, so that the
> expression passed in is evaluated first, and the RCU lockdep expression is
> evaluated only if the passed-in expression evaluated to false, courtesy
> of the C-language short-circuit boolean evaluation rules. This compels
> the compiler to forego executing the RCU lockdep expression in cases
> where the passed-in expression evaluates to "1" at compile time, so that
> (for example) rcu_dereference_raw() can be guaranteed to execute safely
> within an NMI handler.

My particular worry yesterday was tracing; I was looking at
rcu_read_{,un}lock_notrace() and wondered what would happen if I used
list_for_each_entry_rcu() under it.

_If_ it did indeed make that call, we could end up in:

list_entry_rcu() -> rcu_dereference_raw() -> rcu_dereference_check()
-> rcu_read_lock_held() -> rcu_lockdep_current_cpu_online()
-> preempt_disable()

And preempt_disable() is a traceable thing -- not to mention half the
callstack above doesn't have notrace annotations and would equally
generate function trace events.

Thereby rendering the rcu list ops unsuitable for use under the _notrace()
rcu primitives.

So yes, fully agreed on this patch.

Acked-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>


FWIW I think I won't be needing the rcu _notrace() bits (for now), but
it leading to this patch was worth it anyhow ;-)