Re: [tip:perf/core] perf: Fix race in perf_event_exec()

From: Paul E. McKenney
Date: Thu Jan 07 2016 - 11:46:36 EST


On Thu, Jan 07, 2016 at 11:36:21AM -0500, Eric Dumazet wrote:
> On Thu, Jan 7, 2016 at 11:26 AM, Paul E. McKenney
> <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
> > On Thu, Jan 07, 2016 at 02:40:08PM +0100, Peter Zijlstra wrote:
> >> On Wed, Jan 06, 2016 at 01:56:56PM -0500, Eric Dumazet wrote:
> >> > On Wed, Jan 6, 2016 at 1:46 PM, tip-bot for Peter Zijlstra
> >> > <tipbot@xxxxxxxxx> wrote:
> >> >
> >> > >
> >> > > This is because context switches can swap the task_struct::perf_event_ctxp[]
> >> > > pointer around. Therefore you have to either disable preemption when looking
> >> > > at current, or hold ctx->lock.
> >> > >
> >> >
> >> >
> >> > >
> >> > > void perf_event_exec(void)
> >> > > {
> >> > > - struct perf_event_context *ctx;
> >> > > int ctxn;
> >> > >
> >> > > rcu_read_lock();
> >> >
> >> > Do we still need this rcu_read_lock(), if perf_event_enable_on_exec()
> >> > uses local_irq_save( ?
> >>
> >> Strictly speaking we should not rely on the fact that RCU grace periods
> >> do not progress with IRQs disabled, so yes.
> >
> > What Peter said!
> >
> > The current implementation would let you get away with IRQs disabled
> > (aside from lockdep splats), but it is easy to imagine implementations
> > that do not, hence the restriction.
> >
> > Is the extra rcu_read_lock() and rcu_read_unlock() causing a performance
> > problem? If so, one option would be use of synchronize_sched() and
> > call_rcu_sched(), which explicitly recognize disabled IRQs as read-side
> > critical sections. Of course, this might well put you in a position
> > where you would like to use IRQ disabling in some cases, BH disabling
> > (or softirq context) in others, and rcu_read_lock() in yet other cases.
>
> I guess I could not see why the rcu_read_lock() was needed around this loop.
>
> rcu_read_lock();
> for ((ctxn) = 0; (ctxn) < perf_nr_task_contexts; (ctxn)++)
> perf_event_enable_on_exec(ctxn);
> rcu_read_unlock();
>
> But apparently it is needed in perf_event_enable_on_exec() and we avoid calling
> rcu_read_lock() and rcu_read_unlock() (perf_nr_task_contexts - 1 ) times

If the overhead of rcu_read_lock() and rcu_read_unlock() really are
a noticeable fraction of the loop overhead, and if perf_nr_task_contexts
is large, then you might well have a problem. I still suggest actually
measuring it, but you know me. ;-)

> > If so, the good news is that RCU actually now supports this efficiently.
> > There is the shiny new synchronize_rcu_mult() macro that can be used
> > as follows to wait for all three types of grace periods:
> >
> > synchronize_rcu_mult(call_rcu, call_rcu_bh, call_rcu_sched);
> >
> > This would wait concurrently for all three grace periods to elapse.
>
> Interesting ;)

;-) ;-) ;-)

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/