Re: [tip:smp/hotplug] trace/perf: Cure hotplug lock ordering issues

From: Steven Rostedt
Date: Thu May 11 2017 - 20:27:17 EST


On Sun, 23 Apr 2017 04:30:01 -0700
tip-bot for Thomas Gleixner <tipbot@xxxxxxxxx> wrote:

> Commit-ID: 24db7a671bd5eea76b17138b976eb9a4072f1b7a
> Gitweb: http://git.kernel.org/tip/24db7a671bd5eea76b17138b976eb9a4072f1b7a
> Author: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> AuthorDate: Sun, 23 Apr 2017 12:17:13 +0200
> Committer: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> CommitDate: Sun, 23 Apr 2017 13:24:34 +0200
>
> trace/perf: Cure hotplug lock ordering issues
>
> The get_online_cpus() rework unearthed lock ordering issues.
>
> Reorder hotpluglock and event_mutex() to avoid circular locking
> dependencies and convert static_key_slow_dec() to the cpuslocked version to
> avoid hotplug lock recursion.
>
> That also requires to protect the call to perf_trace_destroy() against cpu
> hotplug.
>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>

BTW, I only got an email with the tip commit. Was this sent out before
committing to tip?

>
> ---
> kernel/events/core.c | 2 ++
> kernel/trace/trace_events.c | 6 ++++++
> kernel/tracepoint.c | 4 ++--
> 3 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index b5b4f52f..997123c 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -7783,7 +7783,9 @@ EXPORT_SYMBOL_GPL(perf_tp_event);
>
> static void tp_perf_event_destroy(struct perf_event *event)
> {
> + get_online_cpus();
> perf_trace_destroy(event);
> + put_online_cpus();
> }

OK, so I merged the branch I had Linus pull into the commit just before
this one in tip. Then I ran all my tracing tests and not one triggers
any lockdep issues. Just to make sure this was still an issue, as soon
as I ran:

# perf record -e sched:sched_switch -a sleep 1

It triggered:

[ 198.228443] ======================================================
[ 198.235892] [ INFO: possible circular locking dependency detected ]
[ 198.243406] 4.11.0-test+ #541 Not tainted
[ 198.248657] -------------------------------------------------------
[ 198.256144] perf/4027 is trying to acquire lock:
[ 198.261972] (event_mutex){+.+.+.}, at: [<ffffffff81195713>] perf_trace_init+0x23/0x2d0
[ 198.271162]
[ 198.271162] but task is already holding lock:
[ 198.279353] (cpu_hotplug_lock.rw_sem){++++++}, at: [<ffffffff811b8280>] SYSC_perf_event_open+0x3c0/0x10f0

The point being, there's no place that holds event_mutex that actually
requires having get_online_cpus(), besides perf. And I don't think perf
needs it here either. It probably needs it for the hardware events,
but does it really need it for the software ones? Is there a way to
separate software event initialization from hardware ones? Or does it
really need to have get_online_cpus() when enabling software events.
Can we have it as a delayed case? Enable them after put_online_cpus()
and disable them before the call to get_online_cpus()?

I'm thinking it will be a hell of a lot easier to make perf not need to
have get_online_cpus() taken when calling the tp events. Have it
separated out if possible. Otherwise, you are going to be playing whack
a mole for a long time if you want to inverse event_mutex with
get_online_cpus().

-- Steve

>
> static int perf_tp_event_init(struct perf_event *event)
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 9311654..8156bfd 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -747,9 +747,11 @@ static int __ftrace_set_clr_event(struct trace_array *tr, const char *match,
> {
> int ret;
>
> + get_online_cpus();
> mutex_lock(&event_mutex);
> ret = __ftrace_set_clr_event_nolock(tr, match, sub, event, set);
> mutex_unlock(&event_mutex);
> + put_online_cpus();
>
> return ret;
> }
> @@ -1039,11 +1041,13 @@ event_enable_write(struct file *filp, const char __user *ubuf, size_t cnt,
> case 0:
> case 1:
> ret = -ENODEV;
> + get_online_cpus();
> mutex_lock(&event_mutex);
> file = event_file_data(filp);
> if (likely(file))
> ret = ftrace_event_enable_disable(file, val);
> mutex_unlock(&event_mutex);
> + put_online_cpus();
> break;
>
> default:
> @@ -2902,6 +2906,7 @@ early_event_add_tracer(struct dentry *parent, struct trace_array *tr)
>
> int event_trace_del_tracer(struct trace_array *tr)
> {
> + get_online_cpus();
> mutex_lock(&event_mutex);
>
> /* Disable any event triggers and associated soft-disabled events */
> @@ -2924,6 +2929,7 @@ int event_trace_del_tracer(struct trace_array *tr)
> tr->event_dir = NULL;
>
> mutex_unlock(&event_mutex);
> + put_online_cpus();
>
> return 0;
> }
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index 685c50a..10ce910 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -220,7 +220,7 @@ static int tracepoint_add_func(struct tracepoint *tp,
> */
> rcu_assign_pointer(tp->funcs, tp_funcs);
> if (!static_key_enabled(&tp->key))
> - static_key_slow_inc(&tp->key);
> + static_key_slow_inc_cpuslocked(&tp->key);
> release_probes(old);
> return 0;
> }
> @@ -250,7 +250,7 @@ static int tracepoint_remove_func(struct tracepoint *tp,
> tp->unregfunc();
>
> if (static_key_enabled(&tp->key))
> - static_key_slow_dec(&tp->key);
> + static_key_slow_dec_cpuslocked(&tp->key);
> }
> rcu_assign_pointer(tp->funcs, tp_funcs);
> release_probes(old);