Re: [PATCH] coresight: etm4x: Handle unreachable sink in perf mode

From: Mathieu Poirier
Date: Tue Sep 01 2020 - 13:18:45 EST


Good morning,

On Tue, Sep 01, 2020 at 11:28:55AM +0100, Suzuki K Poulose wrote:
> On 08/19/2020 08:22 PM, Mathieu Poirier wrote:
> > Hi Suzuki,
> >
> > On Tue, Aug 18, 2020 at 08:29:31PM +0100, Suzuki K Poulose wrote:
> > > If the specified/hinted sink is not reachable from a subset of the CPUs,
> > > we could end up unable to trace the event on those CPUs. This
> > > is the best effort we could do until we support 1:1 configurations.
> > > Fail gracefully in such cases avoiding a WARN_ON, which can be easily
> > > triggered by the user on certain platforms, like :
> > >
> > > [10919.513250] ------------[ cut here ]------------
> > > [10919.517861] WARNING: CPU: 2 PID: 24021 at
> > > drivers/hwtracing/coresight/coresight-etm-perf.c:316 etm_event_start+0xf8/0x100
> > > ...
> > >
> > > [10919.564403] CPU: 2 PID: 24021 Comm: perf Not tainted 5.8.0+ #24
> > > [10919.570308] pstate: 80400089 (Nzcv daIf +PAN -UAO BTYPE=--)
> > > [10919.575865] pc : etm_event_start+0xf8/0x100
> > > [10919.580034] lr : etm_event_start+0x80/0x100
> > > [10919.584202] sp : fffffe001932f940
> > > [10919.587502] x29: fffffe001932f940 x28: fffffc834995f800
> > > [10919.592799] x27: 0000000000000000 x26: fffffe0011f3ced0
> > > [10919.598095] x25: fffffc837fce244c x24: fffffc837fce2448
> > > [10919.603391] x23: 0000000000000002 x22: fffffc8353529c00
> > > [10919.608688] x21: fffffc835bb31000 x20: 0000000000000000
> > > [10919.613984] x19: fffffc837fcdcc70 x18: 0000000000000000
> > > [10919.619281] x17: 0000000000000000 x16: 0000000000000000
> > > [10919.624577] x15: 0000000000000000 x14: 00000000000009f8
> > > [10919.629874] x13: 00000000000009f8 x12: 0000000000000018
> > > [10919.635170] x11: 0000000000000000 x10: 0000000000000000
> > > [10919.640467] x9 : fffffe00108cd168 x8 : 0000000000000000
> > > [10919.645763] x7 : 0000000000000020 x6 : 0000000000000001
> > > [10919.651059] x5 : 0000000000000002 x4 : 0000000000000001
> > > [10919.656356] x3 : 0000000000000000 x2 : 0000000000000000
> > > [10919.661652] x1 : fffffe836eb40000 x0 : 0000000000000000
> > > [10919.666949] Call trace:
> > > [10919.669382] etm_event_start+0xf8/0x100
> > > [10919.673203] etm_event_add+0x40/0x60
> > > [10919.676765] event_sched_in.isra.134+0xcc/0x210
> > > [10919.681281] merge_sched_in+0xb0/0x2a8
> > > [10919.685017] visit_groups_merge.constprop.140+0x15c/0x4b8
> > > [10919.690400] ctx_sched_in+0x15c/0x170
> > > [10919.694048] perf_event_sched_in+0x6c/0xa0
> > > [10919.698130] ctx_resched+0x60/0xa0
> > > [10919.701517] perf_event_exec+0x288/0x2f0
> > > [10919.705425] begin_new_exec+0x4c8/0xf58
> > > [10919.709247] load_elf_binary+0x66c/0xf30
> > > [10919.713155] exec_binprm+0x15c/0x450
> > > [10919.716716] __do_execve_file+0x508/0x748
> > > [10919.720711] __arm64_sys_execve+0x40/0x50
> > > [10919.724707] do_el0_svc+0xf4/0x1b8
> > > [10919.728095] el0_sync_handler+0xf8/0x124
> > > [10919.732003] el0_sync+0x140/0x180
> > >
> > > Fixes: f9d81a657bb8 ("coresight: perf: Allow tracing on hotplugged CPUs")
> > > Reported-by: Jeremy Linton <jeremy.linton@xxxxxxx>
> > > Cc: Mathieu Poirier <mathieu.poirier@xxxxxxxxxx>
> > > Cc: Mike Leach <mike.leach@xxxxxxxxxx>
> > > Signed-off-by: Suzuki K Poulose <suzuki.poulose@xxxxxxx>
> > > ---
> > > drivers/hwtracing/coresight/coresight-etm-perf.c | 10 ++++++++++
> > > 1 file changed, 10 insertions(+)
> > >
> > > diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> > > index 1a3169e69bb1..9d61a71da96f 100644
> > > --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> > > +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> > > @@ -321,6 +321,16 @@ static void etm_event_start(struct perf_event *event, int flags)
> > > if (!event_data)
> > > goto fail;
> > > + /*
> > > + * Check if this ETM is allowed to trace, as decided
> > > + * at etm_setup_aux(). This could be due to an unreachable
> > > + * sink from this ETM. We can't do much in this case if
> > > + * the sink was specified or hinted to the driver. For
> > > + * now, simply don't record anything on this ETM.
> > > + */
> >
> > Can you provide more details on the scenario and the topology of the system?
> > Without either it is hard to wrap my head around the problem to address.
> > Having that information in the changelog would go a long way.
>
> Sure. This was detected on N1SDP with the following topology, with the
> command :
>
>
> $ perf record -e cs_etm/@tmc_etf0/ --per-thread dd if=/dev/zero of=BIGFILE
> bs=1M count=100
>
>
> CPU0
> \
> Funnel0 ---- ETF0 --
> / \
> CPU1
> Funnel2
> CPU2
> \ /
> Funnel1 ---- ETF1 --
> /
> CPU3
>
>
> Basically, a pair of CPUS (0&1, 2&3 respectively) are connected to a static
> funnel followed by a TMC-ETF, before connecting to a main
> funnel which merges the ETMs and the STM on the system.
>
> In such a case, if the user selects tmc_etf0 as the sink for a perf
> session this could trigger a warning when starting the event on ETM2
> as we haven't been able to create a path. Also the CPU2 is cleared in
> the event_data->cpumask.

Ok, that's the kind of topology I imagined you were dealing with.

>
> I will add the above to the commit log.

Yes please.

>
> For now we don't really support multiple sinks for a perf session. This
> will need to be addressed for the per-CPU buffer scenario. But, we
> should fix the current logic until we get there, to avoid triggering
> the warnings, which can be done quite easily on these systems, which
> are not really per-CPU buffers.

I agree that it should fail gracefully.

>
> >
> > I'm sure this is a per-thread scenario because there is more than one CPU per
>
> Yes, it is a per-thread scenario.
>
> > event. I'm also suspecting this is on a system where there is one sink per CPU
> > cluster, and that is not supported.
>
> No, that is not exactly the case, from the topology above. But not N:1
> either. More of N:M and this is possible on systems with per cluster ETFs.
>
> >
> > If I am right on both account I am questionning the "Fixes". On a system with
> > N:1 topology the code introduced by f9d81a657bb8 will work in the event a CPU is
> > hotplugged in. The code introduced in this patch is simply to prevent a
>
> Correct. But, without the above commit, we would have failed while
> creating a path to the sink, because if a CPU was offline then we don't
> care about a path from that ETM. So this warning is essentially
> triggered only after the above commit and thus the Fixes tag.

That is an equally valid argument.

>
> > warn_on() trace from being generated on systems that aren't supported. It should
> > have a "stable" tag.
>
> Cheers
> Suzuki