Re: [PATCH] perf: teach perf inject to merge sched_stat_* andsched_switch events (v3)

From: Frederic Weisbecker
Date: Mon Oct 15 2012 - 17:27:04 EST


2012/9/18 Andrew Vagin <avagin@xxxxxxxxxx>:
> You may want to know where and how long a task is sleeping. A callchain
> may be found in sched_switch and a time slice in stat_iowait, so I add
> handler in perf inject for merging this events.
>
> My code saves sched_switch event for each process and when it meets
> stat_iowait, it reports the sched_switch event, because this event
> contains a correct callchain. By another words it replaces all
> stat_iowait events on proper sched_switch events.
>
> v2: - remove the global variable "session"
> - hadle errors from malloc()
>
> v3: - use sample->tid instead of sample->pid for merging events.
>
> Frederic Weisbecker noticed that this code works only in a root pidns.
> It's true, because a pid from trace content is not pid-namespace safe
> and currently no way to get this pid in a current pidns. This problem
> is more general, so I don't think that it should be solved in this series.
>
> Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxxxxxxxxxx>

I'm ok with it, so Acked-by: Frederic Weisbecker <fweisbec@xxxxxxxxx>

I just have some suggestions below.

> Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> Cc: Paul Mackerras <paulus@xxxxxxxxx>,
> Cc: Ingo Molnar <mingo@xxxxxxxxxx
> Cc: Andi Kleen <andi@xxxxxxxxxxxxxx>
> Cc: David Ahern <dsahern@xxxxxxxxx>
> Signed-off-by: Andrew Vagin <avagin@xxxxxxxxxx>
> ---
> tools/perf/Documentation/perf-inject.txt | 4 ++
> tools/perf/builtin-inject.c | 86 ++++++++++++++++++++++++++++++
> 2 files changed, 90 insertions(+), 0 deletions(-)
>
> diff --git a/tools/perf/Documentation/perf-inject.txt b/tools/perf/Documentation/perf-inject.txt
> index 6be2101..c04e0c6 100644
> --- a/tools/perf/Documentation/perf-inject.txt
> +++ b/tools/perf/Documentation/perf-inject.txt
> @@ -35,6 +35,10 @@ OPTIONS
> -o::
> --output=::
> Output file name. (default: stdout)
> +-s::
> +--sched-stat::
> + Merge sched_stat and sched_switch for getting events where and how long
> + tasks slept.

Please provide some more explanations here. I fear it's not very clear
for the user. May be tell about the fact it results in sched_switch
events weighted with the time slept.

[...]
> +static int perf_event__sched_stat(struct perf_tool *tool,
> + union perf_event *event,
> + struct perf_sample *sample,
> + struct perf_evsel *evsel,
> + struct machine *machine)
> +{
> + const char *evname = NULL;
> + uint32_t size;
> + struct event_entry *ent;
> + union perf_event *event_sw = NULL;
> + struct perf_sample sample_sw;
> + int sched_process_exit;
> +
> + size = event->header.size;
> +
> + evname = evsel->tp_format->name;
> +
> + sched_process_exit = !strcmp(evname, "sched_process_exit");
> +
> + if (!strcmp(evname, "sched_switch") || sched_process_exit) {
> + list_for_each_entry(ent, &samples, node)
> + if (sample->tid == ent->tid)

Make sure you have PERF_SAMPLE_TID.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/