Re: [PATCH] perf: sample after exit loses thread correlation

From: Jiri Olsa
Date: Sat Jul 27 2013 - 08:16:06 EST


On Fri, Jul 26, 2013 at 04:04:14PM -0600, David Ahern wrote:
> Occasionally events (e.g., context-switch, sched tracepoints) are losing
> the conversion of sample data associated with a thread. For example:
>
> $ perf record -e sched:sched_switch -c 1 -a -- sleep 5
> $ perf script
> <selected events shown>
> ls 30482 [000] 1379727.583037: sched:sched_switch: prev_comm=ls prev_pid=30482 ...
> ls 30482 [000] 1379727.586339: sched:sched_switch: prev_comm=ls prev_pid=30482 ...
> :30482 30482 [000] 1379727.589462: sched:sched_switch: prev_comm=ls prev_pid=30482 ...
>
> The last line lost the conversion from tid to comm. If you look at the events
> (perf script -D) you can see why. The SAMPLE event is generated after the EXIT:
>
> 0 1379727589449774 0x1540b0 [0x38]: PERF_RECORD_EXIT(30482:30482):(30482:30482)
> 0 1379727589462497 0x1540e8 [0x80]: PERF_RECORD_SAMPLE(IP, 1): 30482/30482: 0xffffffff816416f1 period: 1 addr: 0
> ... thread: :30482:30482
>
> When perf processes the EXIT event, the thread is moved to the dead_threads
> list. When the SAMPLE event is processed, no thread exists for the pid, so a
> new one is created by machine__findnew_thread.
>
> This patch addresses the problem by saving the exit time and checking the
> dead_threads list for the requested tid. If the time passed into
> machine__findnew_thread is non-0, the dead_threads list is walked looking for
> the tid. If the thread struct associated with the tid exited within 1 msec
> of the time passed in, the thread is considered a match and returned.
>
> If samples do not contain timestamps then sample->time will be 0 and the
> dead_threads list will not be checked. -1 can be used to force always checking
> the dead_threads list and returning a match.
>
> With this patch the previous example shows:
>
> ls 30482 [000] 1379727.583037: sched:sched_switch: prev_comm=ls prev_pid=30482 ...
> ls 30482 [000] 1379727.586339: sched:sched_switch: prev_comm=ls prev_pid=30482 ...
> ls 30482 [000] 1379727.589462: sched:sched_switch: prev_comm=ls prev_pid=30482 ...
>
> and
>
> 0 1379727589449774 0x1540b0 [0x38]: PERF_RECORD_EXIT(30482:30482):(30482:30482)
> 0 1379727589462497 0x1540e8 [0x80]: PERF_RECORD_SAMPLE(IP, 1): 30482/30482: 0xffffffff816416f1 period: 1 addr: 0
> ... thread: ls:30482
>
> v2: Rebased to latest perf/core branch. Changed time comparison to use
> a macro which explicitly shows the time basis
>
> Signed-off-by: David Ahern <dsahern@xxxxxxxxx>
> Cc: Frederic Weisbecker <fweisbec@xxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Jiri Olsa <jolsa@xxxxxxxxxx>

tested and

Acked-by: Jiri Olsa <jolsa@xxxxxxxxxx>

jirka
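
For readers following along, the dead_threads lookup described in the quoted
changelog can be sketched roughly as below. This is a minimal standalone
sketch, not the code from the patch: struct dead_thread, find_dead_thread,
the singly linked list, and the DEAD_THREAD_WINDOW_NSEC macro name are all
hypothetical stand-ins, and treating "within 1 msec" as an absolute
difference in either direction is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for perf's thread struct; only the fields used here. */
struct dead_thread {
	int tid;
	uint64_t exit_time;        /* timestamp of the EXIT event, in nsec */
	const char *comm;
	struct dead_thread *next;  /* simplified singly linked dead list */
};

#define NSEC_PER_MSEC 1000000ULL
/* Assumed name; the changelog only says a macro makes the time basis explicit. */
#define DEAD_THREAD_WINDOW_NSEC (1 * NSEC_PER_MSEC)

/*
 * time == 0           : sample carries no timestamp, dead list is not checked
 * time == (uint64_t)-1: always check the dead list and match on tid alone
 * otherwise           : match only if the thread exited within the window
 */
static struct dead_thread *find_dead_thread(struct dead_thread *head,
					    int tid, uint64_t time)
{
	struct dead_thread *t;

	if (time == 0)
		return NULL;

	for (t = head; t != NULL; t = t->next) {
		uint64_t delta;

		if (t->tid != tid)
			continue;
		if (time == UINT64_MAX)
			return t;

		/* Absolute difference, so unsigned subtraction cannot wrap. */
		delta = time > t->exit_time ? time - t->exit_time
					    : t->exit_time - time;
		if (delta <= DEAD_THREAD_WINDOW_NSEC)
			return t;
	}

	return NULL;
}
```

With the timestamps from the example, the SAMPLE at 1379727589462497 comes
12723 nsec after the EXIT at 1379727589449774, well inside the 1 msec window,
so the dead "ls" thread would be returned instead of a new comm-less one.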