Re: [PATCH v2] ftrace: record command lines at more appropriate moment

From: Li Zefan
Date: Wed Jul 28 2010 - 22:57:12 EST


Ian Munsie wrote:
> From: Ian Munsie <imunsie@xxxxxxxxxxx>
>
> Previously, when tracing was activated through debugfs, regardless of
> which tracing plugin (if any) were activated, the probe_sched_switch and
> probe_sched_wakeup probes from the sched_switch plugin would be
> activated. This appears to have been a hack to use them to record the
> command lines of active processes as they were scheduled.
>
> That approach would suffer if many processes were being scheduled that
> were not generating events as they would consume entries in the
> saved_cmdlines buffer that could otherwise have been used by other
> processes that were actually generating events.
>
> It also had the problem that events could be mis-attributed - in the
> common situation of a process forking then execing a new process, the
> change of the process command would not be noticed for some time after
> the exec until the process was next scheduled.
>
> If the trace was read after the fact this would generally go unnoticed
> because at some point the process would be scheduled and the entry in
> the saved_cmdlines buffer would be updated so that the new command would
> be reported when the trace was eventually read. However, if the events
> were being read live (e.g. through trace_pipe), the events just after
> the exec and before the process was next scheduled would show the
> incorrect command (though the PID would be correct).
>
> This patch removes the sched_switch hack altogether and instead records
> the commands at a more appropriate moment - when a new trace event is
> committed onto the ftrace ring buffer. This means that the recorded
> command line is much more likely to be correct when the trace is read,
> either live or after the fact, so long as the command line still resides
> in the saved_cmdlines buffer.
>
> It is still not guaranteed to be correct in all situations. For instance
> if the trace is read after the fact rather than live (consider events
> generated by a process before an exec - in the below example they would
> be attributed to sleep rather than stealpid since the entry in
> saved_cmdlines would have changed before the event was read), but this
> is no different to the current situation and the alternative would be to
> store the command line with each and every event.
>
> terminal 1: grep '\-12345' /sys/kernel/debug/tracing/trace_pipe
> terminal 2: ./stealpid 12345 `which sleep` 0.1
>
> Before:
> stealpid-12345 [003] 86.001826: sys_clone -> 0x0
> stealpid-12345 [003] 86.002013: compat_sys_execve(ufilename: ffaaabef, argv: ffaaa7ec, envp: ffaaa7f8)
> stealpid-12345 [002] 86.002292: sys_restart_syscall -> 0x0
> stealpid-12345 [002] 86.002336: sys_brk(brk: 0)
> stealpid-12345 [002] 86.002338: sys_brk -> 0x1007a000
> ...
> stealpid-12345 [002] 86.002582: sys_mmap(addr: 0, len: 1000, prot: 3, flags: 22, fd: ffffffff, offset: 0)
> stealpid-12345 [002] 86.002586: sys_mmap -> 0xf7c21000
> sleep-12345 [002] 86.002771: sys_mprotect(start: ffe8000, len: 4000, prot: 1)
> sleep-12345 [002] 86.002780: sys_mprotect -> 0x0
> ...
>
> After:
> stealpid-12345 [003] 1368.823626: sys_clone -> 0x0
> stealpid-12345 [003] 1368.823820: compat_sys_execve(ufilename: fffa6bef, argv: fffa5afc, envp: fffa5b08)
> sleep-12345 [002] 1368.824125: sys_restart_syscall -> 0x0
> sleep-12345 [002] 1368.824173: sys_brk(brk: 0)
> sleep-12345 [002] 1368.824175: sys_brk -> 0x104ae000
> ...
>
> Signed-off-by: Ian Munsie <imunsie@xxxxxxxxxxx>

I've tested your patch using lmbench(ctx):

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
(trace-off)
Linux 2.6.35- 2.1300 2.2100 2.0800 2.5900 2.1400 2.59000 2.19000
Linux 2.6.35- 2.1400 2.2000 2.0800 2.6000 2.0900 2.56000 2.15000

(all events on)
Linux 2.6.35- 2.8000 2.9600 2.7200 3.2500 2.8200 3.24000 2.98000
Linux 2.6.35- 2.7100 2.6900 2.7300 3.2200 2.8500 3.25000 2.79000

(all events on without cmdline-recording)
Linux 2.6.35- 2.6100 2.6900 2.5800 3.0300 2.5800 3.04000 2.67000
Linux 2.6.35- 2.5800 2.5900 2.5600 3.0300 2.6600 3.04000 2.61000

(your patch applied)
Linux 2.6.35- 2.7100 2.8000 2.7200 3.2100 2.8400 3.24000 2.82000
Linux 2.6.35- 2.6600 2.8400 2.6900 3.1900 2.7600 3.27000 2.78000

So with your patch applied, the performance is still worse than just disabling
cmdline-recording.

The performance may be worse if I choose some other benchmarks.

I'd suggest another approch, that we add a tracepiont in set_task_comm()
to record cmdname, and that's how perf does.

> ---
>
> Changes since v1 addressing feedback from Li Zefan:
> * Inline trace_save_cmdline and tracing_record_cmdline for a marginal speed
> gain when recording command lines.
> * Move call to tracing_record_cmdline from tracing_generic_entry_update to
> __trace_buffer_unlock_commit to avoid the overhead when using perf or if the
> event was filtered out.
>
> kernel/trace/trace.c | 7 +++----
> kernel/trace/trace_events.c | 11 -----------
> kernel/trace/trace_functions.c | 2 --
> kernel/trace/trace_functions_graph.c | 2 --
> kernel/trace/trace_sched_switch.c | 10 ----------
> 5 files changed, 3 insertions(+), 29 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/