Re: [PATCH] perf/sdt: Directly record cached SDT events

From: Masami Hiramatsu
Date: Mon May 02 2016 - 20:25:55 EST


On Mon, 2 May 2016 11:19:34 -0700
Brendan Gregg <brendan.d.gregg@xxxxxxxxx> wrote:

> On Fri, Apr 29, 2016 at 6:40 AM, Hemant Kumar <hemant@xxxxxxxxxxxxxxxxxx> wrote:
> > This patch adds support for directly recording SDT events which are
> > present in the probe cache. This patch is based on current SDT
> > enablement patchset (v5) by Masami :
> > https://lkml.org/lkml/2016/4/27/828
> > and it implements two points in the TODO list mentioned in the
> > cover note :
> > "- (perf record) Support SDT event recording directly"
> > "- (perf record) Try to unregister SDT events after record."
> >
> > Without this patch, we could probe into SDT events using
> > "perf probe" and "perf record". With this patch, we can probe
> > the SDT events directly using "perf record".
> >
> > For example :
> >
> > # perf list sdt // List the SDT events
> > ...
> > sdt_mysql:update__row__done [SDT event]
> > sdt_mysql:update__row__start [SDT event]
> > sdt_mysql:update__start [SDT event]
> > sdt_python:function__entry [SDT event]
> > sdt_python:function__return [SDT event]
> > sdt_test:marker1 [SDT event]
> > sdt_test:marker2 [SDT event]
> > ...
> >
> > # perf record -e %sdt_test:marker1 -e %sdt_test:marker2 -a
>
> Why do we need the '%'? Can't the "sdt_" prefix be sufficient? ie:
>
> # perf record -e sdt_test:marker1 -e sdt_test:marker2 -a

On the perf-record side, "sdt_test:marker1" is just a normal
tracepoint event name (the same naming that probe events on
ftrace/perftools use). For example, if I add a probe event with perf probe,
it is shown the same way as other tracepoint events. This means that, in
principle, I can define "sdt_test:marker1" at a different address.

----
$ sudo ./perf probe -a "sdt_test:marker1=vmalloc"
Added new event:
sdt_test:marker1 (on vmalloc)

You can now use it in all perf tools, such as:

perf record -e sdt_test:marker1 -aR sleep 1
----

So, you can shoot yourself in the foot easily :)

One possible solution is to reserve the "sdt_" prefix for SDT; then
we can avoid using "%" for that.

However, what I intended was a more generic solution that includes the
probe-cache, so that a user can freely replay cached probes once the probe
has been defined, even after rebooting the machine. Of course, we can search
for such events automatically if a user gives a non-existing event name.
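
To illustrate that fallback search, here is a minimal Python sketch. It is
not perf code: resolve_event, live_events and probe_cache are hypothetical
names, and the two dictionaries merely stand in for the currently registered
events and the on-disk probe cache.

```python
def resolve_event(name, live_events, probe_cache):
    """Return (source, definition) for an event name.

    live_events and probe_cache are plain dicts mapping an event name
    to its probe definition; both are assumptions for illustration.
    """
    # An event that is already registered wins.
    if name in live_events:
        return ("live", live_events[name])
    # Otherwise fall back to a probe that was cached earlier.
    if name in probe_cache:
        return ("cache", probe_cache[name])
    # Neither registered nor cached: report an error.
    raise LookupError("event not found: " + name)
```

With this ordering, a name that exists only in the cache is still usable,
while an unknown name fails instead of silently recording nothing.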

> I find it a bit weird to define it using %sdt_, but then use it using
> sdt_. I'd also be inclined to use it for probe creation, ie:
>
> # perf probe -x /lib/libc-2.17.so sdt_libc:lll_lock_wait_private
>
> That way, the user only learns one way to specify the probe, with the
> sdt_ prefix. It's fine if % existed too, but optional.

OK, if we see the "sdt_" prefix in the first place, we can treat it as if
"%" were given :)
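
As a rough sketch of that rule (again not perf code; parse_sdt_spec and
SDT_PREFIX are hypothetical names), an event spec could be classified
like this in Python:

```python
SDT_PREFIX = "sdt_"  # assumed reserved group prefix for SDT events


def parse_sdt_spec(spec):
    """Return (is_sdt, event_name) for a -e style event spec."""
    # An explicit '%' marks a cached SDT event; strip the marker.
    if spec.startswith("%"):
        return True, spec[1:]
    # Without '%', a group name starting with "sdt_" implies it.
    group = spec.split(":", 1)[0]
    if group.startswith(SDT_PREFIX):
        return True, spec
    # Anything else is an ordinary event name.
    return False, spec
```

Both "%sdt_test:marker1" and "sdt_test:marker1" then resolve to the same
event name, so "%" stays valid but becomes optional.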

> > ^C[ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 2.087 MB perf.data (22 samples) ]
> >
> > # perf script
> > test_sdt 29230 [002] 405550.548017: sdt_test:marker1: (400534)
> > test_sdt 29230 [002] 405550.548064: sdt_test:marker2: (40053f)
> > test_sdt 29231 [002] 405550.962806: sdt_test:marker1: (400534)
> > test_sdt 29231 [002] 405550.962841: sdt_test:marker2: (40053f)
> > test_sdt 29232 [001] 405551.379327: sdt_test:marker1: (400534)
> > ...
> >
> > After invoking "perf record", behind the scenes, it checks whether the
> > event specified is an SDT event using the flag '%'. After that, it
> > does a lookup of the probe cache to find out the SDT event. If it's not
> > present, it throws an error. Otherwise, it goes on and writes the event
> > into the uprobe_events file and sets up the probe event, trace events,
> > etc and starts recording. It also maintains a list of the event names
> > that were written to uprobe_events file. After finishing the record
> > session, it removes the events from the uprobe_events file using the
> > maintained name list.
>
> Does this support semaphore SDT probes (is-enabled)? Those need the
> semaphore incremented when enabled, then decremented when disabled.

No, that is not supported yet. Semaphores and SDT parameters will be
supported later.

Thank you!

--
Masami Hiramatsu <mhiramat@xxxxxxxxxx>