Re: [patch 0/4] perf_counter tools: support annotation of live kernel modules
From: Ingo Molnar
Date: Thu Jul 02 2009 - 03:42:37 EST
* Mike Galbraith <efault@xxxxxx> wrote:
> On Thu, 2009-07-02 at 08:47 +0200, Ingo Molnar wrote:
> > * Mike Galbraith <efault@xxxxxx> wrote:
> >
> > > Per $subject, this patch set only supports the LIVE kernel.
> > > It adds support infrastructure for path discovery, load address
> > > lookup, and symbol generation for live kernel modules.
> > >
> > > TODO includes resurrecting live annotation in perf top, and
> > > supporting annotation and report generation for modules other
> > > than live ones. As the patch set sits, perf top can generate
> > > symbols from live binaries, but there's no live annotation
> > > capability yet.
> > >
> > > patch1: perf_counter tools: Make symbol loading consistently return number of loaded symbols.
> > > patch2: perf_counter tools: Add infrastructure to support loading of kernel module symbols
> > > patch3: perf_counter tools: connect module support infrastructure to symbol loading infrastructure
> > > patch4: perf_counter tools: Enable kernel module symbol loading in tools
> > >
> > > Comments and suggestions most welcome.
> >
> > Looks very nice! I've applied it with a few minor stylistic fixlets
> > and a tad more verbose changelogs.
>
> Thanks!
>
> (sorry about the changelogs - I did stare at them, but nothing
> spiffy happened)
[ We want to be verbose in changelogs generally - i.e. it's not a
problem at all to tell a boring story about what happens in the
patch. To _you_ it certainly looks boring - to others it's a
useful summary that sets their mind-set before looking at the
patch. ]
> > I'm wondering about the next step: couldn't we somehow guess the
> > position of the vmlinux too, validate that it corresponds to the
> > kernel we are running - and then use it automatically and by
> > default?
>
> I don't know of a way to discover where the image lives. Been
> pondering that very thing, along with idiot-proofing.
There are two main use cases:
- distro kernels. Here the vmlinux and module paths vary, but they
  should be discoverable with a finite list of trial-and-error paths.
- 'make install modules_install' builds by kernel developers. Here
  the vmlinux and the source tree might be anywhere. A small trick
  might help: we could expose the build location of the kernel
  source tree via a new /proc/kernel-buildpath special file, which
  contains the vmlinux filename plus an MD5 sum (or CRC32) for good
  measure.
Note that /proc/kernel-buildpath might also help the distro case: a
distro could set it to point at the correct location for a
debuginfo rpm/deb install.
I.e. /proc/kernel-buildpath and the MD5 could solve both use cases.
Other tools could make use of it too.
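[ To make the trial-and-error idea concrete, here is a minimal
  userspace sketch (an illustration, not perf's actual code): it
  probes a few conventional vmlinux locations for the running kernel
  and falls back to the /proc/kernel-buildpath file proposed above -
  which is hypothetical and does not exist today. A real
  implementation would also validate the image via the MD5/CRC32: ]

#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/utsname.h>

int find_vmlinux(char path[PATH_MAX])
{
        static const char * const fmts[] = {
                "/boot/vmlinux-%s",
                "/lib/modules/%s/build/vmlinux",
                "/usr/lib/debug/lib/modules/%s/vmlinux",
        };
        struct utsname uts;
        struct stat st;
        unsigned int i;
        FILE *f;

        if (uname(&uts) < 0)
                return -1;

        for (i = 0; i < sizeof(fmts) / sizeof(fmts[0]); i++) {
                snprintf(path, PATH_MAX, fmts[i], uts.release);
                if (!stat(path, &st))
                        return 0;       /* candidate found, still unvalidated */
        }

        /* Hypothetical fallback: first token would be the vmlinux path,
         * followed by an MD5/CRC32 to check against the running kernel. */
        f = fopen("/proc/kernel-buildpath", "r");
        if (!f)
                return -1;
        if (fscanf(f, "%4095s", path) != 1) {   /* PATH_MAX is 4096 on Linux */
                fclose(f);
                return -1;
        }
        fclose(f);
        return 0;
}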
A second, more complex possibility would be to expose the kernel
image itself, plus the module images as well. This has limitations
though: debuginfo won't be embedded, and the symbols are in
/proc/kallsyms (which we do parse).
The advantage is that it's all readily available in memory (just not
exposed), plus it would show the _real_ instructions - the
post-paravirt-fixup, post-ftrace-fixup and other dynamic patching
results.
To expose that we'd have to create some sort of special "kernel
image directory" within debugfs that has files like:
/debug/kimage/vmlinux
/debug/kimage/modules/
/debug/kimage/modules/snd_hda_intel.ko
/debug/kimage/modules/firewire_core.ko
Debugfs is quite easy to use, and if we don't make it too fancy (no
separate module directories, for example) it would be doable without
too much fuss.
It would be assembly-only annotations, without debuginfo.
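[ A minimal kernel-side sketch of what populating such a directory
  could look like - purely hypothetical, not part of this patch set:
  it exposes the live kernel text between _stext and _etext as a
  read-only debugfs blob; the per-module files would need a similar
  blob for each loaded module: ]

#include <linux/debugfs.h>
#include <linux/init.h>
#include <asm/sections.h>

/* Hypothetical sketch: expose the live kernel text as
 * /debug/kimage/vmlinux via a read-only debugfs blob. */
static struct debugfs_blob_wrapper kimage_blob;

static int __init kimage_debugfs_init(void)
{
        struct dentry *dir = debugfs_create_dir("kimage", NULL);

        if (!dir)
                return -ENOMEM;

        kimage_blob.data = _stext;
        kimage_blob.size = _etext - _stext;

        /* shows the post-fixup instructions that actually execute */
        debugfs_create_blob("vmlinux", 0400, dir, &kimage_blob);
        return 0;
}
late_initcall(kimage_debugfs_init);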
> > Plus, offline analysis would be nice as well i suspect - being
> > able to look at profiles on a different box?
>
> Yes, that's high on my TODO list. I've been pondering a perf
> archive tool that would package everything that's needed to do
> analysis on a different box. One big problem, though, is that
> while you can easily package vmlinux and modules, what about all
> the userland binaries? A large perf.data and/or debuginfo binaries
> can easily make transport impractical.
I wouldn't worry about size too much, at least initially.
[ If it ever becomes a big issue then we could do a separate 'perf
compress' pass which could do a 'specific'/sparse snapshot of
affected binaries: i.e. pre-parse the data file, pick out all the
RIPs that matter and check which binaries relate to them, and then
read and pack those bits only. ]
Plus we could use Git's zlib smarts to compress the data file on the
fly as well, during data capture. It's very easy to generate a gig
or two of data currently.
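[ For illustration, a minimal sketch of the kind of on-the-fly
  compression meant here - plain zlib streaming rather than Git's
  actual wrappers; the function name and the chunk-per-call
  structure are made up for the example: ]

#include <zlib.h>
#include <string.h>
#include <unistd.h>

/* Sketch only: deflate one captured buffer and write it to 'fd'.
 * A real implementation would keep the z_stream open across the
 * whole capture instead of re-initializing it per chunk. */
static int write_compressed(int fd, const void *buf, size_t len)
{
        unsigned char out[65536];
        z_stream zs;
        int ret = 0;

        memset(&zs, 0, sizeof(zs));
        if (deflateInit(&zs, Z_DEFAULT_COMPRESSION) != Z_OK)
                return -1;

        zs.next_in = (unsigned char *)buf;
        zs.avail_in = len;

        do {
                zs.next_out = out;
                zs.avail_out = sizeof(out);
                if (deflate(&zs, Z_FINISH) == Z_STREAM_ERROR) {
                        ret = -1;
                        break;
                }
                if (write(fd, out, sizeof(out) - zs.avail_out) < 0) {
                        ret = -1;
                        break;
                }
        } while (zs.avail_out == 0);

        deflateEnd(&zs);
        return ret;
}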
> After I resurrect (well, try) live annotation in top, I'll fiddle
> with offline kernel analysis.
Ok :-)
Btw, another thing: we are thinking about making -F 1000 (1 KHz
auto-freq sampling) the default for perf top and perf record. This
way we'd always gather enough data (and never too much or too little
data), regardless of the intensity of the workload. Have you played
with -F before? What's your general experience with it? It's
particularly useful for 'rare' and highly fluctuating events like
cache-misses.
Maybe 1 KHz is a bit too low - Oprofile defaults to a 100000-cycle
interval, which is about 10 KHz on a 1 GHz box and 30 KHz on a
3 GHz box. Perhaps 10 KHz is a better default?
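[ A quick sketch of the arithmetic behind those numbers - with a
  fixed period the sample rate scales with clock speed, which is
  exactly what auto-freq sampling avoids: ]

#include <stdio.h>

int main(void)
{
        const double period = 100000.0;         /* oprofile-style cycles/sample */
        const double clocks[] = { 1e9, 3e9 };   /* 1 GHz and 3 GHz boxes */

        for (int i = 0; i < 2; i++)
                printf("%.0f GHz: %.0f samples/sec (~%.0f KHz)\n",
                       clocks[i] / 1e9,
                       clocks[i] / period,
                       clocks[i] / period / 1000.0);

        /* -F 1000 instead asks the kernel to auto-adjust the period so
         * the rate stays near 1000 samples/sec on any clock speed. */
        return 0;
}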
Ingo