Re: perf not picking up symbols for namespaced processes

From: Arnaldo Carvalho de Melo
Date: Tue Feb 11 2020 - 09:28:41 EST


On Tue, Feb 11, 2020 at 01:54:33PM +0000, Marek Majkowski wrote:
> On Tue, Feb 11, 2020 at 1:46 PM Arnaldo Carvalho de Melo <arnaldo.melo@xxxxxxxxx> wrote:
> > On Tue, Feb 11, 2020 at 10:06:35AM +0000, Marek Majkowski wrote:
> > > On Tue, Feb 4, 2020 at 7:27 PM Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
> > > > > 11913 openat(AT_FDCWD, "/proc/9512/ns/mnt", O_RDONLY) = 197
> > > > > 11913 setns(197, CLONE_NEWNS) = 0
> > > > > 11913 stat("/home/marek/bin/runsc-debug", 0x7fffffff8480) = -1 ENOENT (No such file or directory)
> > > > > 11913 setns(196, CLONE_NEWNS) = 0

> > > > could you guys please share more details on what you run exactly,
> > > > and perhaps that change you mentioned?

> > > I was debugging gvisor (runsc), which does execve(/proc/self/exe) and
> > > then messes with its mount namespace. The effect is that the running
> > > binary doesn't exist in the process's mount namespace. This confuses
> > > perf, which fails to load symbols for that process.

> > > To my understanding, by default, perf looks for the binary in the
> > > process's mount namespace. In this case the binary clearly wasn't there.
> > > Ivan wrote a rough patch [1], which I just confirmed works. After
> > > failing to load the binary from the pid's mount namespace, the patch
> > > tries to load it from the default mount namespace (the one running perf).

> > > [1] https://lkml.org/lkml/2019/12/5/878

> > That is a fallback that works in this specific case. With a warning or
> > an explicitly specified option it makes perf work for this specific
> > use case, but it needs one or the other, please: either a warning
> > ("fallback to root namespace binary /foo/bar") or the explicit option.
> > Is that what that patch does?

> You got it right: a custom patch that does something custom (look up in
> the top mount ns), but only on failure. I'm not sure how to make it more
> generic.
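
As an illustration only, the fallback-with-warning behaviour discussed
above can be sketched in userspace. This is not Ivan's patch (which works
inside perf and switches namespaces with setns()). It is a minimal
approximation that goes through /proc/<pid>/root, which exposes the target
process's view of the filesystem. The name resolve_binary and the proc
parameter are made up for this sketch, and reading /proc/<pid>/root is
assumed to be permitted for the caller.

```python
import os
import sys


def resolve_binary(pid, path, proc="/proc"):
    """Return (resolved_path, used_fallback) for a binary mapped by `pid`.

    Hypothetical helper: first look at `path` as seen from the target's
    mount namespace (via <proc>/<pid>/root), then fall back to the
    namespace of the caller, printing a warning when the fallback is used.
    """
    in_ns = os.path.join(proc, str(pid), "root", path.lstrip("/"))
    if os.path.exists(in_ns):
        return in_ns, False
    if os.path.exists(path):
        # Make the fallback visible, since it may pick a different binary.
        print(f"warning: fallback to root namespace binary {path}",
              file=sys.stderr)
        return path, True
    return None, False
```

A caller would still want to compare build-ids before trusting a binary
found through the fallback, since the same path can hold a different file
in each namespace.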

We have buildids in binaries:

[acme@quaco ~]$ file /bin/bash
/bin/bash: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=0cb50a07a621d02a0d2c7efec6743fddec845bfb, stripped
[acme@quaco ~]$ file /lib64/libc-2.29.so
/lib64/libc-2.29.so: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=7ddecbbf9f22ec76c9e4a256fd1c06004a1907ce, for GNU/Linux 3.2.0, not stripped, too many notes (256)
[acme@quaco ~]$

We need to get this somehow from a given executable map, this comes and
goes in situations like this :-\

I.e. this info is in an ELF section:

[acme@quaco ~]$ readelf -SW /bin/bash | grep build-id
[ 4] .note.gnu.build-id NOTE 0000000000000340 000340 000024 00 A 0 0 4
[acme@quaco ~]$
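
To make the layout of that note concrete, here is a minimal sketch of
pulling the build-id out of an ELF file by hand. It is not how perf does
it. It assumes a 64-bit little-endian ELF, walks the PT_NOTE program
headers rather than the section table, and assumes notes are padded to 4
bytes. The function names are made up for this sketch. The 0x24 bytes
shown by readelf above match a 12-byte note header, the 4-byte name
"GNU\0" and a 20-byte SHA-1.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type used for the GNU build-id
PT_NOTE = 4          # program header type for note segments


def _align4(n):
    return (n + 3) & ~3


def build_id_from_notes(data):
    """Walk a buffer of ELF notes; return the build-id as hex, or None."""
    off = 0
    while off + 12 <= len(data):
        namesz, descsz, ntype = struct.unpack_from("<III", data, off)
        off += 12
        name = data[off:off + namesz]
        off += _align4(namesz)
        desc = data[off:off + descsz]
        off += _align4(descsz)
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\0":
            return desc.hex()
    return None


def build_id_from_elf(path):
    """Return the build-id of a 64-bit little-endian ELF file, or None."""
    with open(path, "rb") as f:
        ehdr = f.read(64)
        if len(ehdr) < 64 or ehdr[:4] != b"\x7fELF":
            return None
        if ehdr[4] != 2 or ehdr[5] != 1:  # ELFCLASS64, ELFDATA2LSB only
            return None
        (e_phoff,) = struct.unpack_from("<Q", ehdr, 32)
        e_phentsize, e_phnum = struct.unpack_from("<HH", ehdr, 54)
        for i in range(e_phnum):
            f.seek(e_phoff + i * e_phentsize)
            phdr = f.read(56)
            if len(phdr) < 56:
                break
            p_type, _flags, p_offset, _va, _pa, p_filesz = \
                struct.unpack_from("<IIQQQQ", phdr)
            if p_type != PT_NOTE:
                continue
            f.seek(p_offset)
            build_id = build_id_from_notes(f.read(p_filesz))
            if build_id:
                return build_id
    return None
```

Calling build_id_from_elf("/bin/bash") should give the same value that
file(1) prints as BuildID[sha1], as long as the file is still reachable at
that path, which is exactly what cannot be relied on here.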

Somebody needs to associate that with that executable mmap at load time,
so that perf gets it via PERF_RECORD_MMAP3 instead of having to try,
optimistically, to get it from the binary (that may not be there when we
try to read it, or maybe in some place like you describe in this
message, or...) when generating its build-id perf.data header section:

[acme@seventh ~]$ perf record stress-ng --cpu 1 --timeout 1s
stress-ng: info: [17622] dispatching hogs: 1 cpu
stress-ng: info: [17622] successful run completed in 1.02s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.159 MB perf.data (4105 samples) ]
[acme@seventh ~]$ perf buildid-list
e9e69be73f7c5a4cee110ced52409371e95fe2a8 [kernel.kallsyms]
7133e5dbdfae821a9bbe4ba5467e49f6cf166e1d /usr/bin/stress-ng
bd5e36f101b175755c7943105390078dff596657 /usr/lib64/ld-2.29.so
1e292b30223c69eff986710c62eda48c561d43ca [vdso]
b8d7438178da8f84d89869addf6b5e36d356c555 /usr/lib64/libm-2.29.so
7ddecbbf9f22ec76c9e4a256fd1c06004a1907ce /usr/lib64/libc-2.29.so
[acme@seventh ~]$ file /usr/bin/stress-ng
/usr/bin/stress-ng: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=7133e5dbdfae821a9bbe4ba5467e49f6cf166e1d, stripped, too many notes (256)
[acme@seventh ~]$

> Furthermore, there is one more use case this patch doesn't support:
> the binary is reachable in some mount namespace, but not under a
> sensible path. This can happen when we launch a command under gvisor.
> The gvisor sandbox runs in an empty mount namespace, and the binary is
> delivered over 9p from the gvisor-gofer process, from a potentially
> arbitrary path. In that scenario we have three mount namespaces: the
> empty one running the process, another one with access to the binary,
> and the host one.

> I have two ideas how to solve the symbol discovery here:
> (a) give perf an explicit link (potentially including mount namespace
> pid) to the binary
> (b) supply perf with /tmp/perf-<pid>.map file with symbols, extracted
> via some external helper.
>
> I tried (b) but failed. I'm not sure how to produce perf-<pid>.map from
> a proper binary using basic tools like readelf.
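
On (b): the perf map file is plain text, one symbol per line, in the form
"START SIZE symbolname", with START and SIZE in hex without a 0x prefix.
A rough sketch of producing that from the output of "nm -S" follows. The
function name nm_to_perf_map and the load_bias parameter are assumptions
made for this sketch. For a PIE binary the caller has to work out the
load bias itself (for example from /proc/<pid>/maps), and that part is
not shown.

```python
def nm_to_perf_map(nm_output, load_bias=0):
    """Convert `nm -S` text into perf map lines ("START SIZE name").

    Only sized text symbols (types T, t, W, w) are kept. `load_bias` is
    added to every symbol value to get the runtime address.
    """
    lines = []
    for line in nm_output.splitlines():
        parts = line.split(None, 3)
        if len(parts) != 4:
            continue  # undefined symbol, or no size column
        value, size, kind, name = parts
        if kind not in ("T", "t", "W", "w"):
            continue
        try:
            start = int(value, 16) + load_bias
            length = int(size, 16)
        except ValueError:
            continue
        if length == 0:
            continue
        lines.append(f"{start:x} {length:x} {name}")
    return "\n".join(lines) + "\n" if lines else ""
```

The result would be written to /tmp/perf-<pid>.map for the pid in
question. A stripped binary has no symbol table for nm to read, so this
only helps when an unstripped copy or separate debuginfo is available.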

Have you looked at:

[acme@quaco ~]$ perf report -h symfs

Usage: perf report [<options>]

--symfs <directory>
Look for files with symbols relative to this directory

[acme@quaco ~]$

?
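
With --symfs, perf looks for each file at the symfs directory plus the
path recorded in perf.data. So a copy of the binary has to be placed
under that directory at the recorded path, wherever the copy actually
came from. A minimal sketch of that staging step, where stage_in_symfs
and its arguments are names invented for this example:

```python
import os
import shutil


def stage_in_symfs(symfs_dir, recorded_path, source_path):
    """Copy `source_path` to <symfs_dir>/<recorded_path>; return the copy.

    `recorded_path` is the path perf recorded for the mapping;
    `source_path` is wherever the binary can really be read from.
    """
    dest = os.path.join(symfs_dir, recorded_path.lstrip("/"))
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.copy2(source_path, dest)
    return dest
```

For the case in this thread that could mean staging the runsc binary
under a scratch directory (say /tmp/symfs, a hypothetical location) at
home/marek/bin/runsc-debug and then running "perf report --symfs
/tmp/symfs".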

- Arnaldo