Re: [PATCH 2/2] perf tools: Fix find_perf_probe_point_from_map() which incorrectly returns success

From: Arnaldo Carvalho de Melo
Date: Fri Nov 06 2015 - 08:43:17 EST


On Fri, Nov 06, 2015 at 05:27:06PM +0800, Wangnan (F) wrote:
> On 2015/11/6 16:30, Wangnan (F) wrote:
> >On 2015/11/6 15:12, 平松雅巳 / HIRAMATU，MASAMI wrote:
> >>From: acme@xxxxxxxxxx [mailto:acme@xxxxxxxxxx]
> >>>>On Thu, Nov 05, 2015 at 02:08:48PM +0000, 平松雅巳 / HIRAMATU，MASAMI wrote:
> >>>>>From: Wang Nan [mailto:wangnan0@xxxxxxxxxx]

> [SNIP]
> >>Ah, finally I got what happened. I guess the problem may happen when
> >>we put a probe on the kernel somewhere outside of any function and
> >>run "perf probe -l".
> >>
> >>I think it should not be allowed to put a probe outside any symbol.
> >>
> >>Here is the background: first, "perf probe -a somewhere" defines a
> >>probe in the kernel, but its address is relative to "_text" (thus,
> >>vfs_read becomes "_text+2348080", for example). Since that is not
> >>human-readable, "perf probe -l" tries to get an appropriate symbol
> >>from the "_text+OFFSET".
> >>For that purpose, the first call, kernel_get_symbol_address_by_name(),
> >>translates _text to an address, and the second call,
> >>__find_kernel_function(), finds a symbol from the address+OFFSET.
> >>
> >>Then, if the address+OFFSET is outside the symbol map, the second one
> >>can fail.
> >>
> >>This means the first symbol and the second symbol are not the same.
> >>
> >>So, the direction of Wang's solution is good :). Just a cleanup is
> >>required.
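
To make the two-step lookup described above concrete, here is a minimal
sketch in Python. It is not the perf code: the symbol table, the symbol
sizes and the helper names (address_by_name, find_function, resolve) are
all made up for illustration; only the "_text+2348080" offset for
vfs_read comes from the example above. The point is that step 2 can
legitimately find nothing, and that case has to be reported as a failure.

```python
import bisect

# Hypothetical symbol table: (start, end, name), sorted by start address.
# Base address and symbol sizes are assumptions for illustration only.
TEXT = 0xffffffff81000000
SYMBOLS = [
    (TEXT, TEXT + 0x100, "_text"),
    (TEXT + 2348080, TEXT + 2348080 + 0x120, "vfs_read"),
]


def address_by_name(name):
    """Step 1: translate a symbol name to its start address."""
    for start, _end, sym in SYMBOLS:
        if sym == name:
            return start
    return None


def find_function(addr):
    """Step 2: find the symbol whose [start, end) range covers addr."""
    starts = [s[0] for s in SYMBOLS]
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0 and addr < SYMBOLS[i][1]:
        return SYMBOLS[i]
    return None


def resolve(base_name, offset):
    """Turn "base_name+offset" into (symbol, offset in symbol), or None."""
    base = address_by_name(base_name)
    if base is None:
        return None
    sym = find_function(base + offset)
    if sym is None:
        # The address lies outside every symbol: this must be a failure,
        # not a silent success with the base symbol left in place.
        return None
    return sym[2], base + offset - sym[0]
```

With this table, resolve("_text", 2348080) lands on vfs_read, while an
offset that falls in the gap between the two symbols returns None.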

> >I also tried to figure out the problem all day and made some progress.
> >It is a different problem. It happens when probing an address that
> >resides in a module on an aarch64 system.
> >
> >On my aarch64 system I use kcore. Unlike on x86, on aarch64 module
> >addresses are lower than the normal kernel's. For example:
> >
> >On x86_64:
> >
> ># readelf -a /proc/kcore
> >
> >  Type   Offset             VirtAddr           PhysAddr
> >         FileSiz            MemSiz             Flags  Align
> >  ...
> >  LOAD   0x00007fff81003000 0xffffffff81000000 0x0000000000000000   <-- kernel
> >         0x0000000001026000 0x0000000001026000 RWE    1000
> >  LOAD   0x00007fffa0003000 0xffffffffa0000000 0x0000000000000000   <-- module
> >         0x000000005f000000 0x000000005f000000 RWE    1000
> >
> >On aarch64:
> >
> >  Type   Offset             VirtAddr           PhysAddr
> >         FileSiz            MemSiz             Flags  Align
> >  ...
> >  LOAD   0x0000000000002000 0xffffffc000000000 0x0000000000000000   <-- kernel
> >         0x000000007fc00000 0x000000007fc00000 RWE    1000
> >  LOAD   0xfffffffffc002000 0xffffffbffc000000 0x0000000000000000   <-- module
> >         0x0000000004000000 0x0000000004000000 RWE    1000
> >
> >See? On aarch64, the Offset field of the module address area is
> >negative when read as a signed 64-bit value.
> >
>
> One thing to note: even though normal kernel code and modules use
> different 'struct map's, they share the same dso. Please see
> dso__load_kcore(), and notice how it initializes its parameters (md)
> before calling file__read_maps().
>
> >This causes a problem in dso__split_kallsyms_for_kcore(): when it
> >adjusts symbols using "pos->start -= curr_map->start - curr_map->pgoff",
> >the relative order between module functions and normal kernel
> >functions changes.
> >
> >For example:
> >
> >funca at 0xffffffc00021b428 is a normal kernel function.
> >funcb at 0xffffffbffc000000 is a function in a module.
> >
> >While parsing /proc/kallsyms, address of funca > address of funcb.
> >
> >However, after the adjustment:
> >
> >funca becomes:
> >
> >0xffffffc00021b428 - (0xffffffc000000000 - 0x2000) = 0x21d428
> >
> >funcb becomes:
> >
> >0xffffffbffc000000 - (0xffffffbffc000000 - 0xfffffffffc002000) =
> >0xfffffffffc002000
> >
> >Now address of funca < address of funcb.
> >
> >Unfortunately, the rbtree is not re-sorted in this case.
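
The arithmetic above can be checked with a small Python sketch. It only
replays the quoted expression "pos->start -= curr_map->start -
curr_map->pgoff" on the numbers from the two kcore listings; the adjust()
helper and the dictionaries are illustrative stand-ins, not perf data
structures. Python integers do not wrap, so the result is masked to
emulate 64-bit unsigned arithmetic.

```python
U64_MASK = (1 << 64) - 1


def adjust(sym_start, map_start, map_pgoff):
    """Replay 'pos->start -= map->start - map->pgoff' in u64 arithmetic."""
    return (sym_start - (map_start - map_pgoff)) & U64_MASK


# start/pgoff values taken from the aarch64 kcore listing quoted above.
kernel_map = {"start": 0xffffffc000000000, "pgoff": 0x0000000000002000}
module_map = {"start": 0xffffffbffc000000, "pgoff": 0xfffffffffc002000}

funca = 0xffffffc00021b428  # normal kernel function
funcb = 0xffffffbffc000000  # module function

funca_adj = adjust(funca, kernel_map["start"], kernel_map["pgoff"])
funcb_adj = adjust(funcb, module_map["start"], module_map["pgoff"])

# Sort order of the two symbols before and after the adjustment.
order_before = sorted(["funca", "funcb"],
                      key={"funca": funca, "funcb": funcb}.get)
order_after = sorted(["funca", "funcb"],
                     key={"funca": funca_adj, "funcb": funcb_adj}.get)

print(hex(funca_adj), hex(funcb_adj))
print(order_before, order_after)
```

The two orders differ, so any tree that was sorted on the unadjusted
start addresses no longer satisfies its ordering invariant once the
starts are rewritten in place.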

> Even though they are in different maps, they share the same dso here,
> and thus the same rbtree.

Yeah, see my answer to the patch you sent: we can't change the symbols
in a DSO, as it may be shared by multiple maps (think about glibc and
prelink, or even without prelink). The same applies to kernel modules,
which we represent in the same way. In at least one case, i.e. split
kallsyms for modules, core kernel, etc., the same dso is shared by
multiple maps, so any adjustment that needs to be done should be done to
the map members, not to the dso ones.
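
A rough sketch of that principle, in Python rather than C and with
made-up addresses: the dso keeps one immutable symbol table in
dso-relative addresses, and each map carries its own start/end/pgoff and
translates on lookup. The Dso and Map classes and the "libfoo" symbols
are hypothetical; map_ip() is only loosely modeled on the
"ip - start + pgoff" style of translation, not copied from perf.

```python
class Dso:
    """Symbol table in dso-relative addresses; shared, so never mutated."""

    def __init__(self, symbols):
        # Each entry is (start, end, name); stored as an immutable tuple.
        self.symbols = tuple(sorted(symbols))

    def find_symbol(self, rel_addr):
        for start, end, name in self.symbols:
            if start <= rel_addr < end:
                return name, rel_addr - start
        return None


class Map:
    """One mapping of a dso into an address space."""

    def __init__(self, dso, start, end, pgoff):
        self.dso, self.start, self.end, self.pgoff = dso, start, end, pgoff

    def map_ip(self, ip):
        """Translate a mapped address into a dso-relative one."""
        return ip - self.start + self.pgoff

    def find_symbol(self, ip):
        if not (self.start <= ip < self.end):
            return None
        return self.dso.find_symbol(self.map_ip(ip))


# Hypothetical library mapped at two different addresses.
libfoo = Dso([(0x1000, 0x1100, "foo"), (0x1100, 0x1200, "bar")])
map_a = Map(libfoo, 0x7f0000000000, 0x7f0000010000, 0)
map_b = Map(libfoo, 0x7f1230000000, 0x7f1230010000, 0)

print(map_a.find_symbol(0x7f0000001010))
print(map_b.find_symbol(0x7f1230001110))
```

Both maps resolve through the same untouched table. If map_a had
rewritten the symbol starts to suit its own load address, map_b's
lookups would silently break, which is the situation to avoid.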

CCing Adrian, who originally wrote the kcore code, but IIRC there are
other places that touch sym-> (thus dso internal state) instead of
adjusting map members :-\

- Arnaldo