Re: Broken dwarf unwinding - wrong stack pointer register value?

From: Milian Wolff
Date: Mon Oct 22 2018 - 07:17:52 EST


On Montag, 22. Oktober 2018 12:35:39 CEST Milian Wolff wrote:
> On Sonntag, 21. Oktober 2018 00:39:51 CEST Milian Wolff wrote:
> > Hey all,
> >
> > I'm on the quest to figure out why perf regularly fails to unwind (some)
> > samples. I am seeing very strange behavior, where an apparently wrong
> > stack
> > pointer value is read from the register - see below for more information
> > and the end of this (long) mail for my open questions. Any help would be
> > greatly appreciated.
> >
> > I am currently using this trivial C++ code to reproduce the issue:
> >
> > ```
> > #include <cmath>
> > #include <complex>
> > #include <iostream>
> > #include <random>
> >
> > using namespace std;
> >
> > int main()
> > {
> >
> > uniform_real_distribution<double> uniform(-1E5, 1E5);
> > default_random_engine engine;
> > double s = 0;
> > for (int i = 0; i < 10000000; ++i) {
> >
> > s += norm(complex<double>(uniform(engine), uniform(engine)));
> >
> > }
> > cout << s << '\n';
> > return 0;
> >
> > }
> > ```
> >
> > I compile it with `g++ -O2 -g` and then record it with `perf record
> > --call-
> > graph dwarf`. Using perf script, I then see e.g.:
> >
> > ```
> > $ perf script -v --no-inline --time 90229.12668,90229.127158 --ns
> > ...
> > # first frame (working unwinding from __hypot_finite):
> > unwind: reg 16, val 7faf7dca2696
> > unwind: reg 7, val 7ffc80811ca0
> > unwind: find_proc_info dso /usr/lib/libm-2.28.so
> > unwind: access_mem addr 0x7ffc80811ce8 val 7faf7dc88af9, offset 72
> > unwind: find_proc_info dso /usr/lib/libm-2.28.so
> > unwind: access_mem addr 0x7ffc80811d08 val 56382b0fc129, offset 104
> > unwind: find_proc_info dso
> > /home/milian/projects/kdab/rnd/hotspot/build/tests/
> > test-clients/cpp-inlining/cpp-inlining
> > unwind: access_mem addr 0x7ffc80811d58 val 7faf7dabf223, offset 184
> > unwind: find_proc_info dso /usr/lib/libc-2.28.so
> > unwind: access_mem addr 0x7ffc80811e18 val 56382b0fc1ee, offset 376
> > unwind: find_proc_info dso
> > /home/milian/projects/kdab/rnd/hotspot/build/tests/
> > test-clients/cpp-inlining/cpp-inlining
> > unwind: __hypot_finite:ip = 0x7faf7dca2696 (0x29696)
> > unwind: hypotf32x:ip = 0x7faf7dc88af8 (0xfaf8)
> > unwind: main:ip = 0x56382b0fc128 (0x1128)
> > unwind: __libc_start_main:ip = 0x7faf7dabf222 (0x24222)
> > unwind: _start:ip = 0x56382b0fc1ed (0x11ed)
> > # second frame (unrelated)
> > unwind: reg 16, val 56382b0fc114
> > unwind: reg 7, val 7ffc80811d10
> > unwind: find_proc_info dso
> > /home/milian/projects/kdab/rnd/hotspot/build/tests/
> > test-clients/cpp-inlining/cpp-inlining
> > unwind: access_mem addr 0x7ffc80811d58 val 7faf7dabf223, offset 72
> > unwind: access_mem addr 0x7ffc80811e18 val 56382b0fc1ee, offset 264
> > unwind: main:ip = 0x56382b0fc114 (0x1114)
> > unwind: __libc_start_main:ip = 0x7faf7dabf222 (0x24222)
> > unwind: _start:ip = 0x56382b0fc1ed (0x11ed)
> > # third frame (broken unwinding from __hypot_finite)
> > unwind: reg 16, val 7faf7dca2688
> > unwind: reg 7, val 7ffc80811ca0
> > unwind: find_proc_info dso /usr/lib/libm-2.28.so
> > unwind: access_mem addr 0x7ffc80811cc0 val 0, offset 32
> > unwind: __hypot_finite:ip = 0x7faf7dca2688 (0x29688)
> > unwind: '':ip = 0xffffffffffffffff (0x0)
> >
> > cpp-inlining 24617 90229.126685606: 711026 cycles:uppp:
> > 7faf7dca2696 __hypot_finite+0x36 (/usr/lib/libm-2.28.so)
> > 7faf7dc88af8 hypotf32x+0x18 (/usr/lib/libm-2.28.so)
> > 56382b0fc128 main+0x88 (/home/milian/projects/kdab/rnd/hotspot/
> >
> > build/tests/test-clients/cpp-inlining/cpp-inlining)
> >
> > 7faf7dabf222 __libc_start_main+0xf2 (/usr/lib/libc-2.28.so)
> > 56382b0fc1ed _start+0x2d (/home/milian/projects/kdab/rnd/hotspot/
> >
> > build/tests/test-clients/cpp-inlining/cpp-inlining)
> >
> > cpp-inlining 24617 90229.126921551: 714657 cycles:uppp:
> > 56382b0fc114 main+0x74 (/home/milian/projects/kdab/rnd/hotspot/
> >
> > build/tests/test-clients/cpp-inlining/cpp-inlining)
> >
> > 7faf7dabf222 __libc_start_main+0xf2 (/usr/lib/libc-2.28.so)
> > 56382b0fc1ed _start+0x2d (/home/milian/projects/kdab/rnd/hotspot/
> >
> > build/tests/test-clients/cpp-inlining/cpp-inlining)
> >
> > cpp-inlining 24617 90229.127157818: 719976 cycles:uppp:
> > 7faf7dca2688 __hypot_finite+0x28 (/usr/lib/libm-2.28.so)
> >
> > ffffffffffffffff [unknown] ([unknown])
> >
> > ...
> > ```
> >
> > Now I'm trying to figure out why one __hypot_finite sample works but the
> > other one breaks for no apparent reason.
>
> I've now collected some more background information, which is quite helpful
> I believe for the analysis of this issue:
>
> Note how the broken sample has the IP pointing at __hypot_finite+0x28:
>
> unwind: __hypot_finite:ip = 0x7faf7dca2688 (0x29688)
>
> When we run my reproducer code in GDB, we can see that obtaining a backtrace
> from that address works just fine there:
>
> ```
> $ gdb ./cpp-inlining
> GNU gdb (GDB) 8.2
> Copyright (C) 2018 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html> This is free software: you are free to
> change and redistribute it. There is NO WARRANTY, to the extent permitted
> by law.
> Type "show copying" and "show warranty" for details.
> This GDB was configured as "x86_64-pc-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
> <http://www.gnu.org/software/gdb/documentation/>.
>
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from ./cpp-inlining...done.
> (gdb) break main
> Breakpoint 1 at 0x10a0: file /home/milian/projects/kdab/rnd/hotspot/tests/
> test-clients/cpp-inlining/main.cpp, line 34.
> (gdb) run
> Starting program: /home/milian/projects/kdab/rnd/hotspot/build/tests/test-
> clients/cpp-inlining/cpp-inlining
>
> Breakpoint 1, main () at /home/milian/projects/kdab/rnd/hotspot/tests/test-
> clients/cpp-inlining/main.cpp:34
> 34 {
> (gdb) disassemble __hypot_finite
> Dump of assembler code for function __hypot_finite:
> 0x00007ffff7c96660 <+0>: endbr64
> 0x00007ffff7c96664 <+4>: push %r13
> 0x00007ffff7c96666 <+6>: push %r12
> 0x00007ffff7c96668 <+8>: push %rbp
> 0x00007ffff7c96669 <+9>: movq %xmm1,%rbp
> 0x00007ffff7c9666e <+14>: push %rbx
> 0x00007ffff7c9666f <+15>: movq %xmm0,%rbx
> 0x00007ffff7c96674 <+20>: shr $0x20,%rbp
> 0x00007ffff7c96678 <+24>: shr $0x20,%rbx
> 0x00007ffff7c9667c <+28>: and $0x7fffffff,%ebp
> 0x00007ffff7c96682 <+34>: and $0x7fffffff,%ebx
> 0x00007ffff7c96688 <+40>: sub $0x28,%rsp
> 0x00007ffff7c9668c <+44>: cmp %ebp,%ebx
> 0x00007ffff7c9668e <+46>: jge 0x7ffff7c966a2 <__hypot_finite+66>
> 0x00007ffff7c96690 <+48>: mov %ebx,%eax
> 0x00007ffff7c96692 <+50>: movapd %xmm0,%xmm2
> 0x00007ffff7c96696 <+54>: movapd %xmm1,%xmm0
> 0x00007ffff7c9669a <+58>: mov %ebp,%ebx
> 0x00007ffff7c9669c <+60>: movapd %xmm2,%xmm1
> 0x00007ffff7c966a0 <+64>: mov %eax,%ebp
> 0x00007ffff7c966a2 <+66>: movq %xmm0,%r13
> 0x00007ffff7c966a7 <+71>: mov %rbx,%rax
> 0x00007ffff7c966aa <+74>: movq %xmm1,%r12
> --Type <RET> for more, q to quit, c to continue without paging--q
> Quit
> (gdb) break *0x00007ffff7c96688
> Breakpoint 2 at 0x7ffff7c96688
> (gdb) cont
> Continuing.
>
> Breakpoint 2, 0x00007ffff7c96688 in __hypot_finite () from
> /usr/lib/libm.so.6 (gdb) bt
> #0 0x00007ffff7c96688 in __hypot_finite () from /usr/lib/libm.so.6
> #1 0x00007ffff7c7caf9 in hypotf64 () from /usr/lib/libm.so.6
> #2 0x0000555555555129 in std::__complex_abs (__z=<optimized out>) at /usr/
> include/c++/8.2.1/complex:1362
> #3 std::abs<double> (__z=...) at /usr/include/c++/8.2.1/complex:618
> #4 std::_Norm_helper<true>::_S_do_it<double> (__z=...) at /usr/include/c++/
> 8.2.1/complex:675
> #5 std::norm<double> (__z=...) at /usr/include/c++/8.2.1/complex:685
> #6 main () at
> /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-
> inlining/main.cpp:39
> ```
>
> But note how __hypot_finite+0x28 actually points at an instruction that
> modifies the stack pointer by 40, just like I did manually to "fix"
> unwinding for this sample (see my initial email):
>
> 0x00007ffff7c96688 <+40>: sub $0x28,%rsp
>
> So we are still in the function prologue, and can't properly unwind from
> there apparently! Any suggestions on how this should be handled? Can we
> figure out the size of the function prologue somehow and manually fix the
> stack pointer then?

After more digging, it turns out that I've apparently chased a red herring.
I'm running archlinux which isn't shipping debug symbols for libm. Without
those, we are apparently out of luck to unwind from the function prologue.

I still wonder whether we could catch this somehow and handle it ourselves. I
wonder what GDB does to unwind here, despite the lack of debug information in
libm...

Cheers

--
Milian Wolff | milian.wolff@xxxxxxxx | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts

Attachment: smime.p7s
Description: S/MIME cryptographic signature