Re: [patch 2/2] Documentation/process: Add tip tree handbook

From: Ingo Molnar
Date: Thu Nov 08 2018 - 02:19:36 EST



* Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:

> +Backtraces in changelogs
> +^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Backtraces can be useful to document the call chain which led to a
> +problem. Though not all back traces are really valuable because the call
> +chain is unique and obvious, e.g. in early boot code. Just copying the full
> +dmesg output is adding a lot of distracting information like timestamps,
> +module lists, register and stack dumps etc.
> +
> +Reducing the backtrace to the real relevant data helps to concentrate on
> +the issue and not being distracted by destilling the relevant information
> +out of the dump. Here is an example of a well trimmed backtrace::
> +
> + unchecked MSR access error: WRMSR to 0xd51 (tried to write 0x0000
> + 000000000064) at rIP: 0xffffffffae059994 (native_write_msr+0x4/0x20)
> + Call Trace:
> + mba_wrmsr+0x41/0x80
> + update_domains+0x125/0x130
> + rdtgroup_mkdir+0x270/0x500

Yeah, so I frequently simplify such backtraces even more, i.e.:

> + unchecked MSR access error: WRMSR to 0xd51 (tried to write 0x0000 000000000064) at rIP: 0xffffffffae059994 (native_write_msr())
> +
> + Call Trace:
> + mba_wrmsr()
> + update_domains()
> + rdtgroup_mkdir()

Note how the actual MSR contents and the attempted operation's parameters
are important, the actual hexadecimal offsets of the function call
backtrace are not. They are useful when trying to do fuzzy version
matching and in the occasional case when there's a question about which
exact call chain it is - but those are 0.01% cases really.

See for example this recent commit:

commit e4a02ed2aaf447fa849e3254bfdb3b9b01e1e520 (origin/locking-urgent-for-linus, locking-urgent-for-linus)
Author: Guenter Roeck <linux@xxxxxxxxxxxx>
Date: Tue Oct 2 14:48:49 2018 -0700

locking/ww_mutex: Fix runtime warning in the WW mutex selftest

If CONFIG_WW_MUTEX_SELFTEST=y is enabled, booting an image
in an arm64 virtual machine results in the following
traceback if 8 CPUs are enabled:

DEBUG_LOCKS_WARN_ON(__owner_task(owner) != current)
WARNING: CPU: 2 PID: 537 at kernel/locking/mutex.c:1033 __mutex_unlock_slowpath+0x1a8/0x2e0
...
Call trace:
__mutex_unlock_slowpath()
ww_mutex_unlock()
test_cycle_work()
process_one_work()
worker_thread()
kthread()
ret_from_fork()

If requesting b_mutex fails with -EDEADLK, the error variable
is reassigned to the return value from calling ww_mutex_lock
on a_mutex again. If this call fails, a_mutex is not locked.
It is, however, unconditionally unlocked subsequently, causing
the reported warning. Fix the problem by using two error variables.

With this change, the selftest still fails as follows:

cyclic deadlock not resolved, ret[7/8] = -35

However, the traceback is gone.

The C-style writing of the backtrace is more readable than listing the
offsets.

Thanks,

Ingo