How do I get good backtraces from dump_stack()?

From: Richard Yao
Date: Fri Oct 25 2013 - 14:32:43 EST


ZFSOnLinux performs memory allocations through a wrapper that invokes
dump_stack() whenever GFP_KERNEL is used in a performance-critical path
(e.g. one that affects swap).
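
For context, the wrapper's check looks roughly like the sketch below.
This is a simplified illustration rather than the actual SPL code; the
function name and the exact flag test are made up for the example.

#include <linux/kernel.h>
#include <linux/slab.h>

/*
 * Simplified illustration only -- not the actual SPL code. The function
 * name and the flag test are invented for this example.
 */
static void *example_kmalloc_checked(size_t size, gfp_t flags)
{
	/*
	 * GFP_KERNEL implies __GFP_IO | __GFP_FS, which can recurse into
	 * the filesystem/swap path, so warn loudly and dump the stack.
	 */
	if ((flags & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS)) {
		printk(KERN_WARNING
		       "example: GFP_KERNEL allocation in swap-critical path\n");
		dump_stack();
	}

	return kmalloc(size, flags);
}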

Unfortunately, dump_stack() seems to always produce nonsensical
backtraces. Here is an example that a Debian user sent me yesterday:

[ 4100.817875] Pid: 1209, comm: txg_sync Tainted: P D O 3.2.0-4-amd64 #1 Debian 3.2.51-1
[ 4100.822370] Call Trace:
[ 4100.826762] [<ffffffffa033355e>] ? spl_debug_dumpstack+0x24/0x2a [spl]
[ 4100.831273] [<ffffffffa0338a09>] ? sanitize_flags+0x6e/0x7c [spl]
[ 4100.835729] [<ffffffffa0338c9d>] ? kmalloc_nofail+0x1f/0x3d [spl]
[ 4100.840192] [<ffffffffa0338e55>] ? kmem_alloc_debug+0x164/0x2d0 [spl]
[ 4100.844599] [<ffffffff810ec6ff>] ? __kmalloc+0x100/0x112
[ 4100.849038] [<ffffffffa02e45b1>] ? nv_mem_zalloc.isra.12+0xa/0x21 [znvpair]
[ 4100.853468] [<ffffffffa02e5f90>] ? nvlist_add_common+0x113/0x2f9 [znvpair]
[ 4100.857954] [<ffffffffa02e61af>] ? nvlist_copy_pairs.isra.29+0x39/0x4b [znvpair]
[ 4100.862388] [<ffffffffa02e5e5e>] ? nvlist_copy_embedded.isra.31+0x47/0x66 [znvpair]
[ 4100.866878] [<ffffffffa02e60c9>] ? nvlist_add_common+0x24c/0x2f9 [znvpair]
[ 4100.871314] [<ffffffffa02e74ba>] ? fnvlist_add_nvlist_array+0x19/0x6b [znvpair]
[ 4100.875839] [<ffffffffa049cb1e>] ? vdev_config_generate+0x330/0x49a [zfs]
[ 4100.880298] [<ffffffffa0338e55>] ? kmem_alloc_debug+0x164/0x2d0 [spl]
[ 4100.884802] [<ffffffffa0338ca9>] ? kmalloc_nofail+0x2b/0x3d [spl]
[ 4100.889241] [<ffffffff810ec6ff>] ? __kmalloc+0x100/0x112
[ 4100.893718] [<ffffffffa02e5dd1>] ? nvlist_remove_all+0x3d/0x83 [znvpair]
[ 4100.898334] [<ffffffffa02e615c>] ? nvlist_add_common+0x2df/0x2f9 [znvpair]
[ 4100.902788] [<ffffffffa0336d4b>] ? kmem_free_debug+0xc5/0x10d [spl]
[ 4100.907308] [<ffffffff810eb882>] ? kfree+0x5b/0x6c
[ 4100.911747] [<ffffffffa0336d4b>] ? kmem_free_debug+0xc5/0x10d [spl]
[ 4100.916249] [<ffffffffa02e62e4>] ? nvlist_add_uint64+0x1d/0x22 [znvpair]
[ 4100.920747] [<ffffffffa048ff5c>] ? spa_config_generate+0x4b0/0x701 [zfs]
[ 4100.925297] [<ffffffffa0488a1e>] ? spa_sync+0x430/0x942 [zfs]
[ 4100.929831] [<ffffffff81066733>] ? ktime_get_ts+0x5c/0x82
[ 4100.934362] [<ffffffffa0496113>] ? txg_sync_thread+0x2cd/0x4be [zfs]
[ 4100.938864] [<ffffffffa0495e46>] ? txg_thread_wait.isra.2+0x23/0x23 [zfs]
[ 4100.943381] [<ffffffffa033a1bc>] ? thread_generic_wrapper+0x6a/0x75 [spl]
[ 4100.947785] [<ffffffffa033a152>] ? __thread_create+0x2be/0x2be [spl]
[ 4100.952202] [<ffffffff8105f631>] ? kthread+0x76/0x7e
[ 4100.956552] [<ffffffff81356374>] ? kernel_thread_helper+0x4/0x10
[ 4100.960977] [<ffffffff8105f5bb>] ? kthread_worker_fn+0x139/0x139
[ 4100.965362] [<ffffffff81356370>] ? gs_change+0x13/0x13

Here, the portion of the stack between kmem_alloc_debug and spa_sync
makes no sense. I suspect part of this has to do with the use of static
functions, but it is not clear to me when a static function causes problems.
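
One thing I have considered (rough sketch below, assuming I am reading
the stacktrace API correctly) is calling save_stack_trace() and
print_stack_trace() instead of dump_stack(), since on x86 the
stack-trace saver only records entries the unwinder considers reliable
when frame pointers are available:

#include <linux/kernel.h>
#include <linux/stacktrace.h>

#define EXAMPLE_TRACE_DEPTH 32	/* arbitrary depth for this sketch */

/* Requires CONFIG_STACKTRACE; the name is made up for the example. */
static void example_dump_reliable_stack(void)
{
	unsigned long entries[EXAMPLE_TRACE_DEPTH];
	struct stack_trace trace = {
		.nr_entries	= 0,
		.max_entries	= EXAMPLE_TRACE_DEPTH,
		.entries	= entries,
		.skip		= 1,	/* skip this helper itself */
	};

	save_stack_trace(&trace);
	print_stack_trace(&trace, 0);	/* no extra indentation */
}

That still depends on how the unwinder is configured in the running
kernel, so I am not sure it would help on every distribution kernel.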

Does anyone have any suggestions on how to make this better?

I have added the ZFSOnLinux project lead to the CC list. Neither of us
is on the mailing list, so please keep both of us on CC.

P.S. It might seem odd for a Gentoo developer to tackle a report made by
a Debian user. I cannot speak for all of us, but I do what I can to
handle bug reports involving distribution-independent issues in the
Gentoo packages that I maintain. Many others do the same. I realize ZFS
is not mainline, but I sincerely hope that people will be as
accommodating toward my question as I try to be with bug reports from
users of other distributions. :)
