Re: [PACTH v9] stacktrace: Eliminate task stack trace duplication

From: Robert Foss
Date: Wed Aug 17 2016 - 09:52:03 EST




On 2016-08-17 02:50 AM, Peter Zijlstra wrote:
On Tue, Aug 16, 2016 at 07:12:36PM -0400, robert.foss@xxxxxxxxxxxxx wrote:
From: Ying Han <yinghan@xxxxxxxxxx>

The problem with small dmesg ring buffer like 512k is that only limited number
of task traces will be logged. Sometimes we lose important information only
because of too many duplicated stack traces. This problem occurs when dumping
lots of stacks in a single operation, such as sysrq-T.

This patch tries to reduce the duplication of task stack trace in the dump
message by hashing the task stack. The hashtable is a 32k pre-allocated buffer
during bootup. Each time if we find the identical task trace in the task stack,
we dump only the pid of the task which has the task trace dumped. So it is easy
to back track to the full stack with the pid.

When we do the hashing, we eliminate garbage entries from stack traces. Those
entries are still being printed in the dump to provide more debugging
informations.

[ 53.510162] kworker/0:0 S ffffffff8161d820 0 4 2 0x00000000
[ 53.517237] ffff88027547de60 0000000000000046 ffffffff812ab840 0000000000000000
[ 53.524663] ffff880275460080 ffff88027547dfd8 ffff88027547dfd8 ffff88027547dfd8
[ 53.532092] ffffffff81813020 ffff880275460080 0000000000000000 ffff8808758670c0
[ 53.539521] Call Trace:
[ 53.541974] [<ffffffff812ab840>] ? cfq_init_queue+0x350/0x350
[ 53.547791] [<ffffffff81524d49>] schedule+0x29/0x70
[ 53.552761] [<ffffffff810945a3>] worker_thread+0x233/0x380
[ 53.558318] [<ffffffff81094370>] ? manage_workers.isra.28+0x230/0x230
[ 53.564839] [<ffffffff81099a73>] kthread+0x93/0xa0
[ 53.569714] [<ffffffff8152e6d4>] kernel_thread_helper+0x4/0x10
[ 53.575628] [<ffffffff810999e0>] ? kthread_worker_fn+0x140/0x140
[ 53.581714] [<ffffffff8152e6d0>] ? gs_change+0xb/0xb
[ 53.586762] kworker/u:0 S ffffffff8161d820 0 5 2 0x00000000
[ 53.593858] ffff88027547fe60 0000000000000046 ffffffffa005cc70 0000000000000000
[ 53.601307] ffff8802754627d0 ffff88027547ffd8 ffff88027547ffd8 ffff88027547ffd8
[ 53.608788] ffffffff81813020 ffff8802754627d0 0000000000011fc0 ffff8804758670c0
[ 53.616232] Call Trace:
[ 53.618676] <Same stack as pid 4>


You might want to wait a bit and have a look at this:

https://lkml.kernel.org/r/cover.1471011425.git.jpoimboe@xxxxxxxxxx


I'll have a look through that series!
Thanks!