Re: [PACTH v9] stacktrace: Eliminate task stack trace duplication

From: Robert Foss
Date: Wed Aug 17 2016 - 16:35:02 EST

Next message: Catalin Marinas: "Re: [lkp] [mm] 122708b1b9: PANIC: early exception"
Previous message: Konrad Rzeszutek Wilk: "XSA 154 and ISA region (640K -> 1MB) WB cache instead of UC"
In reply to: Josh Poimboeuf: "Re: [PACTH v9] stacktrace: Eliminate task stack trace duplication"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 2016-08-17 03:57 PM, Josh Poimboeuf wrote:

On Wed, Aug 17, 2016 at 02:41:30PM -0400, Robert Foss wrote:

On 2016-08-17 02:06 PM, Josh Poimboeuf wrote:

On Wed, Aug 17, 2016 at 01:40:33PM -0400, Robert Foss wrote:

On 2016-08-17 12:58 PM, Josh Poimboeuf wrote:

On Wed, Aug 17, 2016 at 09:51:45AM -0400, Robert Foss wrote:

On 2016-08-17 02:50 AM, Peter Zijlstra wrote:

On Tue, Aug 16, 2016 at 07:12:36PM -0400, robert.foss@xxxxxxxxxxxxx wrote:

From: Ying Han <yinghan@xxxxxxxxxx>

The problem with small dmesg ring buffer like 512k is that only limited number
of task traces will be logged. Sometimes we lose important information only
because of too many duplicated stack traces. This problem occurs when dumping
lots of stacks in a single operation, such as sysrq-T.

This patch tries to reduce the duplication of task stack trace in the dump
message by hashing the task stack. The hashtable is a 32k pre-allocated buffer
during bootup. Each time if we find the identical task trace in the task stack,
we dump only the pid of the task which has the task trace dumped. So it is easy
to back track to the full stack with the pid.

When we do the hashing, we eliminate garbage entries from stack traces. Those
entries are still being printed in the dump to provide more debugging
informations.

[ 53.510162] kworker/0:0 S ffffffff8161d820 0 4 2 0x00000000
[ 53.517237] ffff88027547de60 0000000000000046 ffffffff812ab840 0000000000000000
[ 53.524663] ffff880275460080 ffff88027547dfd8 ffff88027547dfd8 ffff88027547dfd8
[ 53.532092] ffffffff81813020 ffff880275460080 0000000000000000 ffff8808758670c0
[ 53.539521] Call Trace:
[ 53.541974] [<ffffffff812ab840>] ? cfq_init_queue+0x350/0x350
[ 53.547791] [<ffffffff81524d49>] schedule+0x29/0x70
[ 53.552761] [<ffffffff810945a3>] worker_thread+0x233/0x380
[ 53.558318] [<ffffffff81094370>] ? manage_workers.isra.28+0x230/0x230
[ 53.564839] [<ffffffff81099a73>] kthread+0x93/0xa0
[ 53.569714] [<ffffffff8152e6d4>] kernel_thread_helper+0x4/0x10
[ 53.575628] [<ffffffff810999e0>] ? kthread_worker_fn+0x140/0x140
[ 53.581714] [<ffffffff8152e6d0>] ? gs_change+0xb/0xb
[ 53.586762] kworker/u:0 S ffffffff8161d820 0 5 2 0x00000000
[ 53.593858] ffff88027547fe60 0000000000000046 ffffffffa005cc70 0000000000000000
[ 53.601307] ffff8802754627d0 ffff88027547ffd8 ffff88027547ffd8 ffff88027547ffd8
[ 53.608788] ffffffff81813020 ffff8802754627d0 0000000000011fc0 ffff8804758670c0
[ 53.616232] Call Trace:
[ 53.618676] <Same stack as pid 4>

You might want to wait a bit and have a look at this:

https://lkml.kernel.org/r/cover.1471011425.git.jpoimboe@xxxxxxxxxx

I'll have a look through that series!
Thanks!

Yeah, those patches replace dump_trace() with a new unwinder interface,
so if they get merged, this will need to be rewritten a little bit.

As for the patch itself, I'm not crazy about how it pushes the decision
of whether to print the stack of a given task down to the stack dump
code in show_trace_log_lvl().

I think I'd prefer to instead change the implementation of sysrq-T so
that it uses save_stack_trace_tsk(), and then uses
printk_stack_address() to print the stack. Then the stack dump code in
dumpstack*.c would be completely unaffected.

Or, even better, instead of sysrq-T, can the user just read
/proc/*/{comm,stack} and /proc/sched_debug? That gives basically the
same information without flooding printk.

Thanks for the feedback Josh!

I think the save_stack_trace_tsk() changes you are suggesting sound very
reasonable. However requiring the user to read /proc/*/{comm,stack} sort of
circumnavigates the goal of the patch, which is to reduce clutter in the
default stack traces that one encounters.

Yes, but maybe the hashing and deduplication of stacks could also be
done in user space?

What would that look like in practice? A user space daemon running in the
background?

The idea was for the user of sysrq-T to instead get the stack
information from /proc. Then they wouldn't have problems with the
printk buffer wrapping. If after doing that, the dedupe is still needed
for some reason, the application which does the reads from /proc could
also do the dedupe.

Ah! Now I understand. Thanks for spelling it out :)

The dedup would be helpful for as long as there is duplication in stack traces. And the dedup would be helpful to any application. Making every application that interacts with stack traces implement deduplication seems to me like going about things the wrong way,

But given that stack traces currently are being re-worked it may not be helpful after that series has landed.

Or am I misunderstanding the problem? If so, it might help to spell out
the problem you're trying to solve with more specifics and explain why
it can't be solved in user space.

Nope, you are on point.

Next message: Catalin Marinas: "Re: [lkp] [mm] 122708b1b9: PANIC: early exception"
Previous message: Konrad Rzeszutek Wilk: "XSA 154 and ISA region (640K -> 1MB) WB cache instead of UC"
In reply to: Josh Poimboeuf: "Re: [PACTH v9] stacktrace: Eliminate task stack trace duplication"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]