Re: [PATCH] rcu/torture: Improve badness extraction from console logs

From: Paul E. McKenney
Date: Sat Sep 02 2023 - 15:24:47 EST


On Sat, Sep 02, 2023 at 02:28:41PM -0400, Joel Fernandes wrote:
> On Sat, Sep 2, 2023 at 9:53 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >
> > On Fri, Sep 01, 2023 at 02:04:24PM -0400, Joel Fernandes wrote:
> > > On Fri, Sep 1, 2023 at 10:49 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > > >
> > > > On Thu, Aug 31, 2023 at 01:22:56AM +0000, Joel Fernandes (Google) wrote:
> > > > > Currently console.log.diags contains an output like follows:
> > > > > [ 2457.293734] WARNING: CPU: 2 PID: 13 at kernel/rcu/tasks.h:1061 rcu_tasks_trace_pregp_step+0x4a/0x50
> > > > > [ 2457.542385] Call Trace:
> > > > >
> > > > > This is not very useful and the Call trace is desired. Improve the
> > > > > script by Extracting more lines after each grep match.
> > > > >
> > > > > With this the above becomes:
> > > > >
> > > > > Issue 1:
> > > > > [ 2457.293734] WARNING: CPU: 2 PID: 13 at kernel/rcu/tasks.h:1061 rcu_tasks_trace_pregp_step+0x4a/0x50
> > > > > [ 2457.326661] Modules linked in:
> > > > > [ 2457.334818] CPU: 2 PID: 13 Comm: rcu_tasks_trace Not tainted 5.15.128+ #381
> > > > > [ 2457.349782] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > > > > [ 2457.373309] RIP: 0010:rcu_tasks_trace_pregp_step+0x4a/0x50
> > > > > [ 2457.386691] Code: 48 63 c7 48 8b 0
> > > > > [ 2457.421803] RSP: 0018:ffffa80fc0073e40 EFLAGS: 00010202
> > > > > [ 2457.431940] RAX: ffff8db91f580000 RBX: 000000000001b900 RCX: 0000000000000003
> > > > > [ 2457.443206] RDX: 0000000000000008 RSI: ffffffffac6bebd8 RDI: 0000000000000003
> > > > > [ 2457.454428] RBP: 0000000000000004 R08: 0000000000000001 R09: 0000000000000001
> > > > > [ 2457.465668] R10: 0000000000000000 R11: 00000000ffffffff R12: ffff8db902d87f40
> > > > > [ 2457.476971] R13: ffffffffac556620 R14: ffffffffac556630 R15: ffff8db9011a3200
> > > > > [ 2457.488251] FS: 0000000000000000(0000) GS:ffff8db91f500000(0000) knlGS:0000000000000000
> > > > > [ 2457.500834] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > [ 2457.509602] CR2: 0000000000000000 CR3: 0000000002cbc000 CR4: 00000000000006e0
> > > > > [ 2457.520378] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > [ 2457.531440] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > [ 2457.542385] Call Trace:
> > > > > [ 2457.546756] <TASK>
> > > > > [ 2457.550349] ? __warn+0x7b/0x100
> > > > > [ 2457.567214] ? rcu_tasks_trace_pregp_step+0x4a/0x50
> > > > > -------------------------------------
> > > > > Issue 2:
> > > > > [ 2457.542385] Call Trace:
> > > > > [ 2457.546756] <TASK>
> > > > > [ 2457.550349] ? __warn+0x7b/0x100
> > > > > [ 2457.567214] ? rcu_tasks_trace_pregp_step+0x4a/0x50
> > > > > [ 2457.574948] ? report_bug+0x99/0xc0
> > > > > [ 2457.593824] ? handle_bug+0x3c/0x70
> > > > > [ 2457.599534] ? exc_invalid_op+0x13/0x60
> > > > > [ 2457.625729] ? asm_exc_invalid_op+0x16/0x20
> > > > > [ 2457.632249] ? rcu_tasks_trace_pregp_step+0x4a/0x50
> > > > > [ 2457.660010] rcu_tasks_wait_gp+0x54/0x360
> > > > > [ 2457.677761] ? _raw_spin_unlock_irqrestore+0x2b/0x60
> > > > > [ 2457.705658] rcu_tasks_kthread+0x114/0x200
> > > > > [ 2457.712450] ? wait_woken+0x70/0x70
> > > > > [ 2457.727283] ? synchronize_rcu_tasks_rude+0x10/0x10
> > > > > [ 2457.746221] kthread+0x130/0x160
> > > > > [ 2457.751487] ? set_kthread_struct+0x40/0x40
> > > > > [ 2457.758178] ret_from_fork+0x22/0x30
> > > > > [ 2457.763909] </TASK>
> > > > > [ 2457.767546] irq event stamp: 29544441
> > > > > [ 2457.773344] hardirqs last enabled at (29544451): [<ffffffffaace6cbd>] __up_console_sem+0x4d/0x60
> > > > > [ 2457.786967] hardirqs last disabled at (29544460): [<ffffffffaace6ca2>] __up_console_sem+0x32/0x60
> > > > > -------------------------------------
> > > > >
> > > > > Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> > > >
> > > > Nice!!!
> > > >
> > > > One request, though. If I am reading this correctly, if a console.log
> > > > file is filled with splats, then console.log.diags will replicate most
> > > > of console.log. Could you please limit something, perhaps the number
> > > > of lines, the number of splats, or some such?
> > > >
> > > > Just for context, my use case for the current setup is to run
> > > > kvm-find-errors.sh, which puts each console.log.diags/console.log pair
> > > > into my editor. I copy an appropriate string from the console.log.diags,
> > > > and then paste that string to search for it in the corresponding
> > > > console.log file, which is what is edited next.
> > >
> > > I could add a summary on the top for your usecase (which keeps the
> > > original output), and then the more detailed splats below it. Would
> > > that work?
> >
> > That summary would normally be at the top of the file anyway, right?
>
> Not after this patch. I removed the summary in my patch from the
> console .diags in favor of the detailed ones. I can add it back.

My point is rather that the first line of the existing summary would
likely appear in the first few lines of the new output.

> > Either way, what I don't want is many megabytes of splats from console.log
> > duplicated into console.log.diags, especially in those (admittedly sad)
> > cases where the console.log file is the largest file in the results
> > directory. Are you really going to be reading carefully after (say)
> > the fifth splat? ;-)
>
> I could limit it to just the first 5, with the existing unlimited
> summary at the top.

How about limiting the existing summary to (say) 10 lines? I do not
recall every caring about more than that. The only reason that I did
not ever limit it myself is that the summary is unlikely to be more than
a few percent of the full console.log.

> Let me know if that works or does not work for you. :-)

I do believe that we might be converging. ;-)

Thanx, Paul

> thanks,
>
> - Joel
>
>
> >
> > Thanx, Paul
> >
> > > So the whole file would look something like:
> > > =====x8==========
> > > Summary of console.log lines showing issues:
> > > Line 300: [ 2457.293734] WARNING: CPU: 2 PID: 13 at
> > > kernel/rcu/tasks.h:1061 rcu_tasks_trace_pregp_step+0x4a/0x50
> > > Line 400:[ 2457.542385] Call Trace:
> > >
> > > Detailed list of issues:
> > > 1. Issue on line 300:
> > > [ 2457.293734] WARNING: CPU: 2 PID: 13 at kernel/rcu/tasks.h:1061
> > > rcu_tasks_trace_pregp_step+0x4a/0x50
> > > [ 2457.326661] Modules linked in:
> > > [ 2457.334818] CPU: 2 PID: 13 Comm: rcu_tasks_trace Not tainted 5.15.128+ #381
> > > [ 2457.349782] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> > > BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > > [ 2457.373309] RIP: 0010:rcu_tasks_trace_pregp_step+0x4a/0x50
> > > [ 2457.386691] Code: 48 63 c7 48 8b 0
> > > [ 2457.421803] RSP: 0018:ffffa80fc0073e40 EFLAGS: 00010202
> > > [ 2457.431940] RAX: ffff8db91f580000 RBX: 000000000001b900 RCX: 0000000000000003
> > > [ 2457.443206] RDX: 0000000000000008 RSI: ffffffffac6bebd8 RDI: 0000000000000003
> > > [ 2457.454428] RBP: 0000000000000004 R08: 0000000000000001 R09: 0000000000000001
> > > [ 2457.465668] R10: 0000000000000000 R11: 00000000ffffffff R12: ffff8db902d87f40
> > > [ 2457.476971] R13: ffffffffac556620 R14: ffffffffac556630 R15: ffff8db9011a3200
> > > [ 2457.488251] FS: 0000000000000000(0000) GS:ffff8db91f500000(0000)
> > > knlGS:0000000000000000
> > > [ 2457.500834] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 2457.509602] CR2: 0000000000000000 CR3: 0000000002cbc000 CR4: 00000000000006e0
> > > [ 2457.520378] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [ 2457.531440] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [ 2457.542385] Call Trace:
> > > [ 2457.546756] <TASK>
> > > [ 2457.550349] ? __warn+0x7b/0x100
> > > [ 2457.567214] ? rcu_tasks_trace_pregp_step+0x4a/0x50
> > > -------------------------------------
> > > 2. Issue on Line 500:
> > > [ 2457.542385] Call Trace:
> > > [ 2457.546756] <TASK>
> > > [ 2457.550349] ? __warn+0x7b/0x100
> > > [ 2457.567214] ? rcu_tasks_trace_pregp_step+0x4a/0x50
> > > [ 2457.574948] ? report_bug+0x99/0xc0
> > > [ 2457.593824] ? handle_bug+0x3c/0x70
> > > [ 2457.599534] ? exc_invalid_op+0x13/0x60
> > > [ 2457.625729] ? asm_exc_invalid_op+0x16/0x20
> > > [ 2457.632249] ? rcu_tasks_trace_pregp_step+0x4a/0x50
> > > [ 2457.660010] rcu_tasks_wait_gp+0x54/0x360
> > > [ 2457.677761] ? _raw_spin_unlock_irqrestore+0x2b/0x60
> > > [ 2457.705658] rcu_tasks_kthread+0x114/0x200
> > > [ 2457.712450] ? wait_woken+0x70/0x70
> > > [ 2457.727283] ? synchronize_rcu_tasks_rude+0x10/0x10
> > > [ 2457.746221] kthread+0x130/0x160
> > > [ 2457.751487] ? set_kthread_struct+0x40/0x40
> > > [ 2457.758178] ret_from_fork+0x22/0x30
> > > [ 2457.763909] </TASK>
> > > [ 2457.767546] irq event stamp: 29544441
> > > [ 2457.773344] hardirqs last enabled at (29544451):
> > > [<ffffffffaace6cbd>] __up_console_sem+0x4d/0x60
> > > [ 2457.786967] hardirqs last disabled at (29544460):
> > > [<ffffffffaace6ca2>] __up_console_sem+0x32/0x60
> > > -------------------------------------
> > > =====x8==========>