Re: [tracing] WARNING: at kernel/lockdep.c:3545 check_flags()

From: Steven Rostedt
Date: Tue Oct 23 2012 - 10:51:04 EST


On Thu, 2012-10-18 at 19:41 +0800, Fengguang Wu wrote:
> Hi Steven,

Hi Wu,

I was out of town and I don't read my redhat email address while
traveling.

>
> This warning occur from time to time in my tests:
>
> [ 593.080133] Testing tracer function:
> [ 593.149895] ------------[ cut here ]------------
> [ 593.149895] WARNING: at /c/kernel-tests/src/stable/kernel/lockdep.c:3545 check_flags+0x1c8/0x1d0()

This is:

if (!hardirq_count()) {
if (softirq_count()) {
/* like the above, but with softirqs */
DEBUG_LOCKS_WARN_ON(current->softirqs_enabled);
} else {

Which means that some how softirqs are disabled but lockdep thinks they
are enabled.

> [ 593.149895] Modules linked in:
> [ 593.149895] Pid: 1, comm: swapper/0 Not tainted 3.7.0-rc1-12129-gee2dd3c #3
> [ 593.149895] Call Trace:
> [ 593.149895] <IRQ> [<ffffffff810734cf>] warn_slowpath_common+0x8f/0xc0
> [ 593.149895] [<ffffffff8107351a>] warn_slowpath_null+0x1a/0x20
> [ 593.149895] [<ffffffff810fac08>] check_flags+0x1c8/0x1d0
> [ 593.149895] [<ffffffff810fbfed>] lock_is_held+0x4d/0xa0
> [ 593.149895] [<ffffffff8156d570>] ? notify_die+0x30/0x30
> [ 593.149895] [<ffffffff81081b13>] ? __local_bh_enable+0x1d3/0x230
> [ 593.149895] [<ffffffff81133cdf>] rcu_read_lock_held+0x7f/0x90
> [ 593.149895] [<ffffffff81134208>] ftrace_ops_no_ops+0x188/0x190
> [ 593.149895] [<ffffffff81573872>] trace+0x55/0x93
> [ 593.149895] [<ffffffff81573872>] ? trace+0x55/0x93
> [ 593.149895] [<ffffffff81566fc9>] ? _raw_spin_unlock_irq+0x99/0xc0
> [ 593.149895] [<ffffffff8156d575>] ? sub_preempt_count+0x5/0x210
> [ 593.149895] [<ffffffff8156d575>] ? sub_preempt_count+0x5/0x210
> [ 593.149895] [<ffffffff81081b13>] ? __local_bh_enable+0x1d3/0x230

This makes sense, as function tracing of softirqs sometimes causes this
bug. It's a race where a function is traced inside enabling softirqs and
the tracer's lock causes a check of softirqs after lockdep disabled it
but before it was actually disabled. I'll try to figure out how this
happens again. It has been fixed in the past before.


> [ 593.149895] [<ffffffff810827ea>] __do_softirq+0x42a/0x710
> [ 593.149895] [<ffffffff81574c7c>] call_softirq+0x1c/0x30
> [ 593.149895] [<ffffffff81028a05>] do_softirq+0x1d5/0x2a0
> [ 593.149895] [<ffffffff81082d3d>] irq_exit+0xad/0xc0
> [ 593.149895] [<ffffffff81028035>] do_IRQ+0x75/0x100
> [ 593.149895] [<ffffffff81567c72>] common_interrupt+0x72/0x72
> [ 593.149895] <EOI> [<ffffffff810fafbb>] ? mark_held_locks+0x9b/0x100
> [ 593.149895] [<ffffffff815675f3>] ? _raw_spin_unlock_irqrestore+0xd3/0x130
> [ 593.149895] [<ffffffff81307487>] __debug_object_init+0x1b7/0x580
> [ 593.149895] [<ffffffff813078c6>] debug_object_init_on_stack+0x36/0x40
> [ 593.149895] [<ffffffff8108a565>] init_timer_on_stack_key+0x35/0x80
> [ 593.149895] [<ffffffff815616fb>] schedule_timeout+0x8b/0x470
> [ 593.149895] [<ffffffff81573872>] ? trace+0x55/0x93
> [ 593.149895] [<ffffffff81562fb7>] ? __mutex_unlock_slowpath+0x247/0x380
> [ 593.149895] [<ffffffff81133487>] ? update_ftrace_function+0x17/0x150
> [ 593.149895] [<ffffffff81561afe>] schedule_timeout_uninterruptible+0x1e/0x20
> [ 593.149895] [<ffffffff8108ea60>] msleep+0x20/0x30
> [ 593.149895] [<ffffffff81145a6a>] trace_selftest_startup_function+0xda/0x2b0
> [ 593.149895] [<ffffffff81146c4c>] register_tracer+0x32c/0x430
> [ 593.149895] [<ffffffff81c3aee4>] ? init_trace_printk_function_export+0x48/0x48
> [ 593.149895] [<ffffffff81c3aef4>] init_function_trace+0x10/0x12
> [ 593.149895] [<ffffffff8100223a>] do_one_initcall+0x1aa/0x270
> [ 593.149895] [<ffffffff815313bc>] kernel_init+0x11c/0x370
> [ 593.149895] [<ffffffff81c17878>] ? do_early_param+0xc6/0xc6
> [ 593.149895] [<ffffffff810ca474>] ? schedule_tail+0x34/0x110
> [ 593.149895] [<ffffffff815312a0>] ? rest_init+0x170/0x170
> [ 593.149895] [<ffffffff815739ac>] ret_from_fork+0x7c/0xb0
> [ 593.149895] [<ffffffff815312a0>] ? rest_init+0x170/0x170
>
> And it's bisected down to an unbelievable simple commit..
>
> commit 17bb615ad4f8d2d2c0f02794d27d7f83e0009ef4
> Author: Steven Rostedt <srostedt@xxxxxxxxxx>
> Date: Mon May 23 15:27:46 2011 -0400
>
> tracing: Have event with function tracer check error return

This change is simply:

--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1657,7 +1657,12 @@ static struct ftrace_ops trace_ops __initdata =

static __init void event_trace_self_test_with_function(void)
{
- register_ftrace_function(&trace_ops);
+ int ret;
+ ret = register_ftrace_function(&trace_ops);
+ if (WARN_ON(ret < 0)) {
+ pr_info("Failed to enable function tracer for event tests\n")
+ return;
+ }


I highly doubt that this has anything to do with the bug. The bug is a
race, and that means you can't bisect it with single runs to determine
good and bad. It can only detect bad, but just because it passes,
doesn't actually mean its good.

My search is to see what is being traced again when it shouldn't be.

-- Steve


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/