Re: WARNING: at arch/x86/kernel/cpu/perf_event.c:989

From: Stephane Eranian
Date: Mon Feb 06 2012 - 05:40:39 EST

Next message: Lars-Peter Clausen: "Re: [BUG] genirq: Race condition in ONESHOT IRQ handler disablingIRQ forever"
Previous message: Felipe Balbi: "Re: [PATCH/RFC] usb: fix renesas_usbhs to not schedule in atomiccontext"
In reply to: Eric Dumazet: "Re: WARNING: at arch/x86/kernel/cpu/perf_event.c:989"
Next in thread: Eric Dumazet: "Re: WARNING: at arch/x86/kernel/cpu/perf_event.c:989"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Mon, Feb 6, 2012 at 11:12 AM, Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
> Le lundi 06 fÃvrier 2012 Ã 10:54 +0100, Stephane Eranian a Ãcrit :
>> On Sat, Feb 4, 2012 at 7:09 PM, Stephane Eranian <eranian@xxxxxxxxxx> wrote:
>> > I am working on it. It is hard to reproduce for me.
>> >
>> What did you run to trigger this warning? What system is this on?
>>
>> > On Sat, Feb 4, 2012 at 2:51 PM, Ingo Molnar <mingo@xxxxxxx> wrote:
>> >>
>> >> there's yet another one triggering at:
>> >>
>> >> [89214.962603] ------------[ cut here ]------------
>> >> [89214.967441] WARNING: at arch/x86/kernel/cpu/perf_event.c:995 x86_pmu_start+0x79/0xd4()
>> >> [89214.975825] Hardware name: X8DTN
>> >> [89214.979268] Modules linked in:
>> >> [89214.982560] Pid: 0, comm: swapper/6 Not tainted 3.3.0-rc2-tip+ #1
>> >> [89214.988865] Call Trace:
>> >> [89214.991533] Â<IRQ> Â[<ffffffff81065cc7>] warn_slowpath_common+0x7e/0x97
>> >> [89214.998379] Â[<ffffffff81065cf5>] warn_slowpath_null+0x15/0x17
>> >> [89215.004428] Â[<ffffffff8103f626>] x86_pmu_start+0x79/0xd4
>> >> [89215.010042] Â[<ffffffff810e30d1>] perf_adjust_freq_unthr_context.part.63+0xef/0x123
>> >> [89215.018123] Â[<ffffffff810e318c>] perf_event_task_tick+0x87/0x1c1
>> >> [89215.024463] Â[<ffffffff810a2370>] ? tick_nohz_handler+0xda/0xda
>> >> [89215.030595] Â[<ffffffff8108b819>] scheduler_tick+0xd1/0xf3
>> >> [89215.036296] Â[<ffffffff810720b0>] update_process_times+0x5e/0x6f
>> >> [89215.042512] Â[<ffffffff810a23e0>] tick_sched_timer+0x70/0x99
>> >> [89215.048387] Â[<ffffffff810823f9>] __run_hrtimer+0x8c/0x148
>> >> [89215.054087] Â[<ffffffff81082add>] hrtimer_interrupt+0xc1/0x18c
>> >>
>> >> Thanks,
>> >>
>> >> Â Â Â ÂIngo
>
> Stephane, I trigger this as well very easily on my machine, 32bit
> kernel, using the following :
>
>
> perf record -a -g hackbench 10 thread 4000
>
I tried that on my Nehalem 64-bit running 3.3.0-rc2 where I reverted
that chunck of commit 84f2b9b:

--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -986,9 +986,6 @@ static void x86_pmu_start(struct perf_event
*event, int flags)
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
int idx = event->hw.idx;

- if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
- return;
-
if (WARN_ON_ONCE(idx == -1))
return;

I have an explanation for the other two WARN_ON_ONCE() but not for that
one. Friday, I was able to track this down to a situation where from unthr
we call pmu->stop() but because the event is already marked as not active
in cpuc->active_mask, PERF_HES_STOPPED is not set, then
x86_pmu_start() complains. It happens during frequency adjustments and
not unthrottling.

This is odd because the only place where cpuc->active_mask is cleared
(for the event) is x86_pmu_stop(). So looks like we get into a situation where
cpuc->active_mask[b] == 0 && event->state != HES_STOPPED. But I don't
know where this could happen.

>
> [ 1205.338006] ------------[ cut here ]------------
> [ 1205.338028] WARNING: at arch/x86/kernel/cpu/perf_event.c:989
> x86_pmu_start+0xba/0xf0()
> [ 1205.338044] Hardware name: ProLiant BL460c G1
> [ 1205.338053] Modules linked in: xt_hashlimit af_packet tg3 bonding
> [ 1205.338076] Pid: 0, comm: swapper/5 Not tainted
> 3.3.0-rc2-00172-g23783f8 #55
> [ 1205.338090] Call Trace:
> [ 1205.338100] Â[<c0609e06>] ? printk+0x1d/0x1f
> [ 1205.338111] Â[<c022b012>] warn_slowpath_common+0x72/0xa0
> [ 1205.338123] Â[<c021072a>] ? x86_pmu_start+0xba/0xf0
> [ 1205.338134] Â[<c021072a>] ? x86_pmu_start+0xba/0xf0
> [ 1205.338145] Â[<c022b062>] warn_slowpath_null+0x22/0x30
> [ 1205.338157] Â[<c021072a>] x86_pmu_start+0xba/0xf0
> [ 1205.338170] Â[<c02aba7b>] perf_adjust_freq_unthr_context.part.75
> +0xfb/0x180
> [ 1205.338185] Â[<c02abd21>] perf_event_task_tick+0x221/0x290
> [ 1205.338199] Â[<c0258dee>] ? update_cpu_load+0xbe/0xf0
> [ 1205.338210] Â[<c0259728>] scheduler_tick+0x98/0xf0
> [ 1205.338222] Â[<c0239c6a>] update_process_times+0x5a/0x70
> [ 1205.338235] Â[<c026bb10>] tick_sched_timer+0x60/0x1f0
> [ 1205.338248] Â[<c024f1c0>] ? __remove_hrtimer+0x40/0xa0
> [ 1205.338260] Â[<c024f3f7>] __run_hrtimer+0x67/0x1e0
> [ 1205.338270] Â[<c026bab0>] ? tick_init_highres+0x20/0x20
> [ 1205.338297] Â[<c02501b0>] hrtimer_interrupt+0xe0/0x260
> [ 1205.338323] Â[<c025c207>] ? sched_clock_cpu+0xd7/0x160
> [ 1205.338350] Â[<c025c841>] ? set_next_entity+0x31/0x70
> [ 1205.338380] Â[<c0610804>] smp_apic_timer_interrupt+0x54/0x88
> [ 1205.338407] Â[<c060fcda>] apic_timer_interrupt+0x2a/0x30
> [ 1205.338439] Â[<c0209efc>] ? mwait_idle+0x7c/0x1d0
> [ 1205.338470] Â[<c0201606>] cpu_idle+0x66/0xa0
> [ 1205.338495] Â[<c060500f>] start_secondary+0x1bf/0x1c5
> [ 1205.338520] ---[ end trace 94f790d96c8679f1 ]---
> [ 1211.617012] Uhhuh. NMI received for unknown reason 21 on CPU 0.
> [ 1211.617049] Do you have a strange power saving mode enabled?
> [ 1211.617075] Dazed and confused, but trying to continue
> [ 1211.970013] Uhhuh. NMI received for unknown reason 31 on CPU 7.
> [ 1211.970050] Do you have a strange power saving mode enabled?
> [ 1211.970076] Dazed and confused, but trying to continue
> [ 1214.440012] Uhhuh. NMI received for unknown reason 21 on CPU 3.
> [ 1214.440048] Do you have a strange power saving mode enabled?
> [ 1214.440074] Dazed and confused, but trying to continue
> [ 1214.634012] Uhhuh. NMI received for unknown reason 31 on CPU 4.
> [ 1214.634047] Do you have a strange power saving mode enabled?
> [ 1214.634073] Dazed and confused, but trying to continue
> [ 1216.568016] Uhhuh. NMI received for unknown reason 31 on CPU 1.
> [ 1216.568052] Do you have a strange power saving mode enabled?
> [ 1216.568078] Dazed and confused, but trying to continue
> [ 1217.309010] Uhhuh. NMI received for unknown reason 31 on CPU 2.
> [ 1217.309044] Do you have a strange power saving mode enabled?
> [ 1217.309070] Dazed and confused, but trying to continue
>
>
>
>
> processor Â Â Â : 7
> vendor_id Â Â Â : GenuineIntel
> cpu family Â Â Â: 6
> model Â Â Â Â Â : 23
> model name Â Â Â: Intel(R) Xeon(R) CPU Â Â Â Â Â E5450 Â@ 3.00GHz
> stepping Â Â Â Â: 6
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Lars-Peter Clausen: "Re: [BUG] genirq: Race condition in ONESHOT IRQ handler disablingIRQ forever"
Previous message: Felipe Balbi: "Re: [PATCH/RFC] usb: fix renesas_usbhs to not schedule in atomiccontext"
In reply to: Eric Dumazet: "Re: WARNING: at arch/x86/kernel/cpu/perf_event.c:989"
Next in thread: Eric Dumazet: "Re: WARNING: at arch/x86/kernel/cpu/perf_event.c:989"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]