Re: [BUG] Core2 cpu triggers hard lockup with perf test

From: Jiri Olsa
Date: Tue Mar 01 2016 - 01:55:30 EST


On Mon, Feb 29, 2016 at 10:12:08PM +0000, Liang, Kan wrote:
>
>
> >
> > I can't find what's special about the Core2 CPU PEBS setup; it seems that other
> > CPUs are ok (tried on ivb/snb/hsw).
> >
> > reverting the 156174999dd1 fixed the issue for me
> >
> > ideas? thanks,
>
> I think we may just disable the multiple pebs support for core2,
> as in the patch below.
>
> The SDM section "18.4.4.4 Re-configuring PEBS Facilities" mentions that
> a quiescent period is needed between stopping the prior event counting and
> setting up a new PEBS event when software needs to reconfigure PEBS facilities.
> The quiescent period is to allow any latent residual PEBS records to complete
> their capture at their previously specified buffer address.
> That requirement can only be found for the Core microarchitecture.
>
> I think it may imply that there is some observed delay in writing the PEBS buffer.
> So if perf records a precise hw event with a very small period, the slow PEBS writing
> may lock up the CPU. If so, I think disabling the multiple pebs should be a good
> way to go.
>
>

hi,
got the same lockup with the patch:


[ 167.486514] Kernel panic - not syncing: Hard LOCKUP
[ 167.486514] CPU: 3 PID: 10656 Comm: perf Not tainted 4.5.0-rc4+ #7
[ 167.486514] Hardware name: System Manufacturer To Be Filled By O.E.M. Product Name To Be Filled By O.E.M./BB Name To be filled by O.E.M., BIOS CGELIA55.86
[ 167.486514] 0000000000000086 0000000084986595 ffff88007d985b28 ffffffff8133983f
[ 167.486514] ffffffff8191b723 0000000000000000 ffff88007d985ba8 ffffffff811872d1
[ 167.486514] ffff880000000008 ffff88007d985bb8 ffff88007d985b58 0000000084986595
[ 167.486514] Call Trace:
[ 167.486514] <NMI> [<ffffffff8133983f>] dump_stack+0x63/0x84
[ 167.486514] [<ffffffff811872d1>] panic+0xe2/0x229
[ 167.486514] [<ffffffff8113dc30>] watchdog_overflow_callback+0x100/0x100
[ 167.486514] [<ffffffff8117ee18>] __perf_event_overflow+0x88/0x1c0
[ 167.486514] [<ffffffff8117f994>] perf_event_overflow+0x14/0x20
[ 167.486514] [<ffffffff8100c42f>] intel_pmu_handle_irq+0x1df/0x460
[ 167.486514] [<ffffffff81052e3f>] ? native_apic_wait_icr_idle+0x1f/0x30
[ 167.486514] [<ffffffff81032cc5>] ? arch_irq_work_raise+0x35/0x40
[ 167.486514] [<ffffffff8100563d>] perf_event_nmi_handler+0x2d/0x50
[ 167.486514] [<ffffffff810313a2>] nmi_handle+0x62/0xf0
[ 167.486514] [<ffffffff81031a06>] default_do_nmi+0xf6/0x120
[ 167.486514] [<ffffffff81031b11>] do_nmi+0xe1/0x150
[ 167.486514] [<ffffffff816ad5f1>] end_repeat_nmi+0x1a/0x1e
[ 167.486514] [<ffffffff81063a16>] ? native_write_msr_safe+0x6/0x30
[ 167.486514] [<ffffffff81063a16>] ? native_write_msr_safe+0x6/0x30
[ 167.486514] [<ffffffff81063a16>] ? native_write_msr_safe+0x6/0x30
[ 167.486514] <<EOE>> [<ffffffff8100b5cd>] ? __intel_pmu_enable_all.isra.12+0x4d/0xb0
[ 167.486514] [<ffffffff8100b640>] intel_pmu_enable_all+0x10/0x20
[ 167.486514] [<ffffffff810072c3>] x86_pmu_enable+0x263/0x2f0
[ 167.486514] [<ffffffff81179a72>] perf_pmu_enable+0x22/0x30
[ 167.486514] [<ffffffff8117a721>] ctx_resched+0x51/0x60
[ 167.486514] [<ffffffff8117b2ff>] perf_event_exec+0x10f/0x140
[ 167.486514] [<ffffffff8121949d>] setup_new_exec+0x6d/0x1a0
[ 167.486514] [<ffffffff8126b58a>] load_elf_binary+0x37a/0x10e0
[ 167.486514] [<ffffffff811b77f2>] ? get_user_pages+0x52/0x60
[ 167.486514] [<ffffffff8121779e>] search_binary_handler+0x9e/0x1e0
[ 167.486514] [<ffffffff812191f4>] do_execveat_common.isra.34+0x554/0x6e0
[ 167.486514] [<ffffffff8121960a>] SyS_execve+0x3a/0x50
[ 167.486514] [<ffffffff816ab195>] stub_execve+0x5/0x5
[ 167.486514] [<ffffffff816aaeee>] ? entry_SYSCALL_64_fastpath+0x12/0x71
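
For reference, the kind of event discussed above (a precise hw event with a
very small period) could be set up from user space roughly as below. This is
only a sketch under assumptions: the cycles event, the helper names
init_precise_attr() and sys_perf_event_open(), and whatever period the caller
passes are illustrative and are not taken from the failing perf test.

```c
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Fill in a perf_event_attr for a precise hardware sampling event.
 * Assumption: cycles is used as the event; the thread does not say
 * which event the failing test actually uses.
 */
void init_precise_attr(struct perf_event_attr *attr,
		       unsigned long long period)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->sample_period = period;	/* small value => samples taken very often */
	attr->precise_ip = 1;		/* precise sampling, PEBS on Intel */
	attr->disabled = 1;		/* caller enables the event later */
	attr->exclude_kernel = 1;	/* sample user space only */
}

/* Thin wrapper: glibc provides no perf_event_open() function. */
long sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
			 int cpu, int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}
```

A caller would fill the attr with a small period, pass it to
sys_perf_event_open() for the current task (pid 0, cpu -1, group_fd -1,
flags 0), and enable the returned fd with ioctl(fd, PERF_EVENT_IOC_ENABLE, 0).
Opening the event may fail depending on perf_event_paranoid and on whether the
CPU supports precise sampling.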


jirka