Re: [RFC][PATCH 2/3] perf: Add a bit of paranoia

From: Peter Zijlstra
Date: Mon Feb 02 2015 - 10:42:53 EST


On Mon, Feb 02, 2015 at 01:33:14AM -0500, Vince Weaver wrote:
> On Thu, 29 Jan 2015, Peter Zijlstra wrote:
>
> > That said, it does need to do that sibling first leaders later install
> > order too. So I've put the below on top.
>
> so I've lost track of exactly which patches I should be running (do I need
> to apply both of the additional patches?)

Probably, lemme try and get all of the current stuff in tip/master to
make for easier testing.

> Meanwhile my haswell was still fuzzing away (without those two
> updated patches) until it triggered the below and crashed.
>
> [407484.309136] ------------[ cut here ]------------
> [407484.314590] WARNING: CPU: 3 PID: 27209 at kernel/watchdog.c:290 watchdog_overflow_callback+0x92/0xc0()
> [407484.325090] Watchdog detected hard LOCKUP on cpu 3
> [407484.330093] Modules linked in: btrfs xor raid6_pq ntfs vfat msdos fat dm_mod fuse x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm crct10dif_pclmul snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic crc32_pclmul snd_hda_intel ghash_clmulni_intel snd_hda_controller aesni_intel snd_hda_codec aes_x86_64 snd_hwdep lrw gf128mul snd_pcm ppdev glue_helper xhci_pci mei_me iTCO_wdt iTCO_vendor_support i915 snd_timer drm_kms_helper snd drm ablk_helper lpc_ich mfd_core evdev pcspkr parport_pc psmouse cryptd soundcore i2c_i801 serio_raw parport xhci_hcd mei wmi tpm_tis tpm video battery button processor i2c_algo_bit sg sr_mod sd_mod cdrom ahci libahci e1000e ehci_pci libata ptp ehci_hcd crc32c_intel usbcore scsi_mod usb_common pps_core thermal fan thermal_sys
> [407484.408496] CPU: 3 PID: 27209 Comm: perf_fuzzer Tainted: G W 3.19.0-rc6+ #126
> [407484.417914] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> [407484.426497] ffffffff81a3d3da ffff88011eac5aa0 ffffffff816b6761 0000000000000000
> [407484.435161] ffff88011eac5af0 ffff88011eac5ae0 ffffffff8106dcda ffff88011eac5b00
> [407484.443900] ffff8801195f9800 0000000000000001 ffff88011eac5c40 ffff88011eac5ef8
> [407484.452588] Call Trace:
> [407484.455862] <NMI> [<ffffffff816b6761>] dump_stack+0x45/0x57
> [407484.462741] [<ffffffff8106dcda>] warn_slowpath_common+0x8a/0xc0
> [407484.469851] [<ffffffff8106dd56>] warn_slowpath_fmt+0x46/0x50
> [407484.476743] [<ffffffff8101d90a>] ? native_sched_clock+0x2a/0x90
> [407484.483888] [<ffffffff8101d90a>] ? native_sched_clock+0x2a/0x90
> [407484.490999] [<ffffffff8111b922>] watchdog_overflow_callback+0x92/0xc0
> [407484.498672] [<ffffffff8115c3f1>] __perf_event_overflow+0x91/0x270
> [407484.505984] [<ffffffff8102b15a>] ? x86_perf_event_set_period+0xca/0x170
> [407484.513834] [<ffffffff8115ced9>] perf_event_overflow+0x19/0x20
> [407484.520812] [<ffffffff8103266a>] intel_pmu_handle_irq+0x1ba/0x3a0
> [407484.528119] [<ffffffff8102a04b>] perf_event_nmi_handler+0x2b/0x50
> [407484.535402] [<ffffffff81018610>] nmi_handle+0xa0/0x150
> [407484.541701] [<ffffffff81018575>] ? nmi_handle+0x5/0x150
> [407484.548069] [<ffffffff810188ba>] default_do_nmi+0x4a/0x140
> [407484.554692] [<ffffffff81018a48>] do_nmi+0x98/0xe0
> [407484.560517] [<ffffffff816c0bf1>] end_repeat_nmi+0x1e/0x2e
> [407484.567054] [<ffffffff81058bff>] ? perf_get_regs_user+0xbf/0x190
> [407484.574256] [<ffffffff81058bff>] ? perf_get_regs_user+0xbf/0x190
> [407484.581431] [<ffffffff81058bff>] ? perf_get_regs_user+0xbf/0x190
> [407484.588602] <<EOE>> <IRQ> [<ffffffff8115c28c>] perf_prepare_sample+0x2ec/0x3c0
> [407484.597358] [<ffffffff8115c46e>] __perf_event_overflow+0x10e/0x270
> [407484.604708] [<ffffffff8115c439>] ? __perf_event_overflow+0xd9/0x270
> [407484.612215] [<ffffffff8115c924>] ? perf_tp_event+0xc4/0x210
> [407484.619000] [<ffffffff8115cfe2>] ? __perf_sw_event+0x72/0x1f0
> [407484.625937] [<ffffffff8115c799>] ? perf_swevent_overflow+0xa9/0xe0
> [407484.633287] [<ffffffff8115c799>] perf_swevent_overflow+0xa9/0xe0
> [407484.640467] [<ffffffff8115c837>] perf_swevent_event+0x67/0x90
> [407484.647343] [<ffffffff8115c924>] perf_tp_event+0xc4/0x210
> [407484.653923] [<ffffffff810b6fa9>] ? lock_acquire+0x119/0x130
> [407484.660606] [<ffffffff810b3cf6>] ? perf_trace_lock_acquire+0x146/0x180
> [407484.668332] [<ffffffff810b594f>] ? __lock_acquire.isra.31+0x3af/0xfe0
> [407484.675962] [<ffffffff810b3cf6>] perf_trace_lock_acquire+0x146/0x180
> [407484.683490] [<ffffffff810b6fa9>] ? lock_acquire+0x119/0x130
> [407484.690211] [<ffffffff810b6fa9>] lock_acquire+0x119/0x130
> [407484.696750] [<ffffffff8115b7f5>] ? perf_event_wakeup+0x5/0xf0
> [407484.703640] [<ffffffff811f50ef>] ? kill_fasync+0xf/0xf0
> [407484.710008] [<ffffffff8115b828>] perf_event_wakeup+0x38/0xf0
> [407484.716798] [<ffffffff8115b7f5>] ? perf_event_wakeup+0x5/0xf0
> [407484.723696] [<ffffffff8115b913>] perf_pending_event+0x33/0x60
> [407484.730570] [<ffffffff8114cc7c>] irq_work_run_list+0x4c/0x80
> [407484.737392] [<ffffffff8114ccc8>] irq_work_run+0x18/0x40
> [407484.743765] [<ffffffff8101955f>] smp_trace_irq_work_interrupt+0x3f/0xc0
> [407484.751579] [<ffffffff816c01fd>] trace_irq_work_interrupt+0x6d/0x80
> [407484.759046] <EOI> [<ffffffff810b6f4d>] ? lock_acquire+0xbd/0x130
> [407484.766380] [<ffffffff811f4f42>] ? SyS_fcntl+0x5b2/0x650
> [407484.772786] [<ffffffff816bdb31>] _raw_spin_lock+0x31/0x40
> [407484.779321] [<ffffffff811f4f42>] ? SyS_fcntl+0x5b2/0x650
> [407484.785813] [<ffffffff811f4f42>] SyS_fcntl+0x5b2/0x650
> [407484.792109] [<ffffffff816be6ed>] system_call_fastpath+0x16/0x1b
> [407484.799195] ---[ end trace 55752a03ec8ab979 ]---

That looks like tail recursive fun! An irq work that raises an irq work
ad infinitum. Lemme see if I can squash that.. didn't we have something
like this before... /me goes look.
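
For illustration only, here is a toy user-space model of that pattern. It is
not kernel code: the names run_pending, self_requeueing_work and
well_behaved_work are hypothetical, and the iteration cap exists only so the
sketch terminates. It assumes the loop in the trace comes from the work
handler's own side effect (the traced lock acquisition) generating an event
that queues the same work again, so the pending list never drains.

```python
from collections import deque


def run_pending(initial_work, max_iterations=1000):
    """Drain a toy pending-work queue.

    Returns (iterations, drained). The cap is an assumption of this
    sketch so that a self-requeueing handler cannot spin forever.
    """
    pending = deque([initial_work])
    iterations = 0
    while pending and iterations < max_iterations:
        work = pending.popleft()
        iterations += 1
        work(pending)  # the handler may queue more work as a side effect
    return iterations, not pending


def self_requeueing_work(pending):
    # Stand-in for a handler whose side effect (e.g. a traced lock
    # acquisition) raises a new event that queues the same work again.
    pending.append(self_requeueing_work)


def well_behaved_work(pending):
    # A handler that finishes without queueing anything further.
    pass


if __name__ == "__main__":
    print(run_pending(well_behaved_work))
    print(run_pending(self_requeueing_work, max_iterations=50))
```

Without the cap, the second call would never return, which is roughly what
the watchdog is complaining about on cpu 3.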
