Re: sched_setscheduler() vs idle_balance() race

From: Mike Galbraith
Date: Fri May 29 2015 - 14:30:43 EST


On Thu, 2015-05-28 at 17:24 +0200, Peter Zijlstra wrote:
> On Thu, May 28, 2015 at 04:54:26PM +0200, Mike Galbraith wrote:
>
> > > The below is compile tested only, but it might just work if I didn't
> > > miss anything :-)
> >
> > I'll take it for a spin, and take a peek at the application.
>
> Thanks!

It took quite a bit longer than I thought it would, but I finally
managed to cobble a standalone testcase together that brings nearly
instant gratification on my 8 socket DL980. Patched kernel explodes, so
first cut ain't quite ready to ship ;-)

I applied a "say no to migration if ->pi_lock is held" check, and the
otherwise toxic testcase was rendered harmless, so it seems to be a hole
in the patch.
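
For anyone wanting to poke at this without the real testcase, here is a
rough userspace sketch of one half of the race: a task flipping itself
between SCHED_FIFO and SCHED_OTHER via sched_setscheduler(). It is not the
testcase that produced the oops below. flip_policy() and its rounds
parameter are made-up names for illustration, and a single copy is unlikely
to trigger anything; the assumption is that many copies would have to run
across CPUs that keep going idle.

```c
#include <sched.h>

/*
 * Hypothetical helper, not taken from the real testcase: flip the
 * calling task between SCHED_FIFO and SCHED_OTHER `rounds` times.
 *
 * Returns the number of successful policy switches, or -1 if the very
 * first switch to SCHED_FIFO is refused (errno is left as set by
 * sched_setscheduler(), typically EPERM).
 */
int flip_policy(int rounds)
{
	struct sched_param rt = { .sched_priority = 1 };     /* lowest RT priority */
	struct sched_param normal = { .sched_priority = 0 }; /* required for SCHED_OTHER */
	int ok = 0;

	for (int i = 0; i < rounds; i++) {
		/* pid 0 means "the calling task" */
		if (sched_setscheduler(0, SCHED_FIFO, &rt) != 0)
			return ok ? ok : -1;
		ok++;

		/* Drop back to the normal class; this enqueue/dequeue
		 * churn is what the setscheduler side of the race does. */
		if (sched_setscheduler(0, SCHED_OTHER, &normal) != 0)
			return ok;
		ok++;
	}
	return ok;
}
```

Without CAP_SYS_NICE (or a suitable RLIMIT_RTPRIO) the first call fails
with EPERM and flip_policy() returns -1, so it needs to run privileged to
do anything interesting.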

Here's the burp, I haven't rummaged around at all yet.

[ 286.105446] ------------[ cut here ]------------
[ 286.151163] kernel BUG at kernel/sched/rt.c:986!
[ 286.203404] invalid opcode: 0000 [#1] SMP
[ 286.249093] Dumping ftrace buffer:
[ 286.288337] (ftrace buffer empty)
[ 286.328403] Modules linked in: edd af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave fuse loop md_mod dm_mod iTCO_wdt gpio_ich iTCO_vendor_support ipmi_ssif joydev i7core_edac ipmi_si lpc_ich hpilo hid_generic netxen_nic hpwdt shpchp sr_mod ehci_pci mfd_core pcspkr bnx2 edac_core ipmi_msghandler cdrom sg pcc_cpufreq 8250_fintek acpi_cpufreq acpi_power_meter button usbhid uhci_hcd ehci_hcd usbcore thermal usb_common processor scsi_dh_hp_sw scsi_dh_emc scsi_dh_rdac scsi_dh_alua scsi_dh ata_generic ata_piix hpsa cciss
[ 286.855938] CPU: 3 PID: 6893 Comm: massive_intr_x Not tainted 4.1.0-default #2
[ 286.933673] Hardware name: Hewlett-Packard ProLiant DL980 G7, BIOS P66 07/07/2010
[ 287.009379] task: ffff8802717bc4d0 ti: ffff8802715b4000 task.ti: ffff8802715b4000
[ 287.089247] RIP: 0010:[<ffffffff810a75d4>] [<ffffffff810a75d4>] dequeue_top_rt_rq+0x44/0x50
[ 287.184723] RSP: 0018:ffff8802715b7d98 EFLAGS: 00010046
[ 287.244782] RAX: ffff880277316480 RBX: ffff88007a4ba788 RCX: 00000000000025c7
[ 287.326088] RDX: 0000000000000000 RSI: ffff88007a4ba590 RDI: ffff880277316618
[ 287.407138] RBP: ffff8802715b7d98 R08: ffffffff81c3ff00 R09: 0000000000001aed
[ 287.487730] R10: ffff88007a4ba590 R11: 0000000000000001 R12: ffff880277316480
[ 287.568328] R13: ffff880277316c90 R14: ffff8802715b7ed8 R15: ffff88007a4ba590
[ 287.649732] FS: 00007efc0515c700(0000) GS:ffff8802766c0000(0000) knlGS:0000000000000000
[ 287.741131] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 287.805467] CR2: ffffffffff600400 CR3: 000000026f6a1000 CR4: 00000000000007e0
[ 287.889403] Stack:
[ 287.912171] ffff8802715b7dc8 ffffffff810a81fc ffff88007a4ba788 ffff880277316480
[ 287.995061] ffff880277316c90 ffff8802715b7ed8 ffff8802715b7de8 ffffffff810a909f
[ 288.079516] ffff880277316480 ffff88007a4ba590 ffff8802715b7e18 ffffffff810a9691
[ 288.163784] Call Trace:
[ 288.193691] [<ffffffff810a81fc>] dequeue_rt_stack+0x3c/0x350
[ 288.260484] [<ffffffff810a909f>] dequeue_rt_entity+0x1f/0x80
[ 288.330554] [<ffffffff810a9691>] dequeue_task_rt+0x31/0x80
[ 288.395212] [<ffffffff8108e16c>] dequeue_task+0x5c/0x80
[ 288.472481] [<ffffffff81091ef5>] __sched_setscheduler+0x635/0xa50
[ 288.547063] [<ffffffff81092378>] _sched_setscheduler+0x68/0x70
[ 288.613281] [<ffffffff81092401>] do_sched_setscheduler+0x61/0xa0
[ 288.681984] [<ffffffff81094f82>] SyS_sched_setscheduler+0x12/0x30
[ 288.750797] [<ffffffff81669cb2>] system_call_fastpath+0x16/0x75
[ 288.819013] Code: d7 75 26 8b 97 ac 06 00 00 85 d2 74 1a 8b 50 04 85 d2 74 17 2b 97 50 06 00 00 89 50 04 c7 87 ac 06 00 00 00 00 00 00 5d c3 0f 0b <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5
[ 289.037988] RIP [<ffffffff810a75d4>] dequeue_top_rt_rq+0x44/0x50
[ 289.100594] RSP <ffff8802715b7d98>

