Re: [PATCH RESEND 0/3] memory_hotplug: hyperv: fix deadlock between memory adding and onlining

From: Rafael J. Wysocki
Date: Thu Feb 12 2015 - 17:04:54 EST


On Thursday, February 12, 2015 10:01:48 PM KY Srinivasan wrote:
>
> > -----Original Message-----
> > From: Rafael J. Wysocki [mailto:rjw@xxxxxxxxxxxxx]
> > Sent: Thursday, February 12, 2015 2:13 PM
> > To: KY Srinivasan
> > Cc: Vitaly Kuznetsov; linux-kernel@xxxxxxxxxxxxxxx; Greg Kroah-Hartman;
> > Haiyang Zhang; Andrew Morton; Yasuaki Ishimatsu; Tang Chen; Vlastimil
> > Babka; David Rientjes; Fabian Frederick; Zhang Zhen; Vladimir Davydov;
> > Wang Nan; devel@xxxxxxxxxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx
> > Subject: Re: [PATCH RESEND 0/3] memory_hotplug: hyperv: fix deadlock
> > between memory adding and onlining
> >
> > On Thursday, February 12, 2015 03:38:42 PM KY Srinivasan wrote:
> > >
> > > > -----Original Message-----
> > > > From: Vitaly Kuznetsov [mailto:vkuznets@xxxxxxxxxx]
> > > > Sent: Thursday, February 12, 2015 2:24 AM
> > > > To: linux-kernel@xxxxxxxxxxxxxxx
> > > > Cc: Greg Kroah-Hartman; KY Srinivasan; Haiyang Zhang; Andrew Morton;
> > > > Yasuaki Ishimatsu; Tang Chen; Vlastimil Babka; David Rientjes;
> > > > Fabian Frederick; Zhang Zhen; Vladimir Davydov; Wang Nan; Rafael J.
> > > > Wysocki; devel@xxxxxxxxxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx
> > > > Subject: [PATCH RESEND 0/3] memory_hotplug: hyperv: fix deadlock
> > > > between memory adding and onlining
> > > >
> > > > RESEND (with no changes) because Rafael J. Wysocki was missing in
> > > > recepients.
> > > >
> > > > If newly added memory is brought online with e.g. udev rule:
> > > > SUBSYSTEM=="memory", ACTION=="add", ATTR{state}="online"
> > > > the following deadlock is observed (and easily reproducable):
> > > >
> > > > First participant, worker thread doing add_memory():
> > > >
> > > > [ 724.948846] kworker/0:1 D ffff88000412f9c8 13248 27 2
> > 0x00000000
> > > > [ 724.973543] Workqueue: events hot_add_req [hv_balloon] [
> > > > 724.991736]
> > > > ffff88000412f9c8 0000000000000000 ffff88003fa1dc30 00000000000151c0
> > > > [ 725.019725] 0000000000000246 ffff88000412ffd8 00000000000151c0
> > > > ffff88003a77a4e0 [ 725.046486] ffff88003fa1dc30 00000001032a6000
> > > > ffff88003a7ca838 ffff88003a7ca898 [ 725.072969] Call Trace:
> > > > [ 725.082690] [<ffffffff81aac0a9>]
> > > > schedule_preempt_disabled+0x29/0x70
> > > > [ 725.103799] [<ffffffff81aae33b>] mutex_lock_nested+0x14b/0x470 [
> > > > 725.122367] [<ffffffff815ed773>] ? device_attach+0x23/0xb0 [
> > > > 725.140992] [<ffffffff815ed773>] device_attach+0x23/0xb0 [
> > > > 725.159131] [<ffffffff815ecba0>] bus_probe_device+0xb0/0xe0 [
> > > > 725.177055] [<ffffffff815ea693>] device_add+0x443/0x650 [
> > > > 725.195558] [<ffffffff815ea8be>] device_register+0x1e/0x30 [
> > > > 725.213133] [<ffffffff81601790>] init_memory_block+0xd0/0xf0 [
> > > > 725.231533] [<ffffffff816018f1>] register_new_memory+0xb1/0xd0 [
> > > > 725.250769] [<ffffffff81a961cf>] __add_pages+0x13f/0x250 [
> > > > 725.269642] [<ffffffff81063770>] ? arch_add_memory+0x70/0xf0 [
> > > > 725.288764] [<ffffffff81063770>] arch_add_memory+0x70/0xf0 [
> > > > 725.306117] [<ffffffff81a95f8f>] add_memory+0xef/0x1f0 [
> > > > 725.322466] [<ffffffffa00293af>] hot_add_req+0x33f/0xf90
> > > > [hv_balloon] [ 725.342777] [<ffffffff8109509f>]
> > > > process_one_work+0x1df/0x4e0 [ 725.361459] [<ffffffff8109502d>] ?
> > > > process_one_work+0x16d/0x4e0 [ 725.380390] [<ffffffff810954bb>]
> > > > worker_thread+0x11b/0x450 [ 725.397684] [<ffffffff810953a0>] ?
> > > > process_one_work+0x4e0/0x4e0 [ 725.416533] [<ffffffff8109ac33>]
> > > > kthread+0xf3/0x110 [ 725.433372] [<ffffffff8109ab40>] ?
> > > > kthread_create_on_node+0x240/0x240
> > > > [ 725.453749] [<ffffffff81ab1dfc>] ret_from_fork+0x7c/0xb0 [
> > > > 725.470994] [<ffffffff8109ab40>] ?
> > > > kthread_create_on_node+0x240/0x240
> > > > [ 725.491469] 6 locks held by kworker/0:1/27:
> > > > [ 725.505037] #0: ("events"){......}, at: [<ffffffff8109502d>]
> > > > process_one_work+0x16d/0x4e0 [ 725.533370] #1:
> > > > ((&dm_device.ha_wrk.wrk)){......}, at: [<ffffffff8109502d>]
> > > > process_one_work+0x16d/0x4e0 [ 725.565580] #2:
> > > > (mem_hotplug.lock){......}, at: [<ffffffff811e6525>]
> > > > mem_hotplug_begin+0x5/0x80 [ 725.594369] #3:
> > > > (mem_hotplug.lock#2){......}, at: [<ffffffff811e656f>]
> > > > mem_hotplug_begin+0x4f/0x80 [ 725.628554] #4:
> > > > (mem_sysfs_mutex){......}, at: [<ffffffff81601873>]
> > > > register_new_memory+0x33/0xd0 [ 725.658519] #5:
> > > > (&dev->mutex){......},
> > > > at: [<ffffffff815ed773>] device_attach+0x23/0xb0
> > > >
> > > > Second participant, udev:
> > > >
> > > > [ 725.750889] systemd-udevd D ffff88003b94fc68 14016 888 530
> > > > 0x00000004
> > > > [ 725.773767] ffff88003b94fc68 0000000000000000 ffff8800034949c0
> > > > 00000000000151c0 [ 725.798332] ffffffff8210d980 ffff88003b94ffd8
> > > > 00000000000151c0 ffff880037a69270 [ 725.822841] ffff8800034949c0
> > > > 0000000100000001 ffff8800034949c0 ffffffff81ff2b48 [ 725.849184] Call
> > Trace:
> > > > [ 725.858987] [<ffffffff81aac0a9>]
> > > > schedule_preempt_disabled+0x29/0x70
> > > > [ 725.879231] [<ffffffff81aae33b>] mutex_lock_nested+0x14b/0x470 [
> > > > 725.897860] [<ffffffff811e656f>] ? mem_hotplug_begin+0x4f/0x80 [
> > > > 725.916698] [<ffffffff811e656f>] mem_hotplug_begin+0x4f/0x80 [
> > > > 725.935064] [<ffffffff811e6525>] ? mem_hotplug_begin+0x5/0x80 [
> > > > 725.953464] [<ffffffff81a9631b>] online_pages+0x3b/0x520 [
> > > > 725.971542] [<ffffffff815eb0b3>] ? device_online+0x23/0xa0 [
> > > > 725.989207] [<ffffffff81601524>] memory_subsys_online+0x64/0xc0 [
> > > > 726.008513] [<ffffffff815eb0fd>] device_online+0x6d/0xa0 [
> > > > 726.025579] [<ffffffff816012eb>] store_mem_state+0x5b/0xe0 [
> > > > 726.043400] [<ffffffff815e8258>] dev_attr_store+0x18/0x30 [
> > > > 726.060506] [<ffffffff8127a808>] sysfs_kf_write+0x48/0x60 [
> > > > 726.077940] [<ffffffff81279d1b>] kernfs_fop_write+0x13b/0x1a0 [
> > > > 726.099416] [<ffffffff811f9f67>] vfs_write+0xb7/0x1f0 [ 726.115748]
> > > > [<ffffffff811fabf8>]
> > > > SyS_write+0x58/0xd0 [ 726.131933] [<ffffffff81ab1ea9>]
> > > > system_call_fastpath+0x12/0x17 [ 726.150691] 7 locks held by
> > > > systemd-
> > > > udevd/888:
> > > > [ 726.165044] #0: (sb_writers#3){......}, at:
> > > > [<ffffffff811fa063>]
> > > > vfs_write+0x1b3/0x1f0 [ 726.192422] #1: (&of->mutex){......}, at:
> > > > [<ffffffff81279c46>] kernfs_fop_write+0x66/0x1a0 [ 726.220289] #2:
> > > > (s_active#60){......}, at: [<ffffffff81279c4e>]
> > > > kernfs_fop_write+0x6e/0x1a0 [ 726.249382] #3:
> > > > (device_hotplug_lock){......}, at: [<ffffffff815e9c15>]
> > > > lock_device_hotplug_sysfs+0x15/0x50
> > > > [ 726.281901] #4: (&dev->mutex){......}, at: [<ffffffff815eb0b3>]
> > > > device_online+0x23/0xa0 [ 726.308619] #5: (mem_hotplug.lock){......},
> > at:
> > > > [<ffffffff811e6525>] mem_hotplug_begin+0x5/0x80 [ 726.337994] #6:
> > > > (mem_hotplug.lock#2){......}, at: [<ffffffff811e656f>]
> > > > mem_hotplug_begin+0x4f/0x80
> > > >
> > > > In short: onlining grabs device lock and then tries to do
> > > > mem_hotplug_begin() while add_memory() is between
> > > > mem_hotplug_begin() and mem_hotplug_done() and it tries grabbing
> > > > device lock.
> > > >
> > > > To my understanding ACPI memory hotplug doesn't have the same issue
> > > > as device_hotplug_lock is being grabbed when the ACPI device is added.
> > > >
> > > > Solve the issue by grabbing device_hotplug_lock before doing
> > > > add_memory(). If we do that, lock_device_hotplug_sysfs() will cause
> > > > syscall retry which will eventually succeed. To support the change
> > > > we need to export lock_device_hotplug/ unlock_device_hotplug. This
> > > > approach can be completely wrong though.
> > >
> > > This issue was first discovered by Andy Whitcroft:
> > > https://lkml.org/lkml/2014/3/14/451
> > > I had sent patches based on Andy's analysis that did not affect the
> > > users of the kernel hot-add memory APIs:
> > > https://lkml.org/lkml/2014/12/2/662
> > >
> > > This patch puts the burden where it needs to be and can address the issue
> > for all clients.
> >
> > That seems to mean that this series is not needed. Is that correct?
>
> This patch was never committed upstream and so the issue still is there.

Well, I'm not sure what to do now to be honest.

Is this series regarded as the right way to address the problem that
everybody is comfortable with? Or is it still under discussion?

Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/