Re: [PATCH] printk: fix one circular lockdep warning about console_lock

From: Andrew Morton
Date: Tue Feb 11 2014 - 16:19:39 EST


On Tue, 11 Feb 2014 14:50:00 +0800 <jiel@xxxxxxxxxxx> wrote:

> From: Jane Li <jiel@xxxxxxxxxxx>
>
> This patch tries to fix a warning about possible circular locking
> dependency.
>
> If do in following sequence:
> enter suspend -> resume -> plug-out CPUx (echo 0 > cpux/online)
> lockdep will show warning as following:
>
> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 3.10.0 #2 Tainted: G O
> -------------------------------------------------------
> sh/1271 is trying to acquire lock:
> (console_lock){+.+.+.}, at: [<c06ebf7c>] console_cpu_notify+0x20/0x2c
> but task is already holding lock:
> (cpu_hotplug.lock){+.+.+.}, at: [<c012b4e8>] cpu_hotplug_begin+0x2c/0x58
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
> -> #2 (cpu_hotplug.lock){+.+.+.}:
> [<c017bb7c>] lock_acquire+0x98/0x12c
> [<c06f5014>] mutex_lock_nested+0x50/0x3d8
> [<c012b4e8>] cpu_hotplug_begin+0x2c/0x58
> [<c06ebfac>] _cpu_up+0x24/0x154
> [<c06ec140>] cpu_up+0x64/0x84
> [<c0981834>] smp_init+0x9c/0xd4
> [<c0973880>] kernel_init_freeable+0x78/0x1c8
> [<c06e7f40>] kernel_init+0x8/0xe4
> [<c010eec8>] ret_from_fork+0x14/0x2c
>
> -> #1 (cpu_add_remove_lock){+.+.+.}:
> [<c017bb7c>] lock_acquire+0x98/0x12c
> [<c06f5014>] mutex_lock_nested+0x50/0x3d8
> [<c012b758>] disable_nonboot_cpus+0x8/0xe8
> [<c016b83c>] suspend_devices_and_enter+0x214/0x448
> [<c016bc54>] pm_suspend+0x1e4/0x284
> [<c016bdcc>] try_to_suspend+0xa4/0xbc
> [<c0143848>] process_one_work+0x1c4/0x4fc
> [<c0143f80>] worker_thread+0x138/0x37c
> [<c014aaf8>] kthread+0xa4/0xb0
> [<c010eec8>] ret_from_fork+0x14/0x2c
>
> -> #0 (console_lock){+.+.+.}:
> [<c017b5d0>] __lock_acquire+0x1b38/0x1b80
> [<c017bb7c>] lock_acquire+0x98/0x12c
> [<c01288c4>] console_lock+0x54/0x68
> [<c06ebf7c>] console_cpu_notify+0x20/0x2c
> [<c01501d4>] notifier_call_chain+0x44/0x84
> [<c012b448>] __cpu_notify+0x2c/0x48
> [<c012b5b0>] cpu_notify_nofail+0x8/0x14
> [<c06e81bc>] _cpu_down+0xf4/0x258
> [<c06e8344>] cpu_down+0x24/0x40
> [<c06e921c>] store_online+0x30/0x74
> [<c03b7298>] dev_attr_store+0x18/0x24
> [<c025fc5c>] sysfs_write_file+0x16c/0x19c
> [<c0207a98>] vfs_write+0xb4/0x190
> [<c0207e58>] SyS_write+0x3c/0x70
> [<c010ee00>] ret_fast_syscall+0x0/0x48
>
> Chain exists of:
> console_lock --> cpu_add_remove_lock --> cpu_hotplug.lock
>
> Possible unsafe locking scenario:
> CPU0 CPU1
> ---- ----
> lock(cpu_hotplug.lock);
> lock(cpu_add_remove_lock);
> lock(cpu_hotplug.lock);
> lock(console_lock);
> *** DEADLOCK ***

These traces hurt my brain.

> There are three locks involved in two sequence:
> a) pm suspend:
> console_lock (@suspend_console())
> cpu_add_remove_lock (@disable_nonboot_cpus())
> cpu_hotplug.lock (@_cpu_down())

But but but. suspend_console() releases console_sem again. So the
sequence is actually

down(&console_sem) (@suspend_console())
up(&console_sem) (@suspend_console())
cpu_add_remove_lock (@disable_nonboot_cpus())
cpu_hotplug.lock (@_cpu_down())

So console_sem *doesn't* nest outside cpu_add_remove_lock and
cpu_hotplug.lock.

> b) Plug-out CPUx:
> cpu_add_remove_lock (@(cpu_down())
> cpu_hotplug.lock (@_cpu_down())
> console_lock (@console_cpu_notify()) => Lockdeps prints warning log.
>
> There should be not real deadlock, as flag of console_suspended can
> protect this.

console_lock() does down(&console_sem) *before* testing
console_suspended, so I don't understand this sentence - a more
detailed description would help.

> Printk registers cpu hotplug notify function. When CPUx is plug-out/in,
> always execute console_lock() and console_unlock(). This patch
> modifies that with console_trylock() and console_unlock(). Then use
> that instead of the unconditional console_lock/unlock pair to avoid the
> warning.
>
> ...
>
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1893,6 +1893,20 @@ void resume_console(void)
> }
>
> /**
> + * console_flush - flush dmesg if console isn't suspended
> + *
> + * console_unlock always flushes the dmesg buffer, so just try to
> + * grab&drop the console lock. If that fails we know that the current
> + * holder will eventually drop the console lock and so flush the dmesg
> + * buffers at the earliest possible time.
> + */

The comment should describe why we added this code, please: talk about
cpu_hotplug.lock and console_lock.

> +void console_flush(void)
> +{
> + if (console_trylock())
> + console_unlock();
> +}
> +
> +/**
> * console_cpu_notify - print deferred console messages after CPU hotplug
> * @self: notifier struct
> * @action: CPU hotplug event
> @@ -1911,8 +1925,7 @@ static int console_cpu_notify(struct notifier_block *self,
> case CPU_DEAD:
> case CPU_DOWN_FAILED:
> case CPU_UP_CANCELED:
> - console_lock();
> - console_unlock();
> + console_flush();
> }
> return NOTIFY_OK;

Well, this is a bit hacky and makes the already-far-too-complex code
even more complex. If it is indeed the case that the deadlock cannot
really occur then let's try to find a way of suppressing the lockdep
warning without making runtime changes.

What I'm struggling with is what *should* the ranking of these locks be?
>From a conceptual high-level design standpoint, which is the
"innermost" lock? I tend to think that it is console_lock, because
blocking CPU hotplug is a quite high-level operation.

But console_lock is such a kooky special-case in the way it is used to
control the printk corking that it is hard to take general rules and
apply them here.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/