v4.11-rc7, deadlock? workqueue pool->lock and serial8250 port_lock_key

From: J. R. Okajima
Date: Tue Apr 25 2017 - 10:50:36 EST


Hello,

I've found this kernel log from v4.11.0-rc7, and all(?) workqueues
seem stopped working. The system is using a serial console by the
kernel command line
root=3D/dev/sda8 ro console=3DttyS0,115200 console=3Dtty0 mem=3D1G

Does it mean that show_workqueue_state() on a serial console (almost?)
always cause a deadlock?
Who I can ask to check this log?

I am not sure whether this is related, but I added a workqueue for my
development. There is no such message in the log about my workqueue, but
I am afraid my development is related to any change around workqueue.


J. R. Okajima


=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D
[ INFO: possible circular locking dependency detected ]
4.11.0-rc7aufsD+ #114 Not tainted
-------------------------------------------------------
dockerd/777 is trying to acquire lock:
(&port_lock_key){-.-.-.}, at: [<ffffffff81b8030d>] serial8250_console_wri=
te+0x1ad/0x570

but task is already holding lock:
(&(&pool->lock)->rlock){-.-.-.}, at: [<ffffffff81117bd0>] show_workqueue_=
state+0x1e0/0xd90

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&(&pool->lock)->rlock){-.-.-.}:
lock_acquire+0xd5/0x380
_raw_spin_lock+0x32/0x50
__queue_work+0x143/0x1090
queue_work_on+0x38/0x80
put_pwq+0x152/0x2a0
put_pwq_unlocked+0x22/0x30
destroy_workqueue+0x3bd/0x680
nfs4_state_shutdown+0x15/0x30 [nfsd]
nfsd_shutdown_generic+0x26/0x50 [nfsd]
nfsd_shutdown_net+0x135/0x2b0 [nfsd]
nfsd_last_thread+0x21e/0x2d0 [nfsd]
svc_shutdown_net+0x4c/0xa0
nfsd_destroy+0x1b3/0x2f0 [nfsd]
nfsd+0x28f/0x480 [nfsd]
kthread+0x1a0/0x2d0
ret_from_fork+0x2e/0x40

-> #1 (&pool->lock/1){-.-.-.}:
lock_acquire+0xd5/0x380
_raw_spin_lock+0x32/0x50
__queue_work+0x143/0x1090
queue_work_on+0x38/0x80
tty_schedule_flip+0x5b/0xb0
tty_flip_buffer_push+0xe/0x10
serial8250_rx_chars+0x19c/0x580
serial8250_handle_irq.part.17+0xd5/0x190
serial8250_default_handle_irq+0x5c/0xa0
serial8250_interrupt+0x83/0x160
__handle_irq_event_percpu+0x73/0x8f0
handle_irq_event_percpu+0x25/0x90
handle_irq_event+0x7d/0x160
handle_edge_irq+0x139/0x9e0
handle_irq+0xa1/0x400
do_IRQ+0x82/0x190
ret_from_intr+0x0/0x19
cpuidle_enter_state+0x150/0x820
cpuidle_enter+0x17/0x20
do_idle+0x25a/0x480
cpu_startup_entry+0x70/0x90
rest_init+0x15e/0x170
start_kernel+0x4e1/0x4ee
x86_64_start_reservations+0x2a/0x2c
x86_64_start_kernel+0xe7/0xee
verify_cpu+0x0/0xfc

-> #0 (&port_lock_key){-.-.-.}:
__lock_acquire+0x1fb7/0x2e00
lock_acquire+0xd5/0x380
_raw_spin_lock_irqsave+0x3d/0x60
serial8250_console_write+0x1ad/0x570
univ8250_console_write+0x36/0x60
console_unlock+0x59b/0xa00
vprintk_emit+0x241/0x430
vprintk_default+0x1f/0x30
vprintk_func+0x2c/0x80
printk+0x43/0x4b
show_workqueue_state+0x2a1/0xd90
wq_watchdog_timer_fn+0x245/0x3e0
call_timer_fn+0xd4/0x760
run_timer_softirq+0x541/0xbb0
__do_softirq+0x117/0xcb0
irq_exit+0xf9/0x170
smp_apic_timer_interrupt+0x44/0x70
apic_timer_interrupt+0x90/0xa0
read_hpet+0x107/0x120
ktime_get_ts64+0x115/0x460
poll_select_copy_remaining+0x7e/0x240
SyS_select+0x117/0x160
entry_SYSCALL_64_fastpath+0x18/0xad

other info that might help us debug this:

Chain exists of:
&port_lock_key --> &pool->lock/1 --> &(&pool->lock)->rlock

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&(&pool->lock)->rlock);
lock(&pool->lock/1);
lock(&(&pool->lock)->rlock);
lock(&port_lock_key);

*** DEADLOCK ***

4 locks held by dockerd/777:
#0: (/proj/aufs/aufs4-linux.git/kernel/workqueue.c:5320){+.-.-.}, at: [<=
ffffffff811d278b>] call_timer_fn+0x3b/0x760
#1: (rcu_read_lock_sched){......}, at: [<ffffffff811179f5>] show_workque=
ue_state+0x5/0xd90
#2: (&(&pool->lock)->rlock){-.-.-.}, at: [<ffffffff81117bd0>] show_workq=
ueue_state+0x1e0/0xd90
#3: (console_lock){+.+.+.}, at: [<ffffffff8119c184>] vprintk_emit+0x234/=
0x430

stack backtrace:
CPU: 0 PID: 777 Comm: dockerd Not tainted 4.11.0-rc7aufsD+ #114
Hardware name: Pegatron Pegatron/IPM41, BIOS 0001 02/05/2009
Call Trace:
<IRQ>
dump_stack+0x68/0x9d
print_circular_bug+0x2a6/0x2b7
__lock_acquire+0x1fb7/0x2e00
lock_acquire+0xd5/0x380
? lock_acquire+0xd5/0x380
? serial8250_console_write+0x1ad/0x570
_raw_spin_lock_irqsave+0x3d/0x60
? serial8250_console_write+0x1ad/0x570
serial8250_console_write+0x1ad/0x570
? _raw_spin_unlock+0x69/0x80
univ8250_console_write+0x36/0x60
console_unlock+0x59b/0xa00
? __down_trylock_console_sem+0x68/0x80
? vprintk_emit+0x234/0x430
vprintk_emit+0x241/0x430
vprintk_default+0x1f/0x30
vprintk_func+0x2c/0x80
printk+0x43/0x4b
show_workqueue_state+0x2a1/0xd90
? check_preemption_disabled+0x40/0x160
wq_watchdog_timer_fn+0x245/0x3e0
? destroy_workqueue+0x680/0x680
call_timer_fn+0xd4/0x760
? destroy_workqueue+0x680/0x680
run_timer_softirq+0x541/0xbb0
__do_softirq+0x117/0xcb0
irq_exit+0xf9/0x170
smp_apic_timer_interrupt+0x44/0x70
apic_timer_interrupt+0x90/0xa0
RIP: 0010:read_hpet+0x107/0x120
RSP: 0018:ffff88001ed5bdf8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
RAX: 000000000052aa86 RBX: 0000000000000246 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff880019c34a00
RBP: ffff88001ed5be20 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000091838cd1
R13: 0000000001594bf8 R14: ffff88001ed5bf00 R15: ffff88001ed5be88
</IRQ>
? hpet_interrupt_handler+0x80/0x80
ktime_get_ts64+0x115/0x460
poll_select_copy_remaining+0x7e/0x240
? poll_select_set_timeout+0x91/0xf0
SyS_select+0x117/0x160
entry_SYSCALL_64_fastpath+0x18/0xad
RIP: 0033:0x472017
RSP: 002b:00007f9eb391ece8 EFLAGS: 00000246 ORIG_RAX: 0000000000000017
RAX: ffffffffffffffda RBX: 000000000000006e RCX: 0000000000472017
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000046 R08: 00007f9eb391ece8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00000000000000f3 R14: 0000000000000030 R15: 0000000000000003