[PATCH] migrate_pages: fix deadlock on waiting writeback

From: Huang Ying
Date: Mon Feb 20 2023 - 01:56:34 EST


Pengfei reported a system soft lockup found with Syzkaller. The stack
traces are as follows:

...
[ 300.124933] INFO: task kworker/u4:3:73 blocked for more than 147 seconds.
[ 300.125214] Not tainted 6.2.0-rc4-kvm+ #1314
[ 300.125408] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 300.125736] task:kworker/u4:3 state:D stack:0 pid:73 ppid:2 flags:0x00004000
[ 300.126059] Workqueue: writeback wb_workfn (flush-7:3)
[ 300.126282] Call Trace:
[ 300.126378] <TASK>
[ 300.126464] __schedule+0x43b/0xd00
[ 300.126601] ? __blk_flush_plug+0x142/0x180
[ 300.126765] schedule+0x6a/0xf0
[ 300.126912] io_schedule+0x4a/0x80
[ 300.127051] folio_wait_bit_common+0x1b5/0x4e0
[ 300.127227] ? __pfx_wake_page_function+0x10/0x10
[ 300.127403] __folio_lock+0x27/0x40
[ 300.127541] write_cache_pages+0x350/0x870
[ 300.127699] ? __pfx_iomap_do_writepage+0x10/0x10
[ 300.127889] iomap_writepages+0x3f/0x80
[ 300.128037] xfs_vm_writepages+0x94/0xd0
[ 300.128192] ? __pfx_xfs_vm_writepages+0x10/0x10
[ 300.128370] do_writepages+0x10a/0x240
[ 300.128514] ? lock_is_held_type+0xe6/0x140
[ 300.128675] __writeback_single_inode+0x9f/0xa90
[ 300.128854] writeback_sb_inodes+0x2fb/0x8d0
[ 300.129030] __writeback_inodes_wb+0x68/0x150
[ 300.129212] wb_writeback+0x49c/0x770
[ 300.129357] wb_workfn+0x6fb/0x9d0
[ 300.129500] process_one_work+0x3cc/0x8d0
[ 300.129669] worker_thread+0x66/0x630
[ 300.129824] ? __pfx_worker_thread+0x10/0x10
[ 300.129989] kthread+0x153/0x190
[ 300.130116] ? __pfx_kthread+0x10/0x10
[ 300.130264] ret_from_fork+0x29/0x50
[ 300.130409] </TASK>
[ 300.179347] INFO: task repro:1023 blocked for more than 147 seconds.
[ 300.179905] Not tainted 6.2.0-rc4-kvm+ #1314
[ 300.180317] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 300.180955] task:repro state:D stack:0 pid:1023 ppid:360 flags:0x00004004
[ 300.181660] Call Trace:
[ 300.181879] <TASK>
[ 300.182085] __schedule+0x43b/0xd00
[ 300.182407] schedule+0x6a/0xf0
[ 300.182694] io_schedule+0x4a/0x80
[ 300.183020] folio_wait_bit_common+0x1b5/0x4e0
[ 300.183506] ? compaction_alloc+0x77/0x1150
[ 300.183892] ? __pfx_wake_page_function+0x10/0x10
[ 300.184304] folio_wait_bit+0x30/0x40
[ 300.184640] folio_wait_writeback+0x2e/0x1e0
[ 300.185034] migrate_pages_batch+0x555/0x1ac0
[ 300.185462] ? __pfx_compaction_alloc+0x10/0x10
[ 300.185808] ? __pfx_compaction_free+0x10/0x10
[ 300.186022] ? __this_cpu_preempt_check+0x17/0x20
[ 300.186234] ? lock_is_held_type+0xe6/0x140
[ 300.186423] migrate_pages+0x100e/0x1180
[ 300.186603] ? __pfx_compaction_free+0x10/0x10
[ 300.186800] ? __pfx_compaction_alloc+0x10/0x10
[ 300.187011] compact_zone+0xe10/0x1b50
[ 300.187182] ? lock_is_held_type+0xe6/0x140
[ 300.187374] ? check_preemption_disabled+0x80/0xf0
[ 300.187588] compact_node+0xa3/0x100
[ 300.187755] ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
[ 300.187993] ? _find_first_bit+0x7b/0x90
[ 300.188171] sysctl_compaction_handler+0x5d/0xb0
[ 300.188376] proc_sys_call_handler+0x29d/0x420
[ 300.188583] proc_sys_write+0x2b/0x40
[ 300.188749] vfs_write+0x3a3/0x780
[ 300.188912] ksys_write+0xb7/0x180
[ 300.189070] __x64_sys_write+0x26/0x30
[ 300.189260] do_syscall_64+0x3b/0x90
[ 300.189424] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 300.189654] RIP: 0033:0x7f3a2471f59d
[ 300.189815] RSP: 002b:00007ffe567f7288 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
[ 300.190137] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3a2471f59d
[ 300.190397] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
[ 300.190653] RBP: 00007ffe567f72a0 R08: 0000000000000010 R09: 0000000000000010
[ 300.190910] R10: 0000000000000010 R11: 0000000000000217 R12: 00000000004012e0
[ 300.191172] R13: 00007ffe567f73e0 R14: 0000000000000000 R15: 0000000000000000
[ 300.191440] </TASK>
...

To migrate a folio, we may wait for its writeback to complete while
we already hold the locks of some other folios. But the writeback
code may in turn wait to lock one of the folios we have locked. This
causes the deadlock. To fix the issue, we avoid waiting for the
writeback to complete if we have already locked some folios. After
the locked folios are moved and unlocked, we retry.
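
The bail-out-and-retry idea can be modelled in user space. The sketch
below is only an illustration under stated assumptions, not the kernel
code: struct item, prepare_item(), flush_locked() and migrate_batch()
are hypothetical names, a "wait" is simulated by clearing a flag, and
the retry is done inline rather than by the real callers in
mm/migrate.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#ifndef EDEADLOCK
#define EDEADLOCK EDEADLK	/* fallback where only EDEADLK exists */
#endif

/* Hypothetical stand-in for a folio: three state flags only. */
struct item {
	bool under_writeback;
	bool locked;
	bool migrated;
};

/*
 * Lock one item for migration. If it is under writeback and the
 * caller already holds other locks (avoid_wait), return -EDEADLOCK
 * instead of waiting.
 */
static int prepare_item(struct item *it, bool avoid_wait)
{
	assert(!it->locked);
	if (it->under_writeback) {
		if (avoid_wait)
			return -EDEADLOCK;
		/* No other locks held: waiting is safe (simulated). */
		it->under_writeback = false;
	}
	it->locked = true;
	return 0;
}

/* Move and unlock every locked item in [start, end). */
static void flush_locked(struct item *items, int start, int end)
{
	for (int i = start; i < end; i++) {
		if (items[i].locked) {
			items[i].migrated = true;
			items[i].locked = false;
		}
	}
}

/* Returns how many times the batch was cut short to avoid a wait. */
static int migrate_batch(struct item *items, int n)
{
	int restarts = 0;
	int start = 0;
	int i = 0;

	while (i < n) {
		bool holding_locks = i > start;

		if (prepare_item(&items[i], holding_locks) == -EDEADLOCK) {
			/* Release what we hold, then retry the same item. */
			flush_locked(items, start, i);
			start = i;
			restarts++;
			continue;
		}
		i++;
	}
	flush_locked(items, start, n);
	return restarts;
}
```

In this model, a batch of four items whose third item is under
writeback is processed as two sub-batches: the first two items are
moved and unlocked before the third is waited on, so the waiter never
holds a lock that the writeback side might need.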

Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
Reported-by: "Xu, Pengfei" <pengfei.xu@xxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
Cc: Stefan Roesch <shr@xxxxxxxxxxxx>
Cc: Tejun Heo <tj@xxxxxxxxxx>
Cc: Xin Hao <xhao@xxxxxxxxxxxxxxxxx>
Cc: Zi Yan <ziy@xxxxxxxxxx>
Cc: Yang Shi <shy828301@xxxxxxxxx>
Cc: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
---
mm/migrate.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/mm/migrate.c b/mm/migrate.c
index 28b435cdeac8..bc9a8050f1b0 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1205,6 +1205,18 @@ static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page
 		}
 		if (!force)
 			goto out;
+		/*
+		 * We have locked some folios and are going to wait the
+		 * writeback of this folio to complete. But it's possible for
+		 * the writeback to wait to lock the folios we have locked. To
+		 * avoid a potential deadlock, let's bail out and not do that.
+		 * The locked folios will be moved and unlocked, then we
+		 * can wait the writeback of this folio.
+		 */
+		if (avoid_force_lock) {
+			rc = -EDEADLOCK;
+			goto out;
+		}
 		folio_wait_writeback(src);
 	}

--
2.39.1