Re: [PATCH -next 1/6] Revert "md: unlock mddev before reap sync_thread in action_store"

From: Guoqing Jiang
Date: Wed Mar 22 2023 - 03:19:46 EST




On 3/22/23 14:41, Yu Kuai wrote:
From: Yu Kuai <yukuai3@xxxxxxxxxx>

This reverts commit 9dfbdafda3b34e262e43e786077bab8e476a89d1.

Because it will introduce a defect that sync_thread can be running while
MD_RECOVERY_RUNNING is cleared, which will cause some unexpected problems,
for example:

list_add corruption. prev->next should be next (ffff0001ac1daba0), but was ffff0000ce1a02a0. (prev=ffff0000ce1a02a0).
Call trace:
__list_add_valid+0xfc/0x140
insert_work+0x78/0x1a0
__queue_work+0x500/0xcf4
queue_work_on+0xe8/0x12c
md_check_recovery+0xa34/0xf30
raid10d+0xb8/0x900 [raid10]
md_thread+0x16c/0x2cc
kthread+0x1a4/0x1ec
ret_from_fork+0x10/0x18

This is because work is requeued while it's still inside workqueue:

If the workqueue subsystem can have such problem because of md flag,
then I have to think workqueue is fragile.

t1: t2:
action_store
mddev_lock
if (mddev->sync_thread)
mddev_unlock
md_unregister_thread
// first sync_thread is done
md_check_recovery
mddev_try_lock
/*
* once MD_RECOVERY_DONE is set, new sync_thread
* can start.
*/
set_bit(MD_RECOVERY_RUNNING, &mddev->recovery)
INIT_WORK(&mddev->del_work, md_start_sync)
queue_work(md_misc_wq, &mddev->del_work)
test_and_set_bit(WORK_STRUCT_PENDING_BIT, ...)

Assume you mean below,

1551 if(!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
1552                 __queue_work(cpu, wq, work);
1553                 ret = true;
1554         }

Could you explain how the same work can be re-queued? Isn't the PENDING_BIT
is already set in t3? I believe queue_work shouldn't do that per the comment
but I am not expert ...

Returns %false if @work was already on a queue, %true otherwise.

// set pending bit
insert_work
list_add_tail
mddev_unlock
mddev_lock_nointr
md_reap_sync_thread
// MD_RECOVERY_RUNNING is cleared
mddev_unlock

t3:

// before queued work started from t2
md_check_recovery
// MD_RECOVERY_RUNNING is not set, a new sync_thread can be started
INIT_WORK(&mddev->del_work, md_start_sync)
work->data = 0
// work pending bit is cleared
queue_work(md_misc_wq, &mddev->del_work)
insert_work
list_add_tail
// list is corrupted

This patch revert the commit to fix the problem, the deadlock this
commit tries to fix will be fixed in following patches.

Pls cc the previous users who had encounter the problem to test the
second patch.

And can you share your test which can trigger the re-queued issue?
I'd like to try with latest mainline such as 6.3-rc3, and your test is
not only run against 5.10 kernel as you described before, right?

Thanks,
Guoqing