Re: [PATCH v2] driver core: Fix concurrent problem of deferred_probe_extend_timeout()

From: wangwensheng (C)
Date: Thu Aug 14 2025 - 21:43:45 EST




在 2025/8/14 22:19, Greg KH 写道:
On Thu, Aug 14, 2025 at 08:29:49PM +0800, Wang Wensheng wrote:
The deferred_probe_timeout_work may be canceled forever unexpected when
deferred_probe_extend_timeout() executes concurrently. Start with
deferred_probe_timeout_work pending, and the problem would
occur after the following sequence.

CPU0 CPU1
deferred_probe_extend_timeout
-> cancel_delayed_work => true
deferred_probe_extend_timeout
-> cancel_delayed_wrok
-> __cancel_work
-> try_grab_pending
-> schedule_delayed_work
-> queue_delayed_work_on
since pending bit is grabbed,
just return without doing anything
-> set_work_pool_and_clear_pending
this __cancel_work return false and
the work would never be queued again

The root cause is that the PENDING_BIT of the work_struct would be set
temporaily in __cancel_work and this bit could prevent the work_struct
to be queued in another CPU.

Use deferred_probe_mutex to protect the cancel and queue operations for
the deferred_probe_timeout_work to fix this problem.

Fixes: 2b28a1a84a0e ("driver core: Extend deferred probe timeout on driver registration")
Cc: <stable@xxxxxxxxxxxxxxx>
Signed-off-by: Wang Wensheng <wangwensheng4@xxxxxxxxxx>
---
drivers/base/dd.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 13ab98e033ea..00419d2ee910 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -323,6 +323,7 @@ static DECLARE_DELAYED_WORK(deferred_probe_timeout_work, deferred_probe_timeout_
void deferred_probe_extend_timeout(void)
{
+ guard(mutex)(&deferred_probe_mutex);

But if you grab the lock here, in the probe timeout function, the lock
will be grabbed again, causing a deadlock, right? If not, why not?

It's not a sync version of cancel_work, so the execuation of the work function doesn't block us here, nor does the schedule_delayed_work does.

Indead, deferred_probe_mutex is used to protect the deferred_probe_*_list, it looks better to use a new lock here. Right?


Have you run this patch with lockdep enabled?

This feels broken to me, what am I missing?

thanks,

greg k-h