Re: [PATCH] nbd: fix false lockdep deadlock warning

From: Yu Kuai
Date: Wed Jul 02 2025 - 03:31:15 EST


Hi,

在 2025/07/02 14:22, Nilay Shroff 写道:


On 7/2/25 8:02 AM, Ming Lei wrote:
On Wed, Jul 02, 2025 at 09:12:09AM +0800, Yu Kuai wrote:
Hi,

在 2025/07/01 21:28, Nilay Shroff 写道:


On 6/28/25 6:18 AM, Yu Kuai wrote:
Hi,

在 2025/06/27 19:04, Ming Lei 写道:
I guess the patch in the following link may be simper, both two take
similar approach:

https://lore.kernel.org/linux-block/aFjbavzLAFO0Q7n1@fedora/

I this the above approach has concurrent problems if nbd_start_device
concurrent with nbd_start_device:

t1:
nbd_start_device
lock
num_connections = 1
unlock
    t2:
    nbd_add_socket
    lock
    config->num_connections++
    unlock
        t3:
        nbd_start_device
        lock
        num_connections = 2
        unlock
        blk_mq_update_nr_hw_queues

blk_mq_update_nr_hw_queues
//nr_hw_queues updated to 1 before failure
return -EINVAL


In the above case, yes I see that t1 would return -EINVAL (as
config->num_connections doesn't match with num_connections)
but then t3 would succeed to update nr_hw_queue (as both
config->num_connections and num_connections set to 2 this
time). Isn't it? If yes, then the above patch (from Ming)
seems good.

Emm, I'm confused, If you agree with the concurrent process, then
t3 update nr_hw_queues to 2 first and return sucess, later t1 update
nr_hw_queues back to 1 and return failure.

It should be easy to avoid failure by simple retrying.

Yeah I think retry should be a safe bet here.


I really not sure about the retry, the above is just a scenario that I
think of with a quick review, and there are still many concurrent
scenarios that need to be checked, I'm kind of lost here.

Except nbd_start_device() and nbd_add_socked(), I'm not confident
other context that is synchronized with config_lock is not broken.
However, I'm ok with the bet.

On another note, synchronizing nbd_start_device and nbd_add_socket
using nbd->task_setup looks more complex and rather we may use
nbd->pid to synchronize both. We need to move setting of nbd->pid
before we invoke blk_mq_update_nr_hw_queues in nbd_start_device.
Then in nbd_add_socket we can evaluate nbd->pid and if it's
non-NULL then we could assume that either nr_hw_queues update is in
progress or device has been setup and so return -EBUSY. I think
anyways updating number of connections once device is configured
would not be possible, so once nbd_start_device is initiated, we
shall prevent user adding more connections. If we follow this
approach then IMO we don't need to add retry discussed above.

It's ok for me to forbit nbd_add_socked after nbd is configured, there
is nowhere to use the added sock. And if there really are other contexts
need to be synchronized, I think nbd->pid can be used as well.


Thanks,
--Nilay
.