RE: [PATCH 4/5] nvmet-rdma: add a NVMe over Fabrics RDMA target driver

From: Steve Wise
Date: Tue Jun 14 2016 - 12:22:28 EST


>
> Hey Sean,
>
> Am I correct here? I.e., is it OK for the rdma application to rdma_reject() and
> rdma_destroy_id() the CONNECT_REQUEST cm_id _inside_ its event handler as long
> as it returns 0?
>
> Thanks,
>
> Steve.


Looking at rdma_destroy_id(), I think it is invalid to call it from the event
handler, because it waits on the handler_mutex:

void rdma_destroy_id(struct rdma_cm_id *id)
{

<snip>

	/*
	 * Wait for any active callback to finish. New callbacks will find
	 * the id_priv state set to destroying and abort.
	 */
	mutex_lock(&id_priv->handler_mutex);
	mutex_unlock(&id_priv->handler_mutex);

And indeed, when I tried to destroy the CONNECT_REQUEST cm_id in the nvmet event
handler, the event handler thread got stuck:

INFO: task kworker/u32:0:6275 blocked for more than 120 seconds.
Tainted: G E 4.7.0-rc2-nvmf-all.3+ #81
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u32:0 D ffff880f90737768 0 6275 2 0x10000080
Workqueue: iw_cm_wq cm_work_handler [iw_cm]
ffff880f90737768 ffff880f907376d8 ffffffff81c0b500 0000000000000005
ffff8810226a4940 ffff88102b894490 ffffffffa02cf4cd ffff880f00000000
ffff880fcd917c00 ffff880f00000000 0000000000000004 ffff880f00000000
Call Trace:
[<ffffffffa02cf4cd>] ? stop_ep_timer+0x2d/0xe0 [iw_cxgb4]
[<ffffffff8163e6a7>] schedule+0x47/0xc0
[<ffffffffa024d276>] ? iw_cm_reject+0x96/0xe0 [iw_cm]
[<ffffffff8163e8e5>] schedule_preempt_disabled+0x15/0x20
[<ffffffff8163fd78>] __mutex_lock_slowpath+0x108/0x310
[<ffffffff8163ffb1>] mutex_lock+0x31/0x50
[<ffffffffa0261498>] rdma_destroy_id+0x38/0x200 [rdma_cm]
[<ffffffffa03145f0>] ? nvmet_rdma_queue_connect+0x1a0/0x1a0 [nvmet_rdma]
[<ffffffffa0262fe1>] ? rdma_create_id+0x171/0x1a0 [rdma_cm]
[<ffffffffa03146f8>] nvmet_rdma_cm_handler+0x108/0x168 [nvmet_rdma]
[<ffffffffa026407a>] iw_conn_req_handler+0x1ca/0x240 [rdma_cm]
[<ffffffffa024efc6>] cm_conn_req_handler+0x606/0x680 [iw_cm]
[<ffffffffa024f109>] process_event+0xc9/0xf0 [iw_cm]
[<ffffffffa024f277>] cm_work_handler+0x147/0x1c0 [iw_cm]
[<ffffffff8107d4f6>] ? trace_event_raw_event_workqueue_execute_start+0x66/0xa0
[<ffffffff81081736>] process_one_work+0x1c6/0x550
...

So I withdraw my comment about nvmet. I think the code is fine as-is. The second
reject results in a no-op since the connection request was already rejected by
nvmet.

Steve.