Re: [RFC PATCH v2 1/2] scsi: ufs: Add Multi-Circular Queue support

From: Asutosh Das (asd)
Date: Wed Aug 24 2022 - 21:51:46 EST


Hello Bart,

Thanks for taking a look.

Sorry for the delay in responding.

Please find the responses to your comments below.

On 8/18/2022 7:41 PM, Bart Van Assche wrote:
On 8/11/22 03:33, Can Guo wrote:
[...]
+    /* One more reserved for dev_cmd_queue */
+    hba->nr_hw_queues = host->nr_hw_queues + 1;

Should '1' above perhaps be changed into 'dev_cmd_queue'? Are you sure that the comment above is in sync with the code?
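
For illustration, if the intent is a self-documenting constant rather than a bare '1', it might look like this (the macro name below is hypothetical, not from the patch):

    /* Hypothetical named constant for the queue reserved for device
     * management commands, so the arithmetic documents itself. */
    #define UFSHCD_NUM_DEV_CMD_QUEUES 1

        hba->nr_hw_queues = host->nr_hw_queues + UFSHCD_NUM_DEV_CMD_QUEUES;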

+    ret = ufshcd_mcq_vops_config_rop(hba);
+    if (ret) {
+        dev_err(hba->dev, "MCQ Runtime Operation Pointers not configured\n");
+        goto out_err;
+    }
[...]
+static inline void ufshcd_mcq_process_event(struct ufs_hba *hba,
+                        struct ufs_hw_queue *hwq)
+{
+    struct cq_entry *cqe = ufshcd_mcq_cur_cqe(hwq);
+    int tag;
+
+    tag = ufshcd_mcq_get_tag(hba, hwq, cqe);
+    ufshcd_compl_one_task(hba, tag, cqe);
+}

Consider changing "process_event" into "process_cqe". Consider renaming ufshcd_compl_one_task() into ufshcd_compl_one_cqe().

The preparatory patch that would precede this change would define ufshcd_compl_one_task() in ufshcd.c. Since this function would be invoked from both Single Doorbell mode and MCQ mode, ufshcd_compl_one_task() sounds more relevant. What say?
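
For context, the sharing described above might look roughly like this (a sketch only; the example wrappers and the NULL-cqe convention for the SDB path are assumptions, not quoted from the patch):

    /* Defined once in ufshcd.c by the preparatory patch and shared by
     * both completion paths; the signature follows this patch. */
    void ufshcd_compl_one_task(struct ufs_hba *hba, int tag,
                               struct cq_entry *cqe);

    /* MCQ path: the CQE carries the completion status. */
    static void ufshcd_mcq_compl_example(struct ufs_hba *hba, int tag,
                                         struct cq_entry *cqe)
    {
        ufshcd_compl_one_task(hba, tag, cqe);
    }

    /* SDB path: no CQE exists, so a NULL cqe could signal "read the
     * status from the UTRD" (an assumed convention). */
    static void ufshcd_sdb_compl_example(struct ufs_hba *hba, int tag)
    {
        ufshcd_compl_one_task(hba, tag, NULL);
    }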

+unsigned long ufshcd_mcq_poll_cqe_lock(struct ufs_hba *hba,
+                       struct ufs_hw_queue *hwq)
+{
+    unsigned long completed_reqs, flags;
+
+    spin_lock_irqsave(&hwq->cq_lock, flags);
+    completed_reqs = ufshcd_mcq_poll_cqe_nolock(hba, hwq);
+    spin_unlock_irqrestore(&hwq->cq_lock, flags);
+
+    return completed_reqs;
+}

Why are interrupts disabled around ufshcd_mcq_poll_cqe_nolock() calls?

Why are the ufshcd_mcq_poll_cqe_nolock() calls protected by a spinlock?

Because ufshcd_mcq_poll_cqe_lock() is invoked by ufshcd_poll(), which may be invoked simultaneously from different CPUs. However, the plain spin_[un]lock() variants would suffice here; interrupts do not need to be disabled.
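
A minimal sketch of the spin_[un]lock() variant, assuming no caller needs interrupts disabled across the poll:

    unsigned long ufshcd_mcq_poll_cqe_lock(struct ufs_hba *hba,
                                           struct ufs_hw_queue *hwq)
    {
        unsigned long completed_reqs;

        /* Serializes ufshcd_poll() callers racing on different CPUs;
         * disabling interrupts is not required for that. */
        spin_lock(&hwq->cq_lock);
        completed_reqs = ufshcd_mcq_poll_cqe_nolock(hba, hwq);
        spin_unlock(&hwq->cq_lock);

        return completed_reqs;
    }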

[...]

+static irqreturn_t ufshcd_handle_mcq_cq_events(struct ufs_hba *hba)
+{
+    struct ufs_hw_queue *hwq;
+    unsigned long outstanding_cqs;
+    unsigned int nr_queues;
+    int i, ret;
+    u32 events;
+
+    ret = ufshcd_vops_get_outstanding_cqs(hba, &outstanding_cqs);
+    if (ret)
+        outstanding_cqs = (1U << hba->nr_hw_queues) - 1;
+
+    /* Exclude the poll queues */
+    nr_queues = hba->nr_hw_queues - hba->nr_queues[HCTX_TYPE_POLL];
+    for_each_set_bit(i, &outstanding_cqs, nr_queues) {
+        hwq = &hba->uhq[i];
+
+        events = ufshcd_mcq_read_cqis(hba, i);
+        if (events)
+            ufshcd_mcq_write_cqis(hba, events, i);
+
+        if (events & UFSHCD_MCQ_CQIS_TEPS)
+            ufshcd_mcq_poll_cqe_nolock(hba, hwq);
+    }
+
+    return IRQ_HANDLED;
+}

Why the loop over the completion queues? Shouldn't UFSHCI 4.0 compliant controllers support one interrupt per completion queue?

The MCQ specification doesn't require that a UFSHCI 4.0 compliant HC support one interrupt per completion queue; I guess that would depend on the HC vendor. However, it does specify ESI as an alternative method, which is implemented in this patch.
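
For illustration, with ESI each completion queue could be wired to its own vector, so a per-CQ handler would not need the loop above (a sketch only; the handler name and the hba/id back-references on ufs_hw_queue are assumptions):

    static irqreturn_t ufshcd_esi_handler_example(int irq, void *data)
    {
        struct ufs_hw_queue *hwq = data;
        struct ufs_hba *hba = hwq->hba;    /* assumed back-pointer */
        u32 id = hwq->id;                  /* assumed queue index */
        u32 events = ufshcd_mcq_read_cqis(hba, id);

        if (events)
            ufshcd_mcq_write_cqis(hba, events, id);

        if (events & UFSHCD_MCQ_CQIS_TEPS)
            ufshcd_mcq_poll_cqe_lock(hba, hwq);

        return IRQ_HANDLED;
    }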

-/* Complete requests that have door-bell cleared */
+/*
+ * Complete requests that have door-bell cleared and/or pending completion
+ * entries on completion queues if MCQ is enabled
+ */

Since the above comment has been changed, please spell the word doorbell correctly (no hyphen).

@@ -6823,7 +6947,7 @@ static int ufshcd_issue_devman_upiu_cmd(struct ufs_hba *hba,
                      enum query_opcode desc_op)
  {
      DECLARE_COMPLETION_ONSTACK(wait);
-    const u32 tag = hba->reserved_slot;
+    u32 tag = hba->reserved_slot;

Why has the 'const' keyword been removed?

+    if (hba->nutrs != old_nutrs) {
+        ufshcd_release_sdb_queue(hba, old_nutrs);
+        ret = ufshcd_memory_alloc(hba);
+        if (ret)
+            return ret;
+        ufshcd_host_memory_configure(hba);
+    }

Can this freeing + reallocating be avoided?

Umm, we thought about this. Only after reading the device params can the ext_iid support and the device queue depth be determined, so there didn't seem to be any other way than this. If you have any ideas, please let us know.
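
For reference, the ordering constraint described above is roughly the following (a sketch; ufshcd_device_params_init() is the existing helper that reads the device descriptor):

    /* The real queue depth (nutrs) and ext_iid support are only known
     * after the device descriptor has been read, and reading it already
     * requires an initialized, SDB-sized queue; hence the reallocation. */
    old_nutrs = hba->nutrs;
    ret = ufshcd_device_params_init(hba);
    if (ret)
        return ret;

    if (hba->nutrs != old_nutrs) {
        ufshcd_release_sdb_queue(hba, old_nutrs);
        ret = ufshcd_memory_alloc(hba);
        if (ret)
            return ret;
        ufshcd_host_memory_configure(hba);
    }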

I agree with the rest of the suggestions and will address them in the next version.

-asd/can