Re: [RFC PATCH v3 2/4] ufs: core: mcq: Adds Multi-Circular Queue support

From: Bart Van Assche
Date: Thu Sep 08 2022 - 19:22:06 EST


On 9/2/22 15:41, Asutosh Das wrote:
In v4.0, UFSHCI specification has added Multi-Circular Queue
support to enhance performance. It is backward compatible to
the legacy Single Doorbell mode.

The above may be confusing for someone who is not familiar with the UFSHCI 4.0 specification. Consider changing the above text into the following: "The UFSHCI 4.0 register interface defines two modes: (1) a legacy mode that is compatible with UFSHCI 3.0 and (2) an MCQ mode in which multiple submission and completion queues are supported. A capability register indicates whether (1) and/or (2) are supported."

+static int rw_queue_count_set(const char *val, const struct kernel_param *kp)
+{
+ unsigned int n;
+ int ret;
+
+ ret = kstrtouint(val, 10, &n);
+ if (ret)
+ return ret;
+ if (n > num_possible_cpus())
+ return -EINVAL;
+ return param_set_uint(val, kp);
+}
+
+static const struct kernel_param_ops rw_queue_count_ops = {
+ .set = rw_queue_count_set,
+ .get = param_get_uint,
+};
+
+static unsigned int rw_queues = 8;
+module_param_cb(rw_queues, &rw_queue_count_ops, &rw_queues, 0644);
+MODULE_PARM_DESC(rw_queues,
+ "Number of interrupt driven I/O queues used for rw. Default value is 8");

rw_queues should be >= 1 and <= 32. However, rw_queue_count_set() allows to set this variable to zero and also to values above 32. I think this needs to be improved.

+/* resources */
+static const struct ufshcd_res_info_t ufshcd_res_info[RES_MAX] = {
+ {"ufs_mem", NULL, NULL},
+ {"mcq", NULL, NULL},
+ {"mcq_sqd", NULL, NULL},
+ {"mcq_sqis", NULL, NULL},
+ {"mcq_cqd", NULL, NULL},
+ {"mcq_cqis", NULL, NULL},
+ {"mcq_vs", NULL, NULL},
+};

Please make the above easier to verify by using designated initializers ([RES_MEM] = { .name = "ufs_mem" }, ...).

+
+/**
+ * ufshcd_mcq_decide_queue_depth - decide the queue depth
+ * @hba - per adapter instance
+ *
+ * MAC - Max. Active Command of the Host Controller (HC)
+ * HC wouldn't send more than this commands to the device.

wouldn't -> won't

+ /* MAC is a 0 based value. */
+ mac += 1;
+ qd = min_t(u32, mac, hba->dev_info.bqueuedepth);
+ if (!qd)
+ qd = mac;

Please use min_not_zero instead of open-coding it.

+}
+
+/**
+ * ufshcd_mcq_config_mac - Set the #Max Activ Cmds.
+ * @hba - per adpater instance

adpater -> adapter

+ * @max_active_cmds - maximum # of active commands to the device at any time.
+ *
+ * The controller wouldn't send more than the max_active_cmds to the device at

wouldn't -> won't

+/* ROP - Runtime and operation registers configuration */

The above comment seems misleading to me. In the UFSHCI 4.0 specification the operation and runtime registers start with IS - Interrupt Status while the definitions below are for the MCQ submission and completion queue registers.

+#define MCQ_CFG_n(r, i) \
+ ((r) + MCQ_QCFG_SIZE * (i))

Please add a clear comment above this macro that explains its role and also the meaning of its arguments. I assume that 'i' is the queue index?

+#define MCQ_ROP_OFFSET_n(p, i) \
+ (hba->mcq_rop[(p)].offset + hba->mcq_rop[(p)].stride * (i))

Also for this macro, please explain its role. I assume that MCQ_CFG_n() is used to calculate offsets of registers defined in the UFSHCI 4.0 specification and that MCQ_ROP_OFFSET_n() is used to calculate offsets in the data structure with submission and completion queues?

+ /* Sbmission Qqueue Doorbell Address Offset */

Sbmission -> Submission
Qqueue -> Queue

+ /* Save the base addresses for quicker access */

base -> head and tail?

+ /*
+ * Submission Qeueue Enable|Size|Completion Queue ID to
+ * Submission Queue Attribute
+ */
+ ufsmcq_writel(hba, (1 << 31) | qsize | (i << 16),
+ MCQ_CFG_n(REG_SQATTR, i));

Qeueue -> Queue

+static void ufshcd_mcq_config_nr_queues(struct ufs_hba *hba)
+{
+ int i, rem;
+ u32 hbaq_cap, cmp;
+ struct Scsi_Host *host = hba->host;
+
+ hbaq_cap = hba->mcq_capabilities & 0xff;
+
+ rem = hbaq_cap - dev_cmd_queue;

Please handle the case dev_cmd_queue > hbaq_cap (rem < 0) explicitly.

+ /* min() compares variables of same type */

Please remove this comment. Every kernel developer should know this.

+ hba->nr_queues[HCTX_TYPE_DEFAULT] = min(hbaq_cap - dev_cmd_queue,
+ num_possible_cpus());
+ rem -= hba->nr_queues[HCTX_TYPE_DEFAULT];
+ if (rem <= 0)
+ goto out;
+ cmp = rem;
+ hba->nr_queues[HCTX_TYPE_POLL] = min(cmp, poll_queues);
+ rem -= hba->nr_queues[HCTX_TYPE_POLL];
+ if (rem <= 0)
+ goto out;
+ cmp = rem;
+ hba->nr_queues[HCTX_TYPE_READ] = min(cmp, read_queues);
+
+out:
+ for (i = 0; i < HCTX_MAX_TYPES; i++)
+ host->nr_hw_queues += hba->nr_queues[i];
+
+ hba->nr_hw_queues = host->nr_hw_queues + dev_cmd_queue;

I think there are code paths in this function that can result in host->nr_hw_queues becoming larger than the number of submission queues supported by the controller. Please fix.

+static int ufshcd_mcq_get_tag(struct ufs_hba *hba,
+ struct ufs_hw_queue *hwq,
+ struct cq_entry *cqe)
+{
+ dma_addr_t dma_addr;
+
+ /* Bits 31:7 UCD base address, 6:5 are reserved, 4:0 is SQ ID */
+ dma_addr = ((u64)le32_to_cpu(cqe->command_desc_base_addr_hi) << 32) |
+ (le32_to_cpu(cqe->command_desc_base_addr_lo) & CQE_UCD_BA);

The above code would become faster if command_desc_base_addr_hi and command_desc_base_addr_lo would be combined into a single __le64 member variable. Has this been considered?

+static inline void ufshcd_inc_tp(struct ufs_hw_queue *q)
+{
+ u32 mask = q->max_entries - 1;
+ u32 val;
+
+ q->sq_tp_slot = (q->sq_tp_slot + 1) & mask;
+ val = q->sq_tp_slot * sizeof(struct utp_transfer_req_desc);
+ writel(val, q->mcq_sq_tp);
+}

Please consider changing "tp" into "tail" or "tail_pointer" and "hp" into "head" or "head_pointer" to improve code readability.

+EXPORT_SYMBOL_GPL(ufshcd_transfer_req_compl);

Why has this function been exported?

+enum ufshcd_res {
+ RES_MEM,
+ RES_MCQ,
+ RES_MCQ_SQD,
+ RES_MCQ_SQIS,
+ RES_MCQ_CQD,
+ RES_MCQ_CQIS,
+ RES_MCQ_VS,
+ RES_MAX,
+};

Please add a comment that explains the SQD, SQIS etc. acronyms.

+/* MCQ Runtime Operation Pointer info structure */
+struct ufshcd_mcq_rop_info_t {
+ unsigned long offset;
+ unsigned long stride;
+ void __iomem *base;
+};

Please document the role of the members of this data structure.

Thanks,

Bart.