Re: 6.2 nvme-pci: something wrong

From: Christoph Hellwig
Date: Sun Dec 25 2022 - 00:31:07 EST


On Sat, Dec 24, 2022 at 03:06:38PM -0700, Keith Busch wrote:
> Your observation is a queue-wrap condition that makes it impossible for
> the controller know there are new commands.
>
> Your patch does look like the correct thing to do. The "zero means one"
> thing is a confusing distraction, I think. It makes more sense if you
> consider sqsize as the maximum number of tags we can have outstanding at
> one time and it looks like all the drivers set it that way. We're
> supposed to leave one slot empty for a full NVMe queue, so adding one
> here to report the total number slots isn't right since that would allow
> us to fill all slots.

Yes, and pcie did actually do the ‐ 1 from q_depth, so we should
drop the +1 for sqsize. And add back the missing BLK_MQ_MAX_DEPTH.
But we still need to keep sqsize updated as well.

> Fabrics drivers have been using this method for a while, though, so
> interesting they haven't had a simiar problem.

Fabrics doesn't have a real queue and thus no actual wrap, so
I don't think they will be hit as bad by this.

So we'll probably need something like this, split into two patches.
And then for 6.2 clean up the sqsize vs q_depth mess for real.

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 95c488ea91c303..5b723c65fbeab5 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4926,7 +4926,7 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,

memset(set, 0, sizeof(*set));
set->ops = ops;
- set->queue_depth = ctrl->sqsize + 1;
+ set->queue_depth = min_t(unsigned, ctrl->sqsize, BLK_MQ_MAX_DEPTH - 1);
/*
* Some Apple controllers requires tags to be unique across admin and
* the (only) I/O queue, so reserve the first 32 tags of the I/O queue.
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index f0f8027644bbf8..ec5e1c578a710b 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2332,10 +2332,12 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
if (dev->cmb_use_sqes) {
result = nvme_cmb_qdepth(dev, nr_io_queues,
sizeof(struct nvme_command));
- if (result > 0)
+ if (result > 0) {
dev->q_depth = result;
- else
+ dev->ctrl.sqsize = dev->q_depth - 1;
+ } else {
dev->cmb_use_sqes = false;
+ }
}

do {
@@ -2536,7 +2538,6 @@ static int nvme_pci_enable(struct nvme_dev *dev)

dev->q_depth = min_t(u32, NVME_CAP_MQES(dev->ctrl.cap) + 1,
io_queue_depth);
- dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */
dev->db_stride = 1 << NVME_CAP_STRIDE(dev->ctrl.cap);
dev->dbs = dev->bar + 4096;

@@ -2577,7 +2578,7 @@ static int nvme_pci_enable(struct nvme_dev *dev)
dev_warn(dev->ctrl.device, "IO queue depth clamped to %d\n",
dev->q_depth);
}
-
+ dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */

nvme_map_cmb(dev);