6.2 nvme-pci: something wrong

From: Hugh Dickins
Date: Sat Dec 24 2022 - 00:25:18 EST


Hi Christoph,

There's something wrong with the nvme-pci heading for 6.2-rc1:
no problem booting here on this Lenovo ThinkPad X1 Carbon 5th,
but under load...

nvme nvme0: I/O 0 (I/O Cmd) QID 2 timeout, aborting
nvme nvme0: I/O 1 (I/O Cmd) QID 2 timeout, aborting
nvme nvme0: I/O 2 (I/O Cmd) QID 2 timeout, aborting
nvme nvme0: I/O 3 (I/O Cmd) QID 2 timeout, aborting
nvme nvme0: Abort status: 0x0
nvme nvme0: Abort status: 0x0
nvme nvme0: Abort status: 0x0
nvme nvme0: Abort status: 0x0
nvme nvme0: I/O 0 QID 2 timeout, reset controller

...and more, until I just have to poweroff and reboot.

Bisection points to your
0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers")
And that does revert cleanly, giving a kernel which shows no problem.

I've spent a while comparing old nvme_pci_alloc_tag_set() and new
nvme_alloc_io_tag_set(), I do not know my way around there at all
and may be talking nonsense, but it did look as if there might now
be a difference in the queue_depth, sqsize, q_depth conversions.

I'm running load successfully with the patch below, but I strongly
suspect that the right patch will be somewhere else: over to you!

Hugh

--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4926,7 +4926,7 @@ int nvme_alloc_io_tag_set(struct nvme_ct

memset(set, 0, sizeof(*set));
set->ops = ops;
- set->queue_depth = ctrl->sqsize + 1;
+ set->queue_depth = ctrl->sqsize;
/*
* Some Apple controllers requires tags to be unique across admin and
* the (only) I/O queue, so reserve the first 32 tags of the I/O queue.