[PATCH] nvme-tcp: Do not reset transport on data digest errors

From: Daniel Wagner
Date: Thu Aug 05 2021 - 08:15:56 EST


The spec says

7.4.6.1 Digest Error handling

When a host detects a data digest error in a C2HData PDU, that host
shall continue processing C2HData PDUs associated with the command and
when the command processing has completed, if a successful status was
returned by the controller, the host shall fail the command with a
non-fatal transport error.

Currently the transport is reset when a data digest error is
detected. To fix this, keep track of the final status in the queue
object and use it when completing the request.

The new member can be placed adjacent to the receive-related members
and fits into the same cacheline, as there is a 4-byte hole.
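
For illustration only, the receive state would then look roughly like
this on a typical 64-bit build (offsets are a sketch relative to the
start of this group, not pahole output; the hole exists because nr_cqe
is followed by a pointer that needs 8-byte alignment):

	size_t			data_remaining;		/*  0  8 */
	size_t			ddgst_remaining;	/*  8  8 */
	unsigned int		nr_cqe;			/* 16  4 */
	u16			status;			/* 20  2, in the former 4-byte hole */
							/* 22  2 bytes of padding remain */

	/* send state */
	struct nvme_tcp_request *request;		/* 24  8 */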

Signed-off-by: Daniel Wagner <dwagner@xxxxxxx>
---

Hi,

I've tested this by modifying the receive path: via the fault_inject
interface I injected wrong hash values. The request is then completed
with status != 0 and nvme_decide_disposition() decides to retry the
request. So this seems to be more in line with what the spec says on
this topic.
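
For reference, the test hack in the receive path was roughly along
these lines (should_corrupt_ddgst() is a made-up helper wired up to
the fault_inject knobs, not part of this patch):

	/* test-only hack in nvme_tcp_recv_ddgst(), before the digest compare */
	if (should_corrupt_ddgst(queue))
		queue->exp_ddgst ^= cpu_to_le32(1);	/* force a mismatch */

With that in place queue->status ends up as NVME_SC_DATA_XFER_ERROR
and the retry path described above is exercised.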

Daniel

drivers/nvme/host/tcp.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 097f7dd10ed3..5253147df4c7 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -89,6 +89,7 @@ struct nvme_tcp_queue {
size_t data_remaining;
size_t ddgst_remaining;
unsigned int nr_cqe;
+ u16 status;

/* send state */
struct nvme_tcp_request *request;
@@ -496,7 +497,8 @@ static int nvme_tcp_process_nvme_cqe(struct nvme_tcp_queue *queue,
return -EINVAL;
}

- if (!nvme_try_complete_req(rq, cqe->status, cqe->result))
+ if (!nvme_try_complete_req(rq, queue->status ?
+ queue->status : cqe->status, cqe->result))
nvme_complete_rq(rq);
queue->nr_cqe++;

@@ -676,6 +678,7 @@ static int nvme_tcp_recv_pdu(struct nvme_tcp_queue *queue, struct sk_buff *skb,

switch (hdr->type) {
case nvme_tcp_c2h_data:
+ queue->status = NVME_SC_SUCCESS;
return nvme_tcp_handle_c2h_data(queue, (void *)queue->pdu);
case nvme_tcp_rsp:
nvme_tcp_init_recv_ctx(queue);
@@ -758,7 +761,7 @@ static int nvme_tcp_recv_data(struct nvme_tcp_queue *queue, struct sk_buff *skb,
queue->ddgst_remaining = NVME_TCP_DIGEST_LENGTH;
} else {
if (pdu->hdr.flags & NVME_TCP_F_DATA_SUCCESS) {
- nvme_tcp_end_request(rq, NVME_SC_SUCCESS);
+ nvme_tcp_end_request(rq, queue->status);
queue->nr_cqe++;
}
nvme_tcp_init_recv_ctx(queue);
@@ -792,14 +795,14 @@ static int nvme_tcp_recv_ddgst(struct nvme_tcp_queue *queue,
"data digest error: recv %#x expected %#x\n",
le32_to_cpu(queue->recv_ddgst),
le32_to_cpu(queue->exp_ddgst));
- return -EIO;
+ queue->status = NVME_SC_DATA_XFER_ERROR;
}

if (pdu->hdr.flags & NVME_TCP_F_DATA_SUCCESS) {
struct request *rq = nvme_cid_to_rq(nvme_tcp_tagset(queue),
pdu->command_id);

- nvme_tcp_end_request(rq, NVME_SC_SUCCESS);
+ nvme_tcp_end_request(rq, queue->status);
queue->nr_cqe++;
}

--
2.29.2