Re: [External] Re: [PATCH] nvme: fix heap-use-after-free and oops in bio_endio for nvme multipath

From: Sagi Grimberg
Date: Tue Mar 21 2023 - 08:27:04 EST



Thank you for your reply.

This problem occurs with nvme over rdma and nvme over tcp when nvme multipath is in use. The ns gendisk is deleted because the nvmf target subsystem is faulty: the host detects keep-alive timeouts and I/O timeouts on all paths. After ctrl-loss-tmo seconds, the host removes the failed ctrl and the ns gendisk.

That is fine, but it is a problem if it does not correctly drain
inflight I/O, whether it was split or not. And this looks like the wrong
place to address this.
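
To make the draining requirement concrete, here is a minimal userspace sketch in C. It is a toy model, not kernel code: `struct toy_disk` and every `toy_*` function are hypothetical names made up for illustration, and it assumes a single thread. It only shows the ordering being asked for: removal first stops new I/O, and the disk is freed only once every inflight I/O has completed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a disk; not a kernel structure. */
struct toy_disk {
    int inflight;  /* I/Os submitted but not yet completed */
    bool dying;    /* set once removal has started */
    bool freed;    /* set when the disk memory would be released */
};

/* Submit an I/O; refused once removal has begun. */
static bool toy_submit(struct toy_disk *d)
{
    if (d->dying)
        return false;
    d->inflight++;
    return true;
}

/* Complete an I/O; must never run against a freed disk. */
static void toy_complete(struct toy_disk *d)
{
    assert(!d->freed);  /* tripping this models the use-after-free */
    assert(d->inflight > 0);
    d->inflight--;
}

/* Begin removal: block new I/O, free only when fully drained. */
static bool toy_try_remove(struct toy_disk *d)
{
    d->dying = true;
    if (d->inflight > 0)
        return false;   /* still draining; caller must retry later */
    d->freed = true;
    return true;
}

/* Walk through the sequence; returns 0 if every step behaves as described. */
int toy_demo(void)
{
    struct toy_disk d = { 0, false, false };

    if (!toy_submit(&d) || !toy_submit(&d))
        return 1;
    if (toy_try_remove(&d))   /* two I/Os still inflight */
        return 2;
    if (toy_submit(&d))       /* new I/O is refused while dying */
        return 3;
    toy_complete(&d);
    if (toy_try_remove(&d))   /* one I/O still inflight */
        return 4;
    toy_complete(&d);
    if (!toy_try_remove(&d))  /* drained, removal may proceed */
        return 5;
    return 0;
}
```

Calling `toy_demo()` from a `main()` exercises the sequence. If the free happened while `inflight` was still nonzero, the later `toy_complete()` would hit the first assertion, which is the toy equivalent of a completion touching a freed disk.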

We have reproduced this problem in Linux-5.10.136, Linux-5.10.167 and the latest commit in linux-5.10.y, and this patch is only applicable to linux-5.10.y.

So my understanding is that this does not reproduce upstream?


Yes, this is absolutely the wrong place to do this. Can I move this modification after nvme_trace_bio_complete?

Do I need to resubmit a patch, if modifications are needed?

Yes, but a backport fix needs to be sent to the stable mailing list
(stable@xxxxxxxxxxxxxxx) and cc'd to the linux-nvme mailing list.

But I don't think that this fix is the correct one. What is needed is
to identify where this was fixed upstream and backport that fix instead.
If that is too involved because of code dependencies, it may be
possible to send an alternative surgical fix, but it needs to be
justified.