RE: [External] Re: [PATCH] nvme: fix heap-use-after-free and oops in bio_endio for nvme multipath

From: Lei Lei2 Yin
Date: Tue Mar 21 2023 - 09:30:46 EST



No, I have not verified this issue on any kernel newer than 5.10.y (such as 5.15.y, 6.0 or later), because some functionality we depend on, such as cgroup, has changed too much in newer kernels, so we can't use them.


In addition, upstream has changed the bi_disk assignment to bio_set_dev(bio, ns->disk->part0), and as you said there is no bi_disk in struct bio anymore. Backporting the upstream change would be too involved because of code dependencies, so what I want to do is what you suggested: send an alternative surgical fix.
(I will check whether upstream has the same problem in the near future; if it does, I will submit this fix there too.)

I'm not sure what evidence is needed to justify this problem and the patch. Below are the child bio and parent bio structures at the moment the heap-use-after-free occurred, captured with crash (I enabled KASAN and panic_on_warn).

Please help me confirm whether this is enough, thanks.

All bios on the path from nvme_ns_head_submit_bio to bio_endio belong to the nvme head disk. The failing bio is the parent of the original (split) bio, and its bi_disk (0xffff888153ead000) had already been freed with kfree() before the KASAN warning (I confirmed this by adding a log).


KERNEL: /usr/lib/debug/vmlinux [TAINTED]
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS: 8
DATE: Mon Mar 20 19:43:39 CST 2023
UPTIME: 00:05:33
LOAD AVERAGE: 73.43, 20.60, 7.11
TASKS: 526
NODENAME: C8
RELEASE: 5.10.167
VERSION: #1 SMP Fri Feb 17 11:02:17 CST 2023
MACHINE: x86_64 (2194 Mhz)
MEMORY: 64 GB
PANIC: "Kernel panic - not syncing: KASAN: panic_on_warn set ..."
PID: 417
COMMAND: "kworker/5:1H"
TASK: ffff888126972040 [THREAD_INFO: ffff888126972040]
CPU: 5
STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 417 TASK: ffff888126972040 CPU: 5 COMMAND: "kworker/5:1H"
#0 [ffff88810ebcf828] machine_kexec at ffffffff8f701b3e
#1 [ffff88810ebcf948] __crash_kexec at ffffffff8f9d28eb
#2 [ffff88810ebcfa60] panic at ffffffff913967e9
#3 [ffff88810ebcfb30] bio_endio at ffffffff902541f7
#4 [ffff88810ebcfb78] bio_endio at ffffffff902541f7
#5 [ffff88810ebcfba8] bio_endio at ffffffff902541f7
#6 [ffff88810ebcfbd8] nvme_ns_head_submit_bio at ffffffffc13cf960 [nvme_core]
#7 [ffff88810ebcfcc8] submit_bio_noacct at ffffffff9026b134
#8 [ffff88810ebcfdb8] nvme_requeue_work at ffffffffc13cdc40 [nvme_core]
#9 [ffff88810ebcfdf8] process_one_work at ffffffff8f8133c8
#10 [ffff88810ebcfe78] worker_thread at ffffffff8f813fb7
#11 [ffff88810ebcff10] kthread at ffffffff8f825e6f
#12 [ffff88810ebcff50] ret_from_fork at ffffffff8f60619f
crash> p *(struct bio *)0xffff8881890f4900 // child bio
$1 = {
bi_next = 0x0,
bi_disk = 0xdae00188000001a1,
bi_opf = 33605633,
bi_flags = 1922,
bi_ioprio = 0,
bi_write_hint = 0,
bi_status = 10 '\n',
bi_partno = 0 '\000',
__bi_remaining = {
counter = 1
},
bi_iter = {
bi_sector = 12287744,
bi_size = 65536,
bi_idx = 3,
bi_bvec_done = 106496
},
bi_end_io = 0xffffffff90254280 <bio_chain_endio>,
bi_private = 0xffff888198b778d0,
bi_blkg = 0x0,
bi_issue = {
value = 288230712376101481
},
{
bi_integrity = 0x0
},
bi_vcnt = 0,
bi_max_vecs = 0,
__bi_cnt = {
counter = 1
},
bi_io_vec = 0xffff8881a4530000,
bi_pool = 0xffff888141bd7af8,
bi_inline_vecs = 0xffff8881890f4978
}

crash> p *(struct bio *)0xffff888198b778d0 // parent bio
$2 = {
bi_next = 0x0,
bi_disk = 0xffff888153ead000,
bi_opf = 33589249,
bi_flags = 1664,
bi_ioprio = 0,
bi_write_hint = 0,
bi_status = 10 '\n',
bi_partno = 0 '\000',
__bi_remaining = {
counter = 0
},
bi_iter = {
bi_sector = 12288000,
bi_size = 0,
bi_idx = 5,
bi_bvec_done = 0
},
bi_end_io = 0xffffffff8ff8df80 <blkdev_bio_end_io_simple>,
bi_private = 0xffff8881b0c54080,
bi_blkg = 0xffff8881974df400,
bi_issue = {
value = 288230665264113654
},
{
bi_integrity = 0x0
},
bi_vcnt = 5,
bi_max_vecs = 256,
__bi_cnt = {
counter = 1
},
bi_io_vec = 0xffff8881a4530000,
bi_pool = 0x0,
bi_inline_vecs = 0xffff888198b77948
}



-----Original Message-----
From: Sagi Grimberg <sagi@xxxxxxxxxxx>
Sent: March 21, 2023 20:26
To: Lei Lei2 Yin <yinlei2@xxxxxxxxxx>; kbusch@xxxxxxxxxx; axboe@xxxxxx; hch@xxxxxx
Cc: linux-nvme@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; cybeyond@xxxxxxxxxxx
Subject: Re: [External] Re: [PATCH] nvme: fix heap-use-after-free and oops in bio_endio for nvme multipath


> Thank you for your reply
>
> This problem occurs with NVMe over RDMA and NVMe over TCP when nvme creates a multipath device. The ns gendisk is deleted because the nvmf target subsystem becomes faulty: the host detects keep-alive timeouts and I/O timeouts on all paths, and after ctrl-loss-tmo seconds it removes the failed controller and the ns gendisk.

That is fine, but it is a problem if it does not correctly drain inflight I/O, whether it was split or not. And this looks like the wrong place to address this.

> We have reproduced this problem in Linux-5.10.136, Linux-5.10.167 and
> the latest commit in linux-5.10.y, and this patch is only applicable
> to Linux-5.10.y

So my understanding is that this does not reproduce upstream?

>
> Yes, this is absolutely the wrong place to do this. Can I move this modification after nvme_trace_bio_complete?
>
> Do I need to resubmit the patch if modifications are needed?

Yes, but a backport fix needs to be sent to stable mailing list
(stable@xxxxxxxxxxxxxxx) and cc'd to linux-nvme mailing list.

But I don't think that this fix is the correct one. What is needed is to identify where this was fixed upstream and backport that fix instead.
If that is too involved because of code dependencies, it may be possible to send an alternative surgical fix, but it needs to be justified.