Re: [External] Re: [PATCH] nvme: fix heap-use-after-free and oops in bio_endio for nvme multipath

From: Lei Lei2 Yin
Date: Tue Mar 21 2023 - 07:55:10 EST


Thank you for your reply.

This problem occurs with NVMe over RDMA and NVMe over TCP when NVMe native multipath is in use. The ns gendisk is deleted because the nvmf target subsystem becomes faulty: the host then detects keep-alive timeouts and I/O timeouts on all paths. After ctrl-loss-tmo seconds, the host removes the failed ctrl and the ns gendisk.

We have reproduced this problem on Linux 5.10.136, Linux 5.10.167 and the latest commit in linux-5.10.y. This patch applies only to linux-5.10.y.

Yes, this is absolutely the wrong place to do this. Can I move this modification to after nvme_trace_bio_complete?

If modifications are needed, do I need to resubmit the patch?
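
To make the ordering described above concrete, here is a minimal user-space sketch in C. It is not kernel code: struct disk, struct bio, bio_endio_model() and run_scenario() are hypothetical stand-ins that only model a split bio chain, and the "freed" ns disk is represented by an alive flag so the example itself stays memory-safe. It assumes the child bio completes only after the parent has completed and the ns disk has been removed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a gendisk; "alive" models whether it is freed. */
struct disk {
    const char *name;
    int alive;
};

/* Hypothetical stand-in for a bio that may be chained to a parent. */
struct bio {
    struct disk *bi_disk;  /* disk the bio currently points at */
    struct bio *parent;    /* set on the child created by a split */
    int remaining;         /* completions still outstanding */
    int done;              /* set once the bio is fully completed */
};

/*
 * Model of chained completion: a bio only finishes when its remaining
 * count drops to zero, and finishing a child then completes its parent.
 * Returns 1 if a finished bio pointed at a disk that is no longer alive.
 */
static int bio_endio_model(struct bio *bio)
{
    int touched_dead = 0;

    while (bio) {
        if (--bio->remaining > 0)
            break;                      /* still waiting for the chain */
        if (bio->bi_disk && !bio->bi_disk->alive)
            touched_dead = 1;           /* would be a use-after-free */
        bio->done = 1;
        bio = bio->parent;
    }
    return touched_dead;
}

/*
 * Parent completes on a path first, the ns disk goes away, and only then
 * does the child complete. If retarget_to_head is set, the parent is
 * pointed back at the head disk before its first completion.
 */
int run_scenario(int retarget_to_head)
{
    struct disk head = { .name = "head", .alive = 1 };
    struct disk ns = { .name = "ns", .alive = 1 };
    struct bio parent = { .bi_disk = &ns, .parent = NULL, .remaining = 2 };
    struct bio child = { .bi_disk = &head, .parent = &parent, .remaining = 1 };
    int bad = 0;

    if (retarget_to_head)
        parent.bi_disk = &head;
    bad |= bio_endio_model(&parent);    /* parent waits for the child */

    ns.alive = 0;                       /* ns disk removed after ctrl loss */

    bad |= bio_endio_model(&child);     /* child finishes, then the parent */
    assert(parent.done && child.done);
    return bad;
}
```

In this model, run_scenario(0) returns 1 because the parent still points at the removed ns disk when the child finishes the chain, while run_scenario(1) returns 0 because the parent was pointed at the head disk first.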



-----Original Message-----
From: Sagi Grimberg <sagi@xxxxxxxxxxx>
Sent: March 21, 2023 19:09
To: Lei Lei2 Yin <yinlei2@xxxxxxxxxx>; kbusch@xxxxxxxxxx; axboe@xxxxxx; hch@xxxxxx
Cc: linux-nvme@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; cybeyond@xxxxxxxxxxx
Subject: [External] Re: [PATCH] nvme: fix heap-use-after-free and oops in bio_endio for nvme multipath



On 3/21/23 12:50, Lei Lei2 Yin wrote:
> From b134e7930b50679ce48e5522ddd37672b1802340 Mon Sep 17 00:00:00 2001
> From: Lei Yin <yinlei2@xxxxxxxxxx>
> Date: Tue, 21 Mar 2023 16:09:08 +0800
> Subject: [PATCH] nvme: fix heap-use-after-free and oops in bio_endio for nvme
> multipath
>
> When blk_queue_split works in nvme_ns_head_submit_bio, input bio will
> be splited to two bios. If parent bio is completed first, and the
> bi_disk in parent bio is kfreed by nvme_free_ns, child will access
> this freed bi_disk in bio_endio. This will trigger heap-use-after-free
> or null pointer oops.

Can you explain further? It is unclear to me how we can delete the ns gendisk

>
> The following is kasan report:
>
> BUG: KASAN: use-after-free in bio_endio+0x477/0x500
> Read of size 8 at addr ffff888106f2e3a8 by task kworker/1:1H/241
>
> CPU: 1 PID: 241 Comm: kworker/1:1H Kdump: loaded Tainted: G O 5.10.167 #1
> Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> Workqueue: kblockd nvme_requeue_work [nvme_core]
> Call Trace:
> dump_stack+0x92/0xc4
> ? bio_endio+0x477/0x500
> print_address_description.constprop.7+0x1e/0x230
> ? record_print_text.cold.40+0x11/0x11
> ? _raw_spin_trylock_bh+0x120/0x120
> ? blk_throtl_bio+0x225/0x3050
> ? bio_endio+0x477/0x500
> ? bio_endio+0x477/0x500
> kasan_report.cold.9+0x37/0x7c
> ? bio_endio+0x477/0x500
> bio_endio+0x477/0x500
> nvme_ns_head_submit_bio+0x950/0x1130 [nvme_core]
> ? nvme_find_path+0x7f0/0x7f0 [nvme_core]
> ? __kasan_slab_free+0x11a/0x150
> ? bio_endio+0x213/0x500
> submit_bio_noacct+0x2a4/0xd10
> ? _dev_info+0xcd/0xff
> ? _dev_notice+0xff/0xff
> ? blk_queue_enter+0x6c0/0x6c0
> ? _raw_spin_lock_irq+0x81/0xd5
> ? _raw_spin_lock+0xd0/0xd0
> nvme_requeue_work+0x144/0x18c [nvme_core]
> process_one_work+0x878/0x13e0
> worker_thread+0x87/0xf70
> ? __kthread_parkme+0x8f/0x100
> ? process_one_work+0x13e0/0x13e0
> kthread+0x30f/0x3d0
> ? kthread_parkme+0x80/0x80
> ret_from_fork+0x1f/0x30
>
> Allocated by task 52:
> kasan_save_stack+0x19/0x40
> __kasan_kmalloc.constprop.11+0xc8/0xd0
> __alloc_disk_node+0x5c/0x320
> nvme_alloc_ns+0x6e9/0x1520 [nvme_core]
> nvme_validate_or_alloc_ns+0x17c/0x370 [nvme_core]
> nvme_scan_work+0x2d4/0x4d0 [nvme_core]
> process_one_work+0x878/0x13e0
> worker_thread+0x87/0xf70
> kthread+0x30f/0x3d0
> ret_from_fork+0x1f/0x30
>
> Freed by task 54:
> kasan_save_stack+0x19/0x40
> kasan_set_track+0x1c/0x30
> kasan_set_free_info+0x1b/0x30
> __kasan_slab_free+0x108/0x150
> kfree+0xa7/0x300
> device_release+0x98/0x210
> kobject_release+0x109/0x3a0
> nvme_free_ns+0x15e/0x1f7 [nvme_core]
> nvme_remove_namespaces+0x22f/0x390 [nvme_core]
> nvme_do_delete_ctrl+0xac/0x106 [nvme_core]
> process_one_work+0x878/0x13e0
> worker_thread+0x87/0xf70
> kthread+0x30f/0x3d0
> ret_from_fork+0x1f/0x30
>
> Signed-off-by: Lei Yin <yinlei2@xxxxxxxxxx>
> ---
> drivers/nvme/host/nvme.h | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index c3e4d9b6f9c0..b441c5ce4157 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -749,8 +749,17 @@ static inline void nvme_trace_bio_complete(struct request *req,
> {
> struct nvme_ns *ns = req->q->queuedata;
>
> - if ((req->cmd_flags & REQ_NVME_MPATH) && req->bio)
> + if ((req->cmd_flags & REQ_NVME_MPATH) && req->bio) {
> trace_block_bio_complete(ns->head->disk->queue, req->bio);
> +
> + /* Point bio->bi_disk to head disk.
> + * This bio maybe as other bio's parent in bio chain. If this bi_disk
> + * is kfreed by nvme_free_ns, other bio may get this bio by __bio_chain_endio
> + * in bio_endio, and access this bi_disk. This will trigger heap-use-after-free
> + * or null pointer oops.
> + */
> + req->bio->bi_disk = ns->head->disk;
> + }

This is absolutely the wrong place to do this. This is a tracing function, it should not have any other logic.

What tree is this against anyways? There is no bi_disk in struct bio anymore.