Re: [PATCH -next v3 2/6] nbd: fix race between nbd_alloc_config() and module removal

From: Josef Bacik
Date: Mon May 23 2022 - 10:15:12 EST


On Sat, May 21, 2022 at 03:37:45PM +0800, Yu Kuai wrote:
> When the nbd module is being removed, nbd_alloc_config() may be
> called concurrently by nbd_genl_connect(). try_module_get() will
> return false in that case, but nbd_alloc_config() doesn't handle it.
>
> The race may leak nbd_config and its related resources
> (e.g., recv_workq) and cause an oops in nbd_read_stat() once
> the nbd module has been unloaded, as shown below:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000040
> Oops: 0000 [#1] SMP PTI
> CPU: 5 PID: 13840 Comm: kworker/u17:33 Not tainted 5.14.0+ #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
> Workqueue: knbd16-recv recv_work [nbd]
> RIP: 0010:nbd_read_stat.cold+0x130/0x1a4 [nbd]
> Call Trace:
> recv_work+0x3b/0xb0 [nbd]
> process_one_work+0x1ed/0x390
> worker_thread+0x4a/0x3d0
> kthread+0x12a/0x150
> ret_from_fork+0x22/0x30
>
> Fix it by checking the return value of try_module_get()
> in nbd_alloc_config(). As nbd_alloc_config() may return ERR_PTR(-ENODEV),
> assign nbd->config only when nbd_alloc_config() succeeds, to ensure
> that nbd->config is always either a valid pointer or NULL.
>
> Also add a debug message to check the reference counter
> of nbd_config during module removal.
>
> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
> Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
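
For readers following the thread without the diff at hand, the pattern
described in the commit message can be sketched roughly as below. This is
a userspace model and not the actual patch: try_module_get_stub(),
module_going_away and nbd_open_config() are hypothetical stand-ins, the
ERR_PTR helpers are simplified re-implementations of the kernel ones, and
the structs are reduced to the one field that matters here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical flag modelling "the module is being removed". */
static bool module_going_away;

/* Hypothetical stand-in for try_module_get(): fails during removal. */
static bool try_module_get_stub(void)
{
	return !module_going_away;
}

/* Reduced to placeholders; the real structs carry much more state. */
struct nbd_config { int placeholder; };
struct nbd_device { struct nbd_config *config; };

/* Report failure through ERR_PTR() instead of ignoring the refcount result. */
static struct nbd_config *nbd_alloc_config(void)
{
	struct nbd_config *config;

	if (!try_module_get_stub())
		return ERR_PTR(-ENODEV);

	config = calloc(1, sizeof(*config));
	if (!config) {
		/* Kernel code would also need to drop the module reference here. */
		return ERR_PTR(-ENOMEM);
	}
	return config;
}

/*
 * Hypothetical caller: store the result in nbd->config only on success,
 * so the field is always either a valid pointer or NULL, never an
 * ERR_PTR value.
 */
static int nbd_open_config(struct nbd_device *nbd)
{
	struct nbd_config *config;

	assert(nbd != NULL);

	config = nbd_alloc_config();
	if (IS_ERR(config))
		return (int)PTR_ERR(config);

	nbd->config = config;
	return 0;
}
```

Allocating into a local variable first is the important part: later code
that tests nbd->config against NULL would otherwise treat a stored
ERR_PTR(-ENODEV) as a valid config.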

Reviewed-by: Josef Bacik <josef@xxxxxxxxxxxxxx>

Thanks,

Josef