Re: [v2] nbd: fix potential NULL pointer fault in nbd_genl_disconnect

From: Mike Christie
Date: Tue Jan 21 2020 - 20:07:01 EST


On 01/21/2020 08:09 AM, Josef Bacik wrote:
> On 1/20/20 7:45 AM, Sun Ke wrote:
>> Open /dev/nbdX first, the config_refs will be 1 and
>> the pointers in nbd_device are still null. Disconnect
>> /dev/nbdX, then reference a null recv_workq. The
>> protection by config_refs in nbd_genl_disconnect is useless.
>>
>> To fix it, just add a check for a non null task_recv in
>> nbd_genl_disconnect.
>>
>> Signed-off-by: Sun Ke <sunke32@xxxxxxxxxx>
>> ---
>> v1 -> v2:
>>
>> add an omitted mutex_unlock.
>> ---
>> drivers/block/nbd.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
>> index b4607dd96185..668bc9cb92ed 100644
>> --- a/drivers/block/nbd.c
>> +++ b/drivers/block/nbd.c
>> @@ -2008,6 +2008,10 @@ static int nbd_genl_disconnect(struct sk_buff
>> *skb, struct genl_info *info)
>> index);
>> return -EINVAL;
>> }
>> + if (!nbd->task_recv) {
>> + mutex_unlock(&nbd_index_mutex);
>> + return -EINVAL;
>> + }
>> if (!refcount_inc_not_zero(&nbd->refs)) {
>> mutex_unlock(&nbd_index_mutex);
>> printk(KERN_ERR "nbd: device at index %d is going down\n",
>>
>
> This doesn't even really protect us, we need to have the
> nbd->config_lock held here to make sure it's ok. The IOCTL path is safe
> because it creates the device on open so it's sure to exist by the time
> we get to the disconnect, we don't have that for genl_disconnect. So
> I'd add the config_mutex before getting the config_ref, and then do the
> check, something like
>
> mutex_lock(&nbd->config_lock);
> if (!refcount_inc_not_zero(&nbd->refs)) {
> }
> if (!nbd->recv_workq) {
> }
> mutex_unlock(&nbd->config_lock);
>

We will be doing a mix of checks/behavior. Maybe we want to settle on one?

It seems the code, before my patch, would let you do a open() then do a
nbd_genl_disconnect. This function would then try to cleanup what it
could and return success.

To keep the current behavior/style in nbd_disconnect_and_put would you
want to do:

nbd_disconnect_and_put()

....

if (nbd->task_recv)
flush_workqueue(nbd->recv_workq);

?

Alternatively, I think if we want to make it so calling
nbd_genl_disconnect is not allowed on a device that we have not done a
successful nbd_genl_connect/nbd_start_device call on then we want to add
the new state bit to indicate nbd_start_device was successful.

Or, we could stick to one variable that gets set at start and always use
that to indicate nbd_start_device was called ok. For example, for
nbd_genl_reconfigure we already check if task_recv is set to check if
nbd_start_device was called successfully.