RE: [Patch v2] storvsc: setup 1:1 mapping between hardware queue and CPU queue

From: Long Li
Date: Thu Aug 22 2019 - 18:29:23 EST


>>>>>>From: Long Li <longli@xxxxxxxxxxxxxxxxx> Sent: Thursday, August 22, 2019 1:42 PM
>>>>>>>
>>>>>>> storvsc doesn't use a dedicated hardware queue for a given CPU queue.
>>>>>>> When issuing I/O, it selects returning CPU (hardware queue)
>>>>>>> dynamically based on vmbus channel usage across all channels.
>>>>>>>
>>>>>>> This patch advertises num_possible_cpus() as number of hardware
>>>>>>> queues. This will have upper layer setup 1:1 mapping between
>>>>>>> hardware queue and CPU queue and avoid unnecessary locking when issuing I/O.
>>>>>>>
>>>>>>> Changes:
>>>>>>> v2: rely on default upper layer function to map queues. (suggested
>>>>>>> by Ming Lei
>>>>>>> <tom.leiming@xxxxxxxxx>)
>>>>>>>
>>>>>>> Signed-off-by: Long Li <longli@xxxxxxxxxxxxx>
>>>>>>> ---
>>>>>>> drivers/scsi/storvsc_drv.c | 3 +--
>>>>>>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
>>>>>>> index b89269120a2d..dfd3b76a4f89 100644
>>>>>>> --- a/drivers/scsi/storvsc_drv.c
>>>>>>> +++ b/drivers/scsi/storvsc_drv.c
>>>>>>> @@ -1836,8 +1836,7 @@ static int storvsc_probe(struct hv_device *device,
>>>>>>> /*
>>>>>>> * Set the number of HW queues we are supporting.
>>>>>>> */
>>>>>>> - if (stor_device->num_sc != 0)
>>>>>>> - host->nr_hw_queues = stor_device->num_sc + 1;
>>>>>>> + host->nr_hw_queues = num_possible_cpus();
>>>>>>
>>>>>>For a lot of the VM sizes in Azure, num_possible_cpus() is 128, even
>>>>>>if the VM has only 4 or 8 or some other smaller number of vCPUs.
>>>>>>So I'm wondering if you really want num_present_cpus() here instead,
>>>>>>which would include only the vCPUs that actually exist in the VM.
>>>
>>>I think reporting num_possible_cpus() doesn't do more harm or take more
>>>resources. Because block layer allocates map for all the possible CPUs.
>>>
>>>The actual mapping is done in blk_mq_map_queues(), and it iterates all the
>>>possible CPUs. If we report num_present_cpus(), the rest of the CPUs also
>>>need to be mapped.

Actually, I see your point: reporting num_present_cpus() results in fewer struct blk_mq_hw_ctx instances being created, so it saves memory.

If we don't plan to support adding/onlining CPUs, we should use num_present_cpus().
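
To make the trade-off concrete, here is a rough userspace sketch. It is only a simplified model: model_map_queues() and model_cpus_on_queue() are hypothetical helpers, and the plain round-robin spread is an assumption for illustration, not the actual logic of blk_mq_map_queues(). It shows that every possible CPU still gets a queue whichever count we advertise; only the number of queues (and so the number of struct blk_mq_hw_ctx) changes.

```c
#include <assert.h>

/*
 * Simplified userspace model (NOT the kernel's blk_mq_map_queues()):
 * assign every possible CPU to one of nr_queues hardware queues
 * using a plain round-robin spread.
 */
static void model_map_queues(unsigned int *cpu_to_queue,
			     unsigned int nr_possible_cpus,
			     unsigned int nr_queues)
{
	unsigned int cpu;

	assert(nr_queues > 0);

	/* Every possible CPU gets a queue, even if it is not present. */
	for (cpu = 0; cpu < nr_possible_cpus; cpu++)
		cpu_to_queue[cpu] = cpu % nr_queues;
}

/* Count how many CPUs were mapped to the given hardware queue. */
static unsigned int model_cpus_on_queue(const unsigned int *cpu_to_queue,
					unsigned int nr_possible_cpus,
					unsigned int queue)
{
	unsigned int cpu, count = 0;

	for (cpu = 0; cpu < nr_possible_cpus; cpu++)
		if (cpu_to_queue[cpu] == queue)
			count++;

	return count;
}
```

With 128 possible CPUs and 128 queues, the model gives each CPU its own queue (the 1:1 case). With 128 possible CPUs and 8 queues (say, an 8-vCPU VM advertising num_present_cpus()), each queue is shared by 16 possible CPUs, of which only the present ones ever issue I/O.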

>>>
>>>>>>
>>>>>>Michael
>>>>>>
>>>>>>>
>>>>>>> /*
>>>>>>> * Set the error handler work queue.
>>>>>>> --
>>>>>>> 2.17.1