Re: [PATCH] blk-mq: allow hardware queue to get more tag while sharing a tag set

From: yukuai (C)
Date: Thu Aug 05 2021 - 21:50:35 EST


On 2021/08/04 2:38, Bart Van Assche wrote:
On 8/2/21 7:57 PM, yukuai (C) wrote:
The CPU I'm testing on is an Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz,
and after switching to io_uring with "--thread --gtod_reduce=1
--ioscheduler=none", IOPS increase to 330k, which is still far behind
6000k.

On https://ark.intel.com/content/www/us/en/ark/products/120485/intel-xeon-gold-6140-processor-24-75m-cache-2-30-ghz.html I found the following information about that CPU:
18 CPU cores
36 hyperthreads

so 36 fio jobs should be sufficient. Maybe IOPS are lower than expected because of how null_blk has been configured? This is the configuration that I used in my test:

modprobe null_blk nr_devices=0 &&
    udevadm settle &&
    cd /sys/kernel/config/nullb &&
    mkdir nullb0 &&
    cd nullb0 &&
    echo 0 > completion_nsec &&
    echo 512 > blocksize &&
    echo 0 > home_node &&
    echo 0 > irqmode &&
    echo 1024 > size &&
    echo 0 > memory_backed &&
    echo 2 > queue_mode &&
    echo 1 > power ||
    exit $?
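
The exact fio invocation is not part of this thread; a sketch
consistent with the flags mentioned earlier could look like this
(iodepth, runtime and the randread workload are assumptions on my
side; numjobs=36 matches the 36 hyperthreads and bs=512 matches the
blocksize configured above):

fio --name=nullb_test --filename=/dev/nullb0 --direct=1 \
    --rw=randread --bs=512 --ioengine=io_uring --iodepth=64 \
    --numjobs=36 --thread --gtod_reduce=1 --ioscheduler=none \
    --runtime=30 --time_based --group_reporting

Using ioscheduler=none keeps the I/O scheduler out of the way, so the
run exercises tag allocation itself rather than scheduler overhead.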

Hi Bart,

After applying this configuration, null_blk IOPS on my machine are
about 650k (330k before). Is this still too low?

By the way, there is no performance degradation.

Thanks
Kuai

The only new atomic operation in the hot path is the atomic_read() in
hctx_may_queue(), and the atomic variable changes in just two
situations:

a. failing to get a driver tag while dbusy is not set: increase the
counter and set dbusy.
b. the queue switching from busy to idle while dbusy is set: decrease
the counter and clear dbusy.

During one "idle -> busy -> idle" period of a device, the new atomic
variable is written at most twice, which means it is almost read-only
in the test situation above. So I guess the impact on performance is
minimal?
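
To make that concrete, here is a minimal user-space sketch of the
pattern; names such as busy_queues, dbusy and the depth calculation
are illustrative only, not the identifiers from the patch, and the
real kernel code must also handle concurrent transitions (e.g. via
test_and_set_bit()):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* shared across all queues of the tag set */
static atomic_uint busy_queues;

struct queue {
        bool dbusy;     /* has this queue been counted as busy? */
};

/* a. driver tag allocation failed and the queue is not yet counted */
static void on_tag_alloc_fail(struct queue *q)
{
        if (!q->dbusy) {
                q->dbusy = true;
                atomic_fetch_add(&busy_queues, 1);
        }
}

/* b. queue goes idle: drop its contribution if it was counted */
static void on_queue_idle(struct queue *q)
{
        if (q->dbusy) {
                q->dbusy = false;
                atomic_fetch_sub(&busy_queues, 1);
        }
}

/* hot path: a single atomic read, no writes in the common case */
static unsigned int may_queue_depth(unsigned int total_depth)
{
        unsigned int busy = atomic_load(&busy_queues);

        return busy ? total_depth / busy : total_depth;
}

int main(void)
{
        struct queue q = { .dbusy = false };

        on_tag_alloc_fail(&q);          /* idle -> busy: first write */
        printf("fair share: %u\n", may_queue_depth(1024));
        on_queue_idle(&q);              /* busy -> idle: second write */
        return 0;
}

In a sustained benchmark like the fio run above, each queue makes the
idle -> busy transition once, so busy_queues stays constant and the
hot path reduces to the single atomic_load().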

Please measure the performance impact of your patch.

Thanks,

Bart.
