[PATCH 0/6] blk-mq-sched: support request batch dispatching for sq elevator

From: Yu Kuai
Date: Tue Jul 22 2025 - 03:33:29 EST


From: Yu Kuai <yukuai3@xxxxxxxxxx>

Currently, both mq-deadline and bfq have a global spin lock that is
grabbed inside elevator methods such as dispatch_request, insert_requests,
and bio_merge. This global lock is the main reason mq-deadline and
bfq can't scale very well.

For the dispatch_request method, the current behavior is to dispatch one
request at a time. With multiple dispatching contexts, this behavior, on
the one hand, introduces intense lock contention:

t1:                     t2:                     t3:
lock                    lock                    lock
// grab lock
ops.dispatch_request
unlock
                        // grab lock
                        ops.dispatch_request
                        unlock
                                                // grab lock
                                                ops.dispatch_request
                                                unlock

On the other hand, it messes up the request dispatching order:
t1:                     t2:

lock
rq1 = ops.dispatch_request
unlock
                        lock
                        rq2 = ops.dispatch_request
                        unlock

lock
rq3 = ops.dispatch_request
unlock

                        lock
                        rq4 = ops.dispatch_request
                        unlock

// rq1, rq3 issue to disk
                        // rq2, rq4 issue to disk

In this case, the elevator dispatch order is rq 1-2-3-4, but the order
seen by the disk is rq 1-3-2-4: the order of rq2 and rq3 is inverted.
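
The inversion can be reproduced with a small userspace model. This is
only a sketch, not kernel code: elevator_model, model_dispatch_one() and
model_issue_order() are hypothetical names, the two contexts strictly take
turns holding the lock, and each context issues what it collected only
after all dispatching is done, as in the diagram.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CTX      2   /* two dispatching contexts, t1 and t2 */
#define MAX_PER_CTX 16  /* arbitrary bound for this model */

/* Hypothetical elevator: hands out requests numbered 1, 2, 3, ... */
struct elevator_model {
	int next_rq;
};

/* Models one ops.dispatch_request call made while holding the lock. */
static int model_dispatch_one(struct elevator_model *e)
{
	return e->next_rq++;
}

/*
 * Each context collects per_ctx requests, taking up to `batch` of them
 * per lock hold, with the contexts alternating. The issue order seen by
 * the disk is written to out[]; the number of entries is returned.
 */
static size_t model_issue_order(int per_ctx, int batch, int *out)
{
	struct elevator_model e = { .next_rq = 1 };
	int held[NR_CTX][MAX_PER_CTX];
	int count[NR_CTX] = { 0 };
	size_t n = 0;

	assert(batch > 0 && per_ctx > 0 && per_ctx <= MAX_PER_CTX);

	while (count[NR_CTX - 1] < per_ctx) {
		for (int c = 0; c < NR_CTX; c++) {
			/* "lock": one critical section per turn */
			for (int i = 0; i < batch && count[c] < per_ctx; i++)
				held[c][count[c]++] = model_dispatch_one(&e);
			/* "unlock" */
		}
	}

	/* Each context issues its own requests, t1 first, then t2. */
	for (int c = 0; c < NR_CTX; c++)
		for (int i = 0; i < count[c]; i++)
			out[n++] = held[c][i];

	return n;
}
```

In this model, model_issue_order(2, 1, out) fills out[] with 1, 3, 2, 4,
while model_issue_order(2, 2, out) gives 1, 2, 3, 4, because each context
takes both of its requests in a single lock hold.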

While dispatching a request, blk_mq_get_dispatch_budget() and
blk_mq_get_driver_tag() must be called, and they are not ready to be
called inside elevator methods; hence introducing a new method like
dispatch_requests is not possible.

In conclusion, this set factors the global lock out of the
dispatch_request method, and supports request batch dispatch by calling
the method multiple times while holding the lock.
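
The shape of that change can be sketched in userspace as follows. This
is a simplified model under stated assumptions, not the code in this set:
struct elv_model and the elv_*() helpers are hypothetical, the lock is
modelled by a flag plus an acquisition counter, and the budget and driver
tag steps are left out. The point is only that the method asserts the
lock is held instead of taking it, so the caller can invoke it several
times inside one critical section.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an elevator and its lock. */
struct elv_model {
	bool locked;                    /* models the elevator lock */
	unsigned int lock_acquisitions; /* how often the lock was taken */
	int pending;                    /* requests queued in the elevator */
	int next_rq;                    /* id of the next request, from 1 */
};

static void elv_lock(struct elv_model *e)
{
	assert(!e->locked);
	e->locked = true;
	e->lock_acquisitions++;
}

static void elv_unlock(struct elv_model *e)
{
	assert(e->locked);
	e->locked = false;
}

/* The method does not lock; the caller must hold the lock. 0 if empty. */
static int elv_dispatch_request(struct elv_model *e)
{
	assert(e->locked);
	if (e->pending == 0)
		return 0;
	e->pending--;
	return e->next_rq++;
}

/* Dispatch up to max_batch requests in one critical section. */
static int elv_dispatch_batch(struct elv_model *e, int max_batch, int *rqs)
{
	int n = 0;

	elv_lock(e);
	while (n < max_batch) {
		int rq = elv_dispatch_request(e);

		if (!rq)
			break;
		rqs[n++] = rq;
	}
	elv_unlock(e);

	return n;
}

/* Drain the elevator and return the number of lock acquisitions. */
static unsigned int elv_drain(struct elv_model *e, int max_batch)
{
	int rqs[64];

	assert(max_batch > 0 && max_batch <= 64);
	while (elv_dispatch_batch(e, max_batch, rqs) > 0)
		;

	return e->lock_acquisitions;
}
```

With 32 pending requests, elv_drain() takes the lock 33 times when
max_batch is 1 and 5 times when max_batch is 8; the extra acquisition in
both cases is the final call that finds the elevator empty.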

nullblk setup:
modprobe null_blk nr_devices=0 &&
udevadm settle &&
cd /sys/kernel/config/nullb &&
mkdir nullb0 &&
cd nullb0 &&
echo 0 > completion_nsec &&
echo 512 > blocksize &&
echo 0 > home_node &&
echo 0 > irqmode &&
echo 128 > submit_queues &&
echo 1024 > hw_queue_depth &&
echo 1024 > size &&
echo 0 > memory_backed &&
echo 2 > queue_mode &&
echo 1 > power ||
exit $?

Test script:
fio -filename=/dev/$disk -name=test -rw=randwrite -bs=4k -iodepth=32 \
-numjobs=16 --iodepth_batch_submit=8 --iodepth_batch_complete=8 \
-direct=1 -ioengine=io_uring -group_reporting -time_based -runtime=30

Test result (IOPS):

| | deadline | bfq |
| --------------- | -------- | -------- |
| before this set | 263k | 124k |
| after this set | 475k | 292k |

Yu Kuai (6):
  mq-deadline: switch to use high layer elevator lock
  block, bfq: don't grab queue_lock from io path
  block, bfq: switch to use elevator lock
  elevator: factor elevator lock out of dispatch_request method
  blk-mq-sched: refactor __blk_mq_do_dispatch_sched()
  blk-mq-sched: support request batch dispatching for sq elevator

 block/bfq-cgroup.c   |   4 +-
 block/bfq-iosched.c  |  73 ++++++-------
 block/bfq-iosched.h  |   2 +-
 block/blk-ioc.c      |  43 +++++++-
 block/blk-mq-sched.c | 240 ++++++++++++++++++++++++++++++-------------
 block/blk-mq.h       |  21 ++++
 block/blk.h          |   2 +-
 block/elevator.c     |   1 +
 block/elevator.h     |   4 +-
 block/mq-deadline.c  |  58 +++++------
 10 files changed, 293 insertions(+), 155 deletions(-)

--
2.39.2