Re: [RFC PATCH 1/2] blk-mq: don't call callbacks for requests that bypassed the scheduler

From: Ming Lei
Date: Mon Aug 30 2021 - 06:11:35 EST


On Mon, Aug 30, 2021 at 09:48:06AM +0000, Niklas Cassel wrote:
> On Fri, Aug 27, 2021 at 09:28:07PM +0800, Ming Lei wrote:
> > On Fri, Aug 27, 2021 at 12:41:31PM +0000, Niklas Cassel wrote:
> > > From: Niklas Cassel <niklas.cassel@xxxxxxx>
> > >
> > > Currently, __blk_mq_alloc_request() calls ops.prepare_request and sets
> > > RQF_ELVPRIV.
> > >
> > > Therefore, unless the request is a flush, the RQF_ELVPRIV flag will be
> > > set for the request in blk_mq_submit_bio(), regardless of whether the
> > > request was submitted to a scheduler or bypassed the scheduler.
> > >
> > > Later, blk_mq_free_request() checks if the RQF_ELVPRIV flag is set,
> > > and if it is, the ops.finish_request callback will be called.
> > >
> > > The problem with this is that the finish_request scheduler callback
> > > will be called for requests that bypassed the scheduler.
> > >
> > > Fix this by calling the scheduler ops.prepare_request callback, and
> > > set the RQF_ELVPRIV flag only immediately before calling the insert
> > > callback.
> >
> > A request can be inserted more than once, for example on requeue,
> > but __blk_mq_alloc_request() runs only once, so is it fine to
> > call ->prepare_request more than once for the same request?
>
> Calling ->prepare_request multiple times is fine.
> All the different I/O schedulers (BFQ, mq-deadline, kyber)
> simply use .prepare_request to clear/set elv->priv to a fixed value.
>
> >
> > Or I am wondering why not call ->prepare_request when the following
> > check is true?
> >
> > if (e && e->type->ops.prepare_request && !op_is_flush(data->cmd_flags) &&
> >     !blk_op_is_passthrough(data->cmd_flags))
> >         e->type->ops.prepare_request()
>
>
> That might work, and might be a nicer solution indeed.
>
> If a request got plugged, it will be inserted into the scheduler through
> blk_flush_plug_list() -> blk_mq_flush_plug_list() -> blk_mq_sched_insert_requests()
> which will insert it unconditionally.
> In this case, we know that !op_is_flush() (because if it was, blk_mq_submit_bio()
> would have inserted directly.)
>
>
> If we didn't plug, we do blk_mq_sched_insert_request(), which will add it if
> blk_mq_sched_bypass_insert() returns false:
>
> blk_mq_sched_bypass_insert() is defined as:
>
>         if ((rq->rq_flags & RQF_FLUSH_SEQ) || blk_rq_is_passthrough(rq))
>                 return true;
>
> Also in this case, we know that !op_is_flush() (blk_mq_submit_bio() would have
> inserted directly.)
>
>
> So, we could easily add && !blk_op_is_passthrough(data->cmd_flags) to the
> ->prepare_request condition in blk_mq_rq_ctx_init() like you suggested,
> but since the bypass condition also seems to look at RQF_FLUSH_SEQ, wouldn't
> we need to add RQF_FLUSH_SEQ to the condition in blk_mq_rq_ctx_init() as well?
>
> This flag is set after blk_mq_rq_ctx_init(). Are we sure that the RQF_FLUSH_SEQ
> flag will only be set for a request for which op_is_flush() returned true?
>
> (If so, then only adding && !blk_op_is_passthrough(data->cmd_flags) should
> be fine.)
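
For illustration only, the two conditions compared above can be modelled in
userspace. This is a rough sketch, not kernel code: every model_* and MODEL_*
name is hypothetical, the flag values are arbitrary, and the predicates only
mirror the shape of the checks quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the values are illustrative, not the kernel's. */
enum model_op { MODEL_OP_READ, MODEL_OP_WRITE, MODEL_OP_DRV_IN, MODEL_OP_DRV_OUT };

#define MODEL_REQ_PREFLUSH	(1u << 0)
#define MODEL_REQ_FUA		(1u << 1)
#define MODEL_RQF_FLUSH_SEQ	(1u << 0)

struct model_rq {
	enum model_op op;
	unsigned int cmd_flags;
	unsigned int rq_flags;
};

/* Stand-in for op_is_flush(): any flush-related command flag is set. */
static bool model_op_is_flush(unsigned int cmd_flags)
{
	return (cmd_flags & (MODEL_REQ_PREFLUSH | MODEL_REQ_FUA)) != 0;
}

/* Stand-in for the passthrough check: driver-private operations. */
static bool model_is_passthrough(enum model_op op)
{
	return op == MODEL_OP_DRV_IN || op == MODEL_OP_DRV_OUT;
}

/* Allocation-time check: would ->prepare_request be called? */
static bool model_wants_prepare(const struct model_rq *rq)
{
	return !model_op_is_flush(rq->cmd_flags) &&
	       !model_is_passthrough(rq->op);
}

/* Insert-time check, same shape as the bypass condition quoted above. */
static bool model_bypass_insert(const struct model_rq *rq)
{
	return (rq->rq_flags & MODEL_RQF_FLUSH_SEQ) ||
	       model_is_passthrough(rq->op);
}

/* True when a request would be prepared and would still bypass the scheduler. */
static bool model_prepared_but_bypassed(const struct model_rq *rq)
{
	return model_wants_prepare(rq) && model_bypass_insert(rq);
}
```

In this model, model_prepared_but_bypassed() can only return true for a
non-flush request that nevertheless carries the flush-sequence flag. Whether
the kernel can ever produce such a request is the open question above; the
model does not answer it.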

BTW, what I meant is the following change. Is it fine?

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0a33d16a7298..f98f8cc05644 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -327,20 +327,6 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,

 	data->ctx->rq_dispatched[op_is_sync(data->cmd_flags)]++;
 	refcount_set(&rq->ref, 1);
-
-	if (!op_is_flush(data->cmd_flags)) {
-		struct elevator_queue *e = data->q->elevator;
-
-		rq->elv.icq = NULL;
-		if (e && e->type->ops.prepare_request) {
-			if (e->type->icq_cache)
-				blk_mq_sched_assign_ioc(rq);
-
-			e->type->ops.prepare_request(rq);
-			rq->rq_flags |= RQF_ELVPRIV;
-		}
-	}
-
 	data->hctx->queued++;
 	return rq;
 }
@@ -359,17 +345,25 @@ static struct request *__blk_mq_alloc_request(struct blk_mq_alloc_data *data)
 	if (data->cmd_flags & REQ_NOWAIT)
 		data->flags |= BLK_MQ_REQ_NOWAIT;

-	if (e) {
+	if (e && !op_is_flush(data->cmd_flags) &&
+	    !blk_op_is_passthrough(data->cmd_flags)) {
 		/*
 		 * Flush/passthrough requests are special and go directly to the
 		 * dispatch list. Don't include reserved tags in the
 		 * limiting, as it isn't useful.
 		 */
-		if (!op_is_flush(data->cmd_flags) &&
-		    !blk_op_is_passthrough(data->cmd_flags) &&
-		    e->type->ops.limit_depth &&
-		    !(data->flags & BLK_MQ_REQ_RESERVED))
+		if (e->type->ops.limit_depth &&
+		    !(data->flags & BLK_MQ_REQ_RESERVED))
 			e->type->ops.limit_depth(data->cmd_flags, data);
+
+		rq->elv.icq = NULL;
+		if (e->type->ops.prepare_request) {
+			if (e->type->icq_cache)
+				blk_mq_sched_assign_ioc(rq);
+
+			e->type->ops.prepare_request(rq);
+			rq->rq_flags |= RQF_ELVPRIV;
+		}
 	}

 retry:
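
The intended effect can be sketched with a small userspace model. This is only
an illustration under simplifying assumptions, not kernel code: the elv_model_*
names and ELV_MODEL_RQF_ELVPRIV are hypothetical, the scheduler callbacks are
reduced to counters, and the "old" and "new" allocation paths only mirror the
conditions described in this thread.

```c
#include <assert.h>
#include <stdbool.h>

#define ELV_MODEL_RQF_ELVPRIV	(1u << 0)	/* illustrative value */

/* Counts how often the scheduler-style callbacks would have run. */
struct elv_model_sched {
	int prepared;	/* stand-in for ->prepare_request calls */
	int finished;	/* stand-in for ->finish_request calls */
};

struct elv_model_rq {
	bool is_flush;
	bool is_passthrough;
	unsigned int rq_flags;
};

/* Current behaviour as described above: prepare unless the request is a flush. */
static void elv_model_alloc_old(struct elv_model_sched *s, struct elv_model_rq *rq)
{
	rq->rq_flags = 0;
	if (!rq->is_flush) {
		s->prepared++;
		rq->rq_flags |= ELV_MODEL_RQF_ELVPRIV;
	}
}

/* Suggested behaviour: also skip prepare for passthrough requests. */
static void elv_model_alloc_new(struct elv_model_sched *s, struct elv_model_rq *rq)
{
	rq->rq_flags = 0;
	if (!rq->is_flush && !rq->is_passthrough) {
		s->prepared++;
		rq->rq_flags |= ELV_MODEL_RQF_ELVPRIV;
	}
}

/* Free path: the finish callback runs only when the private flag is set. */
static void elv_model_free(struct elv_model_sched *s, struct elv_model_rq *rq)
{
	if (rq->rq_flags & ELV_MODEL_RQF_ELVPRIV)
		s->finished++;
}

/* Allocate and free one passthrough request; return the finish call count. */
static int elv_model_finish_calls_for_passthrough(bool use_new)
{
	struct elv_model_sched s = { 0, 0 };
	struct elv_model_rq rq = { false, true, 0 };

	if (use_new)
		elv_model_alloc_new(&s, &rq);
	else
		elv_model_alloc_old(&s, &rq);
	elv_model_free(&s, &rq);
	return s.finished;
}
```

With the old allocation path the model reports one finish call for a
passthrough request that never reached the scheduler; with the new path it
reports none, since the flag is never set for that request.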

Thanks,
Ming