Re: [PATCH 2/2] Revert "mq-deadline: Fix request accounting"

From: Niklas Cassel
Date: Wed Sep 08 2021 - 07:57:32 EST


On Tue, Sep 07, 2021 at 10:12:49AM -0700, Bart Van Assche wrote:
> On 9/7/21 9:28 AM, Niklas Cassel wrote:
> > On Tue, Sep 07, 2021 at 08:15:03AM -0700, Bart Van Assche wrote:
> > > On 9/7/21 7:21 AM, Niklas Cassel wrote:
> > > > blk-mq will no longer call the I/O scheduler .finish_request() callback
> > > > for requests that were never inserted to the I/O scheduler.
> > >
> > > I do not agree. Even with patch 1/2 from this series applied, finish_request()
> > > will still be called for requests inserted by blk_insert_cloned_request()
> > > although these requests are never inserted to the I/O scheduler.
> >
> > Looking at blk_mq_free_request(),
> > e->type->ops.finish_request() will only be called if RQF_ELVPRIV
> > is set.
> >
> > blk_insert_cloned_request() doesn't seem to allocate a request
> > itself, but instead takes an already cloned request.
> >
> > So I guess it depends on how the supplied request was cloned.
> >
> > I would assume if the original request doesn't have RQF_ELVPRIV set,
> > then neither will the cloned request?
> >
> > I tried to look at blk_rq_prep_clone(), which seems to be a common
> > cloning function, but I don't see req->rq_flags being copied
> > (except for RQF_SPECIAL_PAYLOAD).
> >
> > Anyway, I don't see how .finish_request() will be called in relation
> > to blk_insert_cloned_request(). Could you please help me out and
> > give me an example of a call chain where this can happen?
>
> Hi Niklas,
>
> This is a bit outside my area of expertise. Anyway: map_request() calls
> .clone_and_map_rq(). At least multipath_clone_and_map() calls
> blk_get_request(). I think this shows that blk_insert_cloned_request()
> may insert an entirely new request. Is my understanding correct that
> blk_mq_rq_ctx_init() will set RQF_ELVPRIV for the cloned request if a
> scheduler is associated with the request queue associated with the
> cloned request?
>
> Bart.

Thank you Bart.


dm-rq.c:map_request() calls .clone_and_map_rq()

one example of a .clone_and_map_rq() implementation is
multipath_clone_and_map().

multipath_clone_and_map(), to get a clone simply does blk_get_request(),
which does blk_mq_alloc_request(), which calls __blk_mq_alloc_request(),
which calls blk_mq_rq_ctx_init(), which will set RQF_ELVPRIV, as long as
the request is not a flush or a passthrough request.

dm-rq.c:dm_dispatch_clone_request() calls blk_insert_cloned_request(),
which will bypass the I/O scheduler.

So, a request cloned by e.g. drivers/md/dm-mpath.c will
bypass the I/O scheduler, but can still have the RQF_ELVPRIV flag set.



This just tells me that if someone wants to clean up this mess,
we have to do something similar to what I proposed in my initial RFC:
https://lore.kernel.org/linux-block/20210827124100.98112-2-Niklas.Cassel@xxxxxxx/

i.e., set RQF_ELVPRIV flag immediately before calling
e->type->ops.insert_requests(), and only then.

It seems logical to simply set the flag before actually inserting the request
to the scheduler, rather than finding all the corner cases where we don't.

Anyway, I'll leave this potential cleanup to people more familiar with the
blk-mq code for now.


Kind regards,
Niklas