Re: Flush requests not going through IO scheduler

From: Jan Kara
Date: Thu Nov 12 2015 - 08:40:42 EST


On Tue 03-11-15 10:24:12, Jens Axboe wrote:
> On 11/03/2015 10:18 AM, Jeff Moyer wrote:
> >Jens Axboe <axboe@xxxxxxxxx> writes:
> >
> >>>>Certainly, the current behavior is undoubtedly broken. The least
> >>>>intrusive fix would be to kick off scheduling when we add it to the
> >>>>request, but the elevator should handle it. Are you going to be up
> >>>>for hacking up a fix?
> >>>
> >>>I have some trouble understanding what do you mean exactly. Do you think we
> >>>should just call __blk_run_queue() after we add the request to
> >>>q->queue_head?
> >>
> >>No, that won't be enough, as it won't always break out of the idle
> >>logic. We need to ensure that the new request is noticed, so that CFQ
> >>knows and can decide to kick off things.
> >
> >Hmm? __blk_run_queue calls the request_fn, which will call
> >blk_peek_request, which calls __elv_next_request, which will find the
> >request on queue_head. Right?
> >
> >	while (1) {
> >		if (!list_empty(&q->queue_head)) {
> >			rq = list_entry_rq(q->queue_head.next);
> >			return rq;
>
> I guess that will bypass the scheduler. Ugh, but that's pretty ugly,
> since cfq is still effectively idling. These flush requests really
> should go to an internal scheduler list for dispatch.
>
> But as a quick fix, it might be enough to just kick off the queue
> with blk_run_queue().

So I was looking more into this and in the end tracked it down to mostly a
blktrace issue. The first thing is: blk_queue_bio() actually does kick the
queue after the flush request is queued, but at that moment only the request
for the initial flush is queued, and that one is invisible to blktrace, so
the disk seems idle although it is not. After this request completes, we
queue & dispatch the request with data, which is visible in blktrace. So in
this case requests are dispatched as they should be. The only question I
cannot really answer is why the initial flush is not visible in the block
trace - at least the trace_block_rq_issue() tracepoint and the corresponding
completion should trigger and should be visible... Does anyone have an idea?
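
To illustrate the ordering described above, here is a minimal user-space
sketch of a flush sequence: the preflush step runs first, and the data step
is only reached once it completes. This is a simplified model, not the
kernel code; the fseq_step enum, struct flush_rq and both helper functions
are hypothetical names made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the steps of a flush sequence. */
enum fseq_step {
	FSEQ_PREFLUSH  = 1 << 0,	/* flush the cache before the data */
	FSEQ_DATA      = 1 << 1,	/* the write carrying the payload */
	FSEQ_POSTFLUSH = 1 << 2,	/* flush the cache after the data */
	FSEQ_DONE      = 1 << 3,	/* nothing left to do */
};

struct flush_rq {
	unsigned int policy;	/* steps this request needs */
	unsigned int done;	/* steps already completed */
};

/* Return the earliest needed step that has not completed yet. */
static enum fseq_step flush_next_step(const struct flush_rq *rq)
{
	assert(rq != NULL);

	unsigned int pending = rq->policy & ~rq->done;

	if (pending & FSEQ_PREFLUSH)
		return FSEQ_PREFLUSH;
	if (pending & FSEQ_DATA)
		return FSEQ_DATA;
	if (pending & FSEQ_POSTFLUSH)
		return FSEQ_POSTFLUSH;
	return FSEQ_DONE;
}

/* Mark one step complete and report which step to issue next. */
static enum fseq_step flush_complete_step(struct flush_rq *rq,
					  enum fseq_step step)
{
	assert(rq != NULL);

	rq->done |= step;
	return flush_next_step(rq);
}
```

In this model, a request needing FSEQ_PREFLUSH | FSEQ_DATA has only the
preflush outstanding at the time the queue is kicked; the data step becomes
the next one only after flush_complete_step() is called for the preflush.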

Also blk_insert_flush() can add a request directly to q->queue_head when no
flushing is required. I've sent a patch to make such requests go through the
IO scheduler, but it is mostly a non-issue since generic_make_request_checks()
usually removes the FLUSH and FUA flags when they are not needed.
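
A rough user-space sketch of that reasoning follows. It assumes that the
flags are dropped when the queue advertises no cache capabilities, and that
a request is a "no flushing required" case when only its data step remains.
All M_* constants, struct model_rq and the model_* functions are
hypothetical stand-ins for illustration, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits, used both for requests and queue capabilities. */
#define M_REQ_FLUSH		(1u << 0)
#define M_REQ_FUA		(1u << 1)

/* Hypothetical sequence steps a request may need. */
#define M_FSEQ_PREFLUSH		(1u << 0)
#define M_FSEQ_DATA		(1u << 1)
#define M_FSEQ_POSTFLUSH	(1u << 2)

struct model_rq {
	unsigned int flags;	/* M_REQ_* bits set by the submitter */
	unsigned int sectors;	/* payload size, 0 for an empty flush */
};

/* Drop FLUSH/FUA when the queue advertises no cache capabilities. */
static void model_strip_flags(unsigned int queue_caps, struct model_rq *rq)
{
	assert(rq != NULL);

	if (!queue_caps)
		rq->flags &= ~(M_REQ_FLUSH | M_REQ_FUA);
}

/* Work out which steps the request needs on this queue. */
static unsigned int model_flush_policy(unsigned int queue_caps,
				       const struct model_rq *rq)
{
	assert(rq != NULL);

	unsigned int policy = 0;

	if (rq->sectors)
		policy |= M_FSEQ_DATA;
	if (queue_caps & M_REQ_FLUSH) {
		if (rq->flags & M_REQ_FLUSH)
			policy |= M_FSEQ_PREFLUSH;
		/* FUA is emulated by a postflush when not supported natively. */
		if (!(queue_caps & M_REQ_FUA) && (rq->flags & M_REQ_FUA))
			policy |= M_FSEQ_POSTFLUSH;
	}
	return policy;
}

/* Nonzero when only the data step is needed, i.e. no flushing required. */
static int model_needs_no_flush(unsigned int policy)
{
	return (policy & M_FSEQ_DATA) &&
	       !(policy & (M_FSEQ_PREFLUSH | M_FSEQ_POSTFLUSH));
}
```

In this model a FUA write on a queue with native FUA support still ends up
as a data-only request with its flag intact, which is one way the direct
insertion could be reached even though the flags are stripped in the common
no-cache case.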

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR