Re: v4.15 and I/O hang with BFQ

From: Paolo Valente
Date: Mon Feb 05 2018 - 14:07:32 EST




> On 30 Jan 2018, at 16:40, Paolo Valente <paolo.valente@xxxxxxxxxx> wrote:
>
>
>
>> On 30 Jan 2018, at 15:40, Ming Lei <ming.lei@xxxxxxxxxx> wrote:
>>
>> On Tue, Jan 30, 2018 at 03:30:28PM +0100, Oleksandr Natalenko wrote:
>>> Hi.
>>>
>> ...
>>> systemd-udevd-271 [000] .... 4.311033: bfq_insert_requests: insert rq->0
>>> systemd-udevd-271 [000] ...1 4.311037: blk_mq_do_dispatch_sched: not get rq, 1
>>> cfdisk-408 [000] .... 13.484220: bfq_insert_requests: insert rq->1
>>> kworker/0:1H-174 [000] .... 13.484253: blk_mq_do_dispatch_sched: not get rq, 1
>>> ===
>>>
>>> Looks the same, right?
>>
>> Yeah, same as before.
>>
>
> Hi guys,
> sorry for the delay with this fix. We are proceeding very slowly on
> this, because I'm super busy. Anyway, now I can at least explain in
> more detail the cause that leads to this hang. Commit 'a6a252e64914
> ("blk-mq-sched: decide how to handle flush rq via RQF_FLUSH_SEQ")'
> makes all non-flush re-prepared requests be re-inserted into the I/O
> scheduler. With this change, I/O schedulers may get the same request
> inserted again, even several times, without a finish_request invoked
> on the request before each re-insertion.
>
> For the I/O scheduler, every such re-prepared request is equivalent
> to the insertion of a new request. For schedulers like mq-deadline
> or kyber this fact causes no problems. In contrast, it confuses a stateful
> scheduler like BFQ, which preserves state for an I/O request until
> finish_request is invoked on it. In particular, BFQ has no way
> to know that the above re-insertions concern the same, already dispatched
> request. So it may get stuck waiting forever for the completion of these
> re-inserted requests, thus preventing any other queue of
> requests from being served.
>
> We are trying to address this issue by adding the hook requeue_request
> to the bfq interface.
>
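
To make the accounting problem concrete, here is a minimal toy model in C.
It is only a sketch under stated assumptions: struct toy_sched and all the
toy_* functions are hypothetical names invented for illustration, and none
of this is actual BFQ or blk-mq code. It models a stateful scheduler that
counts dispatched-but-unfinished requests, and compares a re-insertion
with and without a requeue notification.

```c
#include <assert.h>

/* Toy model only: NOT BFQ's data structures or code. */
struct toy_sched {
    int queued;    /* requests inserted, not yet dispatched */
    int in_flight; /* requests dispatched, not yet finished or requeued */
};

static void toy_insert(struct toy_sched *s)
{
    s->queued++;
}

static int toy_dispatch(struct toy_sched *s)
{
    if (s->queued == 0)
        return 0; /* nothing to dispatch */
    s->queued--;
    s->in_flight++;
    return 1;
}

/* Stand-in for finish_request: the request has completed. */
static void toy_finish(struct toy_sched *s)
{
    assert(s->in_flight > 0);
    s->in_flight--;
}

/* Hypothetical stand-in for a requeue hook: the request comes back
 * to the scheduler, so it must no longer count as in flight. */
static void toy_requeue(struct toy_sched *s)
{
    assert(s->in_flight > 0);
    s->in_flight--;
}

/* One request is dispatched, re-inserted once, dispatched again and
 * then completed. Returns the in-flight count left over at the end. */
int toy_run(int call_requeue_hook)
{
    struct toy_sched s = { 0, 0 };

    toy_insert(&s);
    toy_dispatch(&s);

    /* The request is re-prepared and re-inserted. */
    if (call_requeue_hook)
        toy_requeue(&s);
    toy_insert(&s); /* looks like a brand-new request to the scheduler */

    toy_dispatch(&s);
    toy_finish(&s); /* the single real completion */

    return s.in_flight;
}
```

In this model toy_run(0) leaves one request counted as in flight forever,
which is the kind of stale state a scheduler could end up waiting on,
while toy_run(1) returns to zero. The real scheduler state is of course
much richer than a pair of counters.
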
> Unfortunately, with our current implementation of requeue_request in
> place, bfq eventually gets to an incoherent state. This is apparently
> caused by a requeue of an I/O request, immediately followed by a
> completion of the same request. This seems rather absurd, and drives
> bfq crazy. But this is something for which we don't have definite
> results yet.
>
> We're working on it, sorry again for the delay.
>

Ok, patch arriving ... Please test it.

Thanks,
Paolo

> Thanks,
> Paolo
>
>> --
>> Ming