Re: [RFC PATCH v3 3/3] blk-mq: Lockout tagset iterator when exiting elevator

From: John Garry
Date: Mon Mar 08 2021 - 06:20:11 EST


On 06/03/2021 04:43, Bart Van Assche wrote:
> On 3/5/21 7:14 AM, John Garry wrote:
>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>> index 7ff1b20d58e7..5950fee490e8 100644
>> --- a/block/blk-mq-tag.c
>> +++ b/block/blk-mq-tag.c
>> @@ -358,11 +358,16 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
>>  {
>>  	int i;
>> +	if (!atomic_inc_not_zero(&tagset->iter_usage_counter))
>> +		return;
>> +
>>  	for (i = 0; i < tagset->nr_hw_queues; i++) {
>>  		if (tagset->tags && tagset->tags[i])
>>  			__blk_mq_all_tag_iter(tagset->tags[i], fn, priv,
>>  					      BT_TAG_ITER_STARTED);
>>  	}
>> +
>> +	atomic_dec(&tagset->iter_usage_counter);
>>  }
>>  EXPORT_SYMBOL(blk_mq_tagset_busy_iter);

Hi Bart,

> This changes the behavior of blk_mq_tagset_busy_iter(). What will e.g.
> happen if the mtip driver calls blk_mq_tagset_busy_iter(&dd->tags,
> mtip_abort_cmd, dd) concurrently with another blk_mq_tagset_busy_iter()
> call and if that causes all mtip_abort_cmd() calls to be skipped?

I'm not sure that I understand the problem you describe. If blk_mq_tagset_busy_iter(&dd->tags, mtip_abort_cmd, dd) is called, one of two things can happen:
a. Normal operation: iter_usage_counter initially holds >= 1, so it is incremented in blk_mq_tagset_busy_iter() and we iterate over the busy tags. Any parallel call to blk_mq_tagset_busy_iter() will also increment iter_usage_counter.
b. We're switching IO scheduler: first we quiesce all queues, after which there should be no active requests. At that point, we wait for any in-progress calls to blk_mq_tagset_busy_iter() to finish and then block (or "discard" may be a better term) any further calls. Discarding further calls should be safe as there are no requests to iterate over. atomic_cmpxchg() is used to set iter_usage_counter to 0, which is what blocks any further calls.
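
To make cases a. and b. concrete, here is a minimal userspace sketch of the counter scheme. It assumes C11 <stdatomic.h> rather than the kernel's atomic_t API, and struct iter_gate and the iter_gate_*() helpers are hypothetical names used only for illustration; they are not part of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for tagset->iter_usage_counter.
 * 1 = open with no iterator inside, >1 = iterators inside, 0 = closed.
 */
struct iter_gate {
	atomic_int usage;
};

static void iter_gate_init(struct iter_gate *g)
{
	atomic_init(&g->usage, 1);
}

/* Userspace equivalent of atomic_inc_not_zero(): increment unless 0. */
static bool iter_gate_enter(struct iter_gate *g)
{
	int old = atomic_load(&g->usage);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&g->usage, &old, old + 1))
			return true;
	}
	return false;	/* gate is closed, the call is discarded */
}

/* Counterpart of the atomic_dec() at the end of the iterator. */
static void iter_gate_exit(struct iter_gate *g)
{
	int old = atomic_fetch_sub(&g->usage, 1);

	assert(old > 1);	/* only valid after a successful enter */
}

/* One attempt at the 1 -> 0 transition; fails while iterators are inside. */
static bool iter_gate_try_close(struct iter_gate *g)
{
	int expected = 1;

	return atomic_compare_exchange_strong(&g->usage, &expected, 0);
}

/* Counterpart of atomic_set(..., 1) after the elevator has exited. */
static void iter_gate_reopen(struct iter_gate *g)
{
	atomic_store(&g->usage, 1);
}
```

In case a. every iterator passes iter_gate_enter() and later calls iter_gate_exit(). In case b. the closing side retries iter_gate_try_close() until it succeeds, after which iter_gate_enter() returns false until iter_gate_reopen() is called.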


>> +	while (atomic_cmpxchg(&set->iter_usage_counter, 1, 0) != 1);
> Isn't it recommended to call cpu_relax() inside busy-waiting loops?

Maybe, but I am considering changing this patch to use percpu_ref - I need to look into it further.
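
For reference, the busy-wait with a relax step could look roughly like the sketch below. This is userspace C11, so sched_yield() is only an assumed stand-in for the kernel's cpu_relax() (the two are not equivalent), and relax_hint() and close_gate_spin() are hypothetical names.

```c
#include <assert.h>
#include <sched.h>	/* sched_yield(), POSIX */
#include <stdatomic.h>
#include <stddef.h>

/* Assumed userspace substitute for cpu_relax(); not the same primitive. */
static inline void relax_hint(void)
{
	sched_yield();
}

/*
 * Spin until the counter moves from 1 to 0, relaxing between attempts.
 * Returns the number of failed attempts, for illustration only.
 */
static unsigned long close_gate_spin(atomic_int *counter)
{
	unsigned long spins = 0;
	int expected = 1;

	assert(counter != NULL);

	while (!atomic_compare_exchange_strong(counter, &expected, 0)) {
		expected = 1;	/* a failed exchange overwrites expected */
		relax_hint();
		spins++;
	}
	return spins;
}
```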


>>  	blk_mq_sched_free_requests(q);
>>  	__elevator_exit(q, e);
>> +	atomic_set(&set->iter_usage_counter, 1);
> Can it happen that the above atomic_set() call happens while a
> blk_mq_tagset_busy_iter() call is in progress?

No. By this point the atomic_cmpxchg() loop has ensured that iter_usage_counter holds 0, so there should be no active callers in the sensitive region of blk_mq_tagset_busy_iter(). Calls to blk_mq_tagset_busy_iter() are discarded while iter_usage_counter holds 0.

> Should that atomic_set()
> call perhaps be changed into an atomic_inc() call?

They have the same effect in practice, but we use atomic_set() in blk_mq_alloc_tag_set(), so this is at least consistent.
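
The equivalence relies on the counter being 0 at this point. A small userspace sketch (C11 atomics, with hypothetical helper names, not the kernel API) of where the two would agree and where they would differ:

```c
#include <stdatomic.h>

/* Analogue of atomic_set(&counter, 1): overwrites whatever was there. */
static int reopen_with_set(atomic_int *c)
{
	atomic_store(c, 1);
	return atomic_load(c);
}

/* Analogue of atomic_inc(&counter): result depends on the old value. */
static int reopen_with_inc(atomic_int *c)
{
	atomic_fetch_add(c, 1);
	return atomic_load(c);
}
```

Starting from 0, both leave the counter at 1. They would only diverge if the counter were non-zero when the elevator exit path reaches this line, which the cmpxchg loop is meant to rule out.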

Thanks,
John